Linear trend estimation

Linear trend estimation is a statistical technique used to analyze data patterns.

Data patterns, or trends, occur when the information gathered tends to increase or decrease over time or is influenced by changes in an external factor.

The simplest function that can be fitted to such data is a straight line, with the dependent variable (typically the measured data) on the vertical axis and the independent variable (often time) on the horizontal axis.

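The least-squares method minimizes the sum of the squared errors in the data series. Given a set of data points $(t_i, y_i)$ and a line $y = at + b$, the values $a$ and $b$ are chosen to minimize the sum of squared errors

$$\sum_i \left[ y_i - (a t_i + b) \right]^2 .$$

This formula first calculates the difference between the observed data $y_i$ and the estimate $(a t_i + b)$; the difference at each data point is squared, and the squares are then added together.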
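For a straight line, the minimization described above has a well-known closed-form solution. The following is a minimal sketch using only the Python standard library; the function names `fit_line` and `sum_squared_errors` and the sample values are illustrative assumptions, not part of any particular package.

```python
def fit_line(t, y):
    """Return (a, b) minimizing sum((y_i - (a*t_i + b))**2)."""
    n = len(t)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    # Sum of squared deviations of t, and cross-deviations of t and y.
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    if s_tt == 0:
        raise ValueError("all t values are identical")
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    a = s_ty / s_tt            # slope
    b = y_mean - a * t_mean    # intercept
    return a, b


def sum_squared_errors(t, y, a, b):
    """Sum of squared differences between the data and the line a*t + b."""
    return sum((yi - (a * ti + b)) ** 2 for ti, yi in zip(t, y))


# Made-up illustrative values (an assumption, not real measurements).
t = [0, 1, 2, 3, 4]
y = [1.0, 2.9, 5.2, 7.1, 8.8]

a, b = fit_line(t, y)
sse = sum_squared_errors(t, y, a, b)
print(f"slope={a:.3f} intercept={b:.3f} sse={sse:.3f}")
```

Any other choice of slope or intercept gives a larger value of `sum_squared_errors` for the same data, which is what "least squares" means here.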

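To analyze a (time) series of data, it can be assumed that it may be represented as trend plus noise: $y_t = at + b + e_t$, where $a$ and $b$ are unknown constants and the $e_t$ are randomly distributed errors.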

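Commonly, where only a single time series exists to be analyzed, the variance of the noise is estimated from the residuals that remain after the fitted trend is subtracted from the data.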
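As a hedged illustration of estimating the noise variance from a single series, the sketch below simulates trend plus noise, fits a line, subtracts it (detrending), and estimates the variance from the residuals. The simulated slope, intercept, noise level, and helper names are assumptions made for the example; dividing by n - 2 reflects the two fitted parameters.

```python
import random


def fit_line(t, y):
    """Least-squares slope and intercept of y against t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    a = s_ty / s_tt
    return a, y_mean - a * t_mean


def residual_variance(t, y):
    """Detrend y and estimate the noise variance from the residuals."""
    if len(y) < 3:
        raise ValueError("need at least three points")
    a, b = fit_line(t, y)
    residuals = [yi - (a * ti + b) for ti, yi in zip(t, y)]
    # Two parameters (a and b) were estimated, so divide by n - 2.
    variance = sum(r * r for r in residuals) / (len(y) - 2)
    return variance, residuals


# Simulated series: the true values below are assumptions for illustration.
rng = random.Random(0)
true_a, true_b, true_sigma = 0.5, 2.0, 1.0
t = list(range(200))
y = [true_a * ti + true_b + rng.gauss(0, true_sigma) for ti in t]

var_hat, residuals = residual_variance(t, y)
print(f"estimated noise variance: {var_hat:.3f} (simulated with {true_sigma**2:.3f})")
```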

The use of a linear trend line has been the subject of criticism, leading to a search for alternative approaches to avoid its use in model estimation.

One of the alternative approaches involves unit root tests and the cointegration technique in econometric studies.

The estimated coefficient associated with a linear trend variable such as time is interpreted as a measure of the impact of a number of unknown or known but immeasurable factors on the dependent variable over one unit of time.

Strictly speaking, this interpretation is applicable for the estimation time frame only.

Outside of this time frame, it cannot be determined how these immeasurable factors behave, either qualitatively or quantitatively.

Research results by mathematicians, statisticians, econometricians, and economists have been published in response to those questions.

For example, detailed notes on the meaning of linear time trends in the regression model are given in Cameron (2005).[1] Granger, Engle, and many other econometricians have written on stationarity, unit root testing, cointegration, and related issues; a summary of some of the works in this area can be found in an information paper[2] by the Royal Swedish Academy of Sciences (2003). Ho-Trieu & Tucker (1990) have written on logarithmic time trends, with results indicating that linear time trends are special cases of cycles.

Consider a concrete example, such as the global surface temperature record of the past 140 years as presented by the IPCC.

However, as noted elsewhere,[4] this time series does not conform to the assumptions necessary for least squares to be valid.

The coefficient of determination, $r^2$, gives the fraction of the variance of the data that is explained by the fitted trend line.

Often, filtering a series increases $r^2$ while making little difference to the fitted trend.
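The effect of filtering can be checked numerically. The sketch below, which assumes a simulated series with a small slope and large white noise, fits a line to the raw data and to block averages over 10 and 100 points, and reports the slope and the coefficient of determination for each. The helper names and all simulation parameters are assumptions for illustration.

```python
import random


def fit_line(t, y):
    """Least-squares slope and intercept of y against t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    a = s_ty / s_tt
    return a, y_mean - a * t_mean


def slope_and_r_squared(t, y):
    """Return the fitted slope and the fraction of variance explained."""
    a, b = fit_line(t, y)
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - (a * ti + b)) ** 2 for ti, yi in zip(t, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return a, 1.0 - ss_res / ss_tot


def block_average(values, width):
    """Average consecutive, non-overlapping blocks of the given width."""
    return [
        sum(values[i:i + width]) / width
        for i in range(0, len(values) - width + 1, width)
    ]


# Simulated data: slope 0.01 and noise standard deviation 5 are assumptions.
rng = random.Random(1)
t = list(range(1000))
y = [0.01 * ti + rng.gauss(0, 5.0) for ti in t]

results = {}
for width in (1, 10, 100):
    t_f = block_average(t, width)   # block centres in time
    y_f = block_average(y, width)   # filtered series
    results[width] = slope_and_r_squared(t_f, y_f)
    slope, r2 = results[width]
    print(f"width={width:>3}  slope={slope:.4f}  r^2={r2:.3f}")
```

Averaging removes noise variance but leaves the variance due to the trend almost unchanged, so the explained fraction rises even though the estimated slope barely moves.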

Thus far, the data have been assumed to consist of the trend plus noise, with the noise at each data point being independent and identically distributed random variables with a normal distribution.

This is important, as it makes an enormous difference to the ease with which the statistics can be analyzed so as to extract maximum information from the data series.
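One way to probe the independence assumption is to look for serial correlation in the residuals of the fitted trend. The sketch below computes the Durbin-Watson statistic, which is close to 2 for uncorrelated residuals and falls toward 0 under positive first-order autocorrelation. The simulated series, the AR(1) coefficient of 0.8, and the helper names are assumptions chosen only to show the contrast.

```python
import random


def fit_line(t, y):
    """Least-squares slope and intercept of y against t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    a = s_ty / s_tt
    return a, y_mean - a * t_mean


def residuals_of_fit(t, y):
    """Residuals left after subtracting the fitted line from the data."""
    a, b = fit_line(t, y)
    return [yi - (a * ti + b) for ti, yi in zip(t, y)]


def durbin_watson(e):
    """Sum of squared successive differences over the sum of squares."""
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    den = sum(x * x for x in e)
    return num / den


def ar1_noise(n, phi, sigma, rng):
    """Serially correlated noise: each value is phi times the previous plus a shock."""
    noise, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0, sigma)
        noise.append(prev)
    return noise


rng = random.Random(2)
n = 500
t = list(range(n))
white = [rng.gauss(0, 1.0) for _ in range(n)]   # independent noise
red = ar1_noise(n, 0.8, 1.0, rng)               # autocorrelated noise

y_white = [0.02 * ti + e for ti, e in zip(t, white)]
y_red = [0.02 * ti + e for ti, e in zip(t, red)]

dw_white = durbin_watson(residuals_of_fit(t, y_white))
dw_red = durbin_watson(residuals_of_fit(t, y_red))
print(f"Durbin-Watson, independent noise:    {dw_white:.2f}")
print(f"Durbin-Watson, autocorrelated noise: {dw_red:.2f}")
```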

If there are other non-linear effects that are correlated with the independent variable (such as cyclic influences), the use of least-squares estimation of the trend is not valid.

Statistical inferences based on the least-squares estimates (for example, tests and confidence intervals) are invalid unless departures from the standard assumptions are properly accounted for.

In R, the linear trend in data can be estimated by using the 'tslm' function of the 'forecast' package.

Medical and biomedical studies often seek to determine a link between sets of data, such as a clinical or scientific metric measured in three different diseases.

In these cases, one would expect the effect test statistic (e.g., influence of a statin on levels of cholesterol, an analgesic on the degree of pain, or increasing doses of different strengths of a drug on a measurable index, i.e. a dose-response effect) to change in direct order as the effect develops.

The same principle may be applied to the effects of allele/genotype frequency, where it could be argued that the genotypes XX, XY, and YY of a single-nucleotide polymorphism are in fact a trend of no Y's, one Y, and then two Y's.[3]

The mathematics of linear trend estimation is a variant of the standard ANOVA, giving different information, and would be the most appropriate test if the researchers hypothesize a trend effect in their test statistic.

For example, across six successive age-decade groups, levels of serum trypsin (ng/mL) rise in a direct linear trend of 128, 152, 194, 207, 215, 218 (data from Altman).
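As a rough illustration only, a straight line can be fitted to the six trypsin values quoted above. The sketch assumes the groups are equally spaced and scores them 1 to 6; that scoring and the variable names are assumptions. Fitting the group means like this describes the trend but is not the formal ANOVA-based test for trend, which needs the within-group data.

```python
# Group means in ng/mL, as quoted in the text (data from Altman).
trypsin = [128, 152, 194, 207, 215, 218]

# Assumed equally spaced group scores 1..6 (an assumption for illustration).
scores = list(range(1, len(trypsin) + 1))

n = len(trypsin)
score_mean = sum(scores) / n
trypsin_mean = sum(trypsin) / n

# Closed-form least-squares slope and intercept.
s_xx = sum((x - score_mean) ** 2 for x in scores)
s_xy = sum((x - score_mean) * (v - trypsin_mean) for x, v in zip(scores, trypsin))
slope = s_xy / s_xx
intercept = trypsin_mean - slope * score_mean

# Fraction of the variation between group means explained by the line.
ss_res = sum((v - (slope * x + intercept)) ** 2 for x, v in zip(scores, trypsin))
ss_tot = sum((v - trypsin_mean) ** 2 for v in trypsin)
r_squared = 1.0 - ss_res / ss_tot

print(f"slope per group step: {slope:.2f} ng/mL")
print(f"intercept: {intercept:.2f} ng/mL, r^2 of group means: {r_squared:.3f}")
```

Under the assumed scoring, the fitted slope is about 18.6 ng/mL per group step.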

Incidentally, it could reasonably be argued that, as age is a naturally continuous variable, it should not be categorized into decades, and that the relationship between age and serum trypsin should instead be sought by correlation (assuming the raw data are available).

A further example is of a substance measured at four time points in different groups: This is a clear trend.

However, had the data been collected at four time points in the same individuals, linear trend estimation would be inappropriate, and a two-way (repeated measures) ANOVA should be applied instead.

Illustration of the effect of filtering on $r^2$. Black = unfiltered data; red = data averaged every 10 points; blue = data averaged every 100 points. All have the same trend, but more filtering leads to a higher $r^2$ for the fitted trend line.