Least-squares spectral analysis

[1][2] Fourier analysis, the most used spectral method in science, generally boosts long-periodic noise in the long and gapped records; LSSA mitigates such problems.

In 1963, Freek J. M. Barning of Mathematisch Centrum, Amsterdam, handled unequally spaced data by similar techniques,[12] including both a periodogram analysis equivalent to what nowadays is called the Lomb method and least-squares fitting of selected frequencies of sinusoids determined from such periodograms — and connected by a procedure known today as the matching pursuit with post-back fitting[13] or the orthogonal matching pursuit.

[14] Petr Vaníček, a Canadian geophysicist and geodesist of the University of New Brunswick, proposed in 1969 also the matching-pursuit approach for equally and unequally spaced data, which he called "successive spectral analysis" and the result a "least-squares periodogram".

[4] He generalized this method to account for any systematic components beyond a simple mean, such as a "predicted linear (quadratic, exponential, ...) secular trend of unknown magnitude", and applied it to a variety of samples, in 1971.

[5] Vaníček's strictly least-squares method was then simplified in 1976 by Nicholas R. Lomb of the University of Sydney, who pointed out its close connection to periodogram analysis.

[9] Subsequently, the definition of a periodogram of unequally spaced data was modified and analyzed by Jeffrey D. Scargle of NASA Ames Research Center,[10] who showed that, with minor changes, it becomes identical to Lomb's least-squares formula for fitting individual sinusoid frequencies.

Scargle states that his paper "does not introduce a new detection technique, but instead studies the reliability and efficiency of detection with the most commonly used technique, the periodogram, in the case where the observation times are unevenly spaced," and further points out regarding least-squares fitting of sinusoids compared to periodogram analysis, that his paper "establishes, apparently for the first time, that (with the proposed modifications) these two methods are exactly equivalent.

In the Vaníček method, a discrete data set is approximated by a weighted sum of sinusoids of progressively determined frequencies using a standard linear regression or least-squares fit.

Rather than just taking dot products of the data with sine and cosine waveforms directly, Scargle modified the standard periodogram formula so to find a time delay

, this method gives the same power as does a least-squares fit to sinusoids of that frequency and of the form: In practice, it is always difficult to judge if a given Lomb peak is significant or not, especially when the nature of the noise is unknown, so for example a false-alarm spectral peak in the Lomb periodogram analysis of noisy periodic signal may result from noise in turbulence data.

Mathematically, FOS uses a slightly modified Cholesky decomposition in a mean-square error reduction (MSER) process, implemented as a sparse matrix inversion.

[15][22] As with the other LSSA methods, FOS avoids the major shortcoming of discrete Fourier analysis, so it can accurately identify embedded periodicities and excel with unequally spaced data.

All intervening grid points receive zero statistical weight, equivalent to having infinite error bars at times between samples.

In addition, it is possible to perform a full simultaneous or in-context least-squares fit by solving a matrix equation and partitioning the total data variance between the specified sinusoid frequencies.

[3] However, as mentioned above, one should keep in mind that Lomb's simplification and diverging from the least squares criterion opened up his technique to grave sources of errors, resulting even in false spectral peaks.