The ensemble Kalman filter (EnKF) is a recursive filter suitable for problems with a large number of variables, such as discretizations of partial differential equations in geophysical models.
The EnKF originated as a version of the Kalman filter for large problems (essentially, the covariance matrix is replaced by the sample covariance), and it is now an important data assimilation component of ensemble forecasting.
The ensemble Kalman filter (EnKF) is a Monte Carlo implementation of the Bayesian update problem: given a probability density function (PDF) of the state of the modeled system (the prior, often called the forecast in geosciences) and the data likelihood, Bayes' theorem is used to obtain the PDF after the data likelihood has been taken into account (the posterior, often called the analysis).
The original Kalman filter, introduced in 1960,[1] assumes that all PDFs are Gaussian (the Gaussian assumption) and provides algebraic formulas for the change of the mean and the covariance matrix by the Bayesian update, as well as a formula for advancing the mean and covariance in time provided the system is linear.
However, maintaining the covariance matrix is not feasible computationally for high-dimensional systems.[2][3] EnKFs represent the distribution of the system state using a collection of state vectors, called an ensemble, and replace the covariance matrix by the sample covariance computed from the ensemble.
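As an illustration, the ensemble sample covariance can be computed in a few lines of NumPy. The following sketch is our own (the names are not from any particular EnKF library); in practice the full $n \times n$ matrix is never formed for high-dimensional systems, and the anomaly matrix is used directly, as in the formulas below.

```python
import numpy as np

def ensemble_covariance(X):
    """Sample covariance of an ensemble.

    X : (n, N) array whose N columns are ensemble members
        (state vectors of dimension n).
    Returns the n x n sample covariance C = A A^T / (N - 1),
    where A holds the ensemble anomalies (deviations from the mean).
    """
    N = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)  # anomalies: subtract the ensemble mean
    return A @ A.T / (N - 1)
```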
Let $x$ denote the $n$-dimensional state vector of a model, and assume that it has a Gaussian probability distribution with mean $\mu$ and covariance $Q$, i.e., its PDF is

$$p(x) \propto \exp\left(-\tfrac{1}{2}(x - \mu)^{\mathrm T} Q^{-1} (x - \mu)\right).$$

Here and below, $\propto$ means proportional; a PDF is always scaled so that its integral over the whole space is one. This probability distribution, called the prior, was evolved in time by running the model and now is to be updated to account for new data. Assume the data vector $d$ has a Gaussian PDF with covariance $R$ and mean $Hx$, where $H$ is the so-called observation matrix; the value $Hx$ is what the data would be for the state $x$ in the absence of data errors. Then the probability density $p(d \mid x)$ of the data $d$ conditional on the system state $x$, called the data likelihood, is

$$p(d \mid x) \propto \exp\left(-\tfrac{1}{2}(d - Hx)^{\mathrm T} R^{-1} (d - Hx)\right).$$

The PDF of the state and the data likelihood are combined to give the new probability density of the system state $x$ conditional on the data $d$ (the posterior) by Bayes' theorem,

$$p(x \mid d) \propto p(d \mid x)\, p(x).$$
The EnKF is a Monte Carlo approximation of the Kalman filter, which avoids evolving the covariance matrix of the PDF of the state vector $x$. Instead, the PDF is represented by an ensemble

$$X = [x_1, \ldots, x_N],$$

an $n \times N$ matrix whose columns are the ensemble members, called the prior ensemble.
Ideally, ensemble members would form a sample from the prior distribution.
The EnKF is now obtained simply by replacing the state covariance $Q$ in the Kalman gain matrix $K = Q H^{\mathrm T} (H Q H^{\mathrm T} + R)^{-1}$ by the sample covariance $C$ computed from the ensemble members (called the ensemble covariance), that is,

$$K = C H^{\mathrm T} \left(H C H^{\mathrm T} + R\right)^{-1}.$$

Each ensemble member is then updated with its own perturbed copy of the data, $d_i = d + \varepsilon_i$, $\varepsilon_i \sim N(0, R)$, so that the posterior ensemble is $X^p = X + K(D - HX)$ with $D = [d_1, \ldots, d_N]$,
and, if the matrix $H C H^{\mathrm T} + R$ is singular (for example, when the data error covariance $R$ is singular), the inverse is replaced by a pseudoinverse, computed using the singular-value decomposition (SVD).
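As an illustration, here is a minimal NumPy sketch of this analysis step (our own example, not code from a reference implementation; it forms $C$ explicitly, which is feasible only for modest state dimensions):

```python
import numpy as np

def enkf_analysis(X, d, H, R, rng=None):
    """Stochastic EnKF analysis step with perturbed data.

    X : (n, N) prior ensemble, columns are state vectors
    d : (m,) data vector
    H : (m, n) observation matrix
    R : (m, m) data error covariance
    Returns the (n, N) posterior ensemble X + K (D - H X).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, N = X.shape
    A = X - X.mean(axis=1, keepdims=True)   # ensemble anomalies
    C = A @ A.T / (N - 1)                   # ensemble covariance
    P = H @ C @ H.T + R
    K = C @ H.T @ np.linalg.pinv(P)         # pinv = SVD pseudoinverse, in case P is singular
    # Perturbed data: each member gets d plus a draw from N(0, R)
    D = d[:, None] + rng.multivariate_normal(np.zeros(len(d)), R, size=N).T
    return X + K @ (D - H @ X)
```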
Since these formulas are matrix operations with dominant Level 3 (matrix–matrix) operations,[10] they are suitable for efficient implementation using software packages such as LAPACK (on serial and shared-memory computers) and ScaLAPACK (on distributed-memory computers).[9] Instead of computing the inverse of a matrix and multiplying by it, it is much better (several times cheaper and also more accurate) to compute the Cholesky decomposition of the matrix and treat the multiplication by the inverse as the solution of a linear system with many simultaneous right-hand sides.[10]
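A short NumPy/SciPy sketch of this point, with illustrative stand-ins for the actual matrices (here `P` plays the role of $H C H^{\mathrm T} + R$ and `B` the role of the right-hand sides):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
m, k = 5, 3
M = rng.standard_normal((m, m))
P = M @ M.T + m * np.eye(m)        # stand-in for H C H^T + R: symmetric positive definite
B = rng.standard_normal((m, k))    # stand-in for the right-hand sides D - H X

# Avoid: explicit inverse followed by a multiply.
Z_slow = np.linalg.inv(P) @ B

# Better: one Cholesky factorization, then triangular solves
# for all right-hand sides simultaneously.
cP = cho_factor(P)                 # P = L L^T
Z = cho_solve(cP, B)               # solves P Z = B

assert np.allclose(Z, Z_slow)
```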
Since we have replaced the covariance matrix with the ensemble covariance, this leads to a simpler formula where ensemble observations are used directly, without explicitly specifying the matrix $H$: it suffices to evaluate a function $h(x)$, with $h(x) = Hx$ in the linear case, on each ensemble member. The function $h$ is called the observation function or, in the inverse problems context, the forward operator.
The following alternative formula is advantageous when the number of data points $m$ is large (such as when assimilating gridded or pixel data) and the data error covariance matrix $R$ is diagonal (which is the case when the data errors are uncorrelated), or cheap to decompose (such as banded due to a limited covariance distance). Write $HA = [h(x_1) - \bar h, \ldots, h(x_N) - \bar h]$ for the matrix of observed ensemble anomalies, where $\bar h$ is the mean of the $h(x_i)$, so that $H C H^{\mathrm T} = \tfrac{1}{N-1} HA\,(HA)^{\mathrm T}$. Using the Sherman–Morrison–Woodbury formula[12]

$$\left(R + U V^{\mathrm T}\right)^{-1} = R^{-1} - R^{-1} U \left(I + V^{\mathrm T} R^{-1} U\right)^{-1} V^{\mathrm T} R^{-1}$$

with

$$U = \frac{1}{N-1} HA, \qquad V = HA,$$

gives

$$\left(H C H^{\mathrm T} + R\right)^{-1} = R^{-1}\left[I - \frac{1}{N-1} HA \left(I + \frac{1}{N-1}(HA)^{\mathrm T} R^{-1} HA\right)^{-1} (HA)^{\mathrm T} R^{-1}\right],$$

which requires only the solution of systems with the matrix $R$ (assumed to be cheap) and of a system of size $N$ with $m$ right-hand sides.
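A sketch of this computation for diagonal $R$ in NumPy (the function and variable names are our own illustration of the formula above, not a reference implementation):

```python
import numpy as np

def apply_Pinv(HA, r_diag, B):
    """Apply (R + HA (HA)^T / (N-1))^{-1} to B via Sherman-Morrison-Woodbury,
    assuming R is diagonal with diagonal r_diag.

    HA     : (m, N) matrix of observed ensemble anomalies
    r_diag : (m,) diagonal of the data error covariance R
    B      : (m, k) right-hand sides, e.g. the innovations D - H X
    """
    m, N = HA.shape
    Rinv_B = B / r_diag[:, None]                  # R^{-1} B, cheap since R is diagonal
    Rinv_HA = HA / r_diag[:, None]                # R^{-1} HA
    G = np.eye(N) + (HA.T @ Rinv_HA) / (N - 1)    # I + (HA)^T R^{-1} HA / (N-1)
    W = np.linalg.solve(G, HA.T @ Rinv_B)         # the only dense solve: size N, k right-hand sides
    return Rinv_B - Rinv_HA @ W / (N - 1)
```

Since the only dense system is of size $N$, the cost stays linear in the number of data points $m$.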
The EnKF version described here involves randomization of data.[13][14][15] Since the ensemble covariance is rank deficient (there are many more state variables, typically millions, than ensemble members, typically fewer than a hundred), it has large spurious terms for pairs of points that are spatially distant.
Since in reality the values of physical fields at distant locations are not that much correlated, the covariance matrix is tapered off artificially based on the distance, which gives rise to localized EnKF algorithms.
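A minimal sketch of such covariance tapering (a schematic illustration, not any specific published localization scheme; we use a simple exponential taper, whereas practical codes often use a compactly supported function such as Gaspari–Cohn):

```python
import numpy as np

def localize(C, coords, L):
    """Taper an ensemble covariance by distance (elementwise/Schur product).

    C      : (n, n) ensemble covariance
    coords : (n, p) spatial coordinates of the n state variables
    L      : localization length scale; correlations decay over this distance
    """
    # Pairwise distances between state variable locations
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    taper = np.exp(-dist / L)      # simple taper; Gaspari-Cohn has compact support
    return C * taper               # elementwise product damps distant covariances
```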
For nonlinear problems, the EnKF can create a posterior ensemble with non-physical states.
This can be alleviated by regularization, such as penalization of states with large spatial gradients.
In 2007, Ravela et al. introduced the joint position-amplitude adjustment model using ensembles, and systematically derived a sequential approximation which can be applied to both the EnKF and other formulations.[18] Their method does not make the assumption that amplitude and position errors are independent or jointly Gaussian, as others do.
The morphing EnKF employs intermediate states, obtained by techniques borrowed from image registration and morphing, instead of linear combinations of states.
In practice, EnKF methods are also used for nonlinear problems, where the Gaussian assumption may not be satisfied.