Generalized filtering is a generic Bayesian filtering scheme for nonlinear state-space models.[1] It is based on a variational principle of least action, formulated in generalized coordinates of motion.
Generalized filtering furnishes posterior densities over hidden states (and parameters) generating observed data using a generalized gradient descent on variational free energy, under the Laplace assumption.
Furthermore, it operates online, assimilating data to approximate the posterior density over unknown quantities, without the need for a backward pass.
Special cases include variational filtering,[3] dynamic expectation maximization[4] and generalized predictive coding.
The objective is to approximate the posterior density over hidden and control states, given sensor states and a generative model – and estimate the (path integral of) model evidence $p(\tilde{s}(t)\mid m)$. Here, the tilde denotes a variable in generalized coordinates of motion: $\tilde{u} = (u, u', u'', \ldots)^{\mathsf T}$.
This generally involves an intractable marginalization over hidden states, so model evidence (or marginal likelihood) is replaced with a variational free energy bound:

$$F(\tilde{s},\tilde{\mu}) = E_q\big[-\ln p(\tilde{s},\tilde{x},\tilde{u}\mid m)\big] - H\big[q(\tilde{x},\tilde{u}\mid\tilde{\mu})\big] = -\ln p(\tilde{s}\mid m) + D_{\mathrm{KL}}\big[q(\tilde{x},\tilde{u}\mid\tilde{\mu})\,\big\|\,p(\tilde{x},\tilde{u}\mid\tilde{s},m)\big] \;\ge\; -\ln p(\tilde{s}\mid m)$$

where $q(\tilde{x},\tilde{u}\mid\tilde{\mu})$ is a variational density over hidden and control states with mean $\tilde{\mu}$. Under the Laplace assumption $q(\tilde{x},\tilde{u}\mid\tilde{\mu}) = \mathcal{N}(\tilde{\mu},C)$ the variational density is Gaussian and the precision that minimizes free energy is the curvature of the Gibbs energy $G(\tilde{s},\tilde{\mu}) = -\ln p(\tilde{s},\tilde{\mu}\mid m)$ at the variational mean, $C^{-1} = \Pi = \partial_{\tilde{\mu}\tilde{\mu}} G$. Free energy can then be expressed in terms of the variational mean alone (omitting constants): $F \approx G(\tilde{s},\tilde{\mu}) + \tfrac{1}{2}\ln\big|\partial_{\tilde{\mu}\tilde{\mu}} G\big|$.
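To make the Laplace step concrete, the following minimal sketch (not from the source; the toy model, variable names and the finite-difference Hessian are illustrative assumptions) minimizes a Gibbs energy for a scalar model, sets the posterior precision to its curvature at the minimum, and evaluates the resulting free energy:

    import numpy as np
    from scipy.optimize import minimize

    # Toy joint density p(s, x | m): s = 2x + noise (variance 0.5), prior x ~ N(0, 1).
    s_obs = 1.3

    def gibbs_energy(x):
        """G(x) = -ln p(s_obs, x | m), up to additive constants."""
        likelihood = 0.5 * (s_obs - 2.0 * x) ** 2 / 0.5   # -ln p(s | x)
        prior = 0.5 * x ** 2                              # -ln p(x)
        return likelihood + prior

    # Conditional (posterior) mean: minimise the Gibbs energy.
    mu = minimize(lambda v: gibbs_energy(v[0]), x0=[0.0]).x[0]

    # Laplace assumption: posterior precision = curvature of G at the mean.
    h = 1e-4
    precision = (gibbs_energy(mu + h) - 2 * gibbs_energy(mu) + gibbs_energy(mu - h)) / h ** 2

    # Free energy under the Laplace assumption (omitting constants): F ~ G(mu) + 0.5 ln|precision|
    F = gibbs_energy(mu) + 0.5 * np.log(precision)
    print(f"posterior mean {mu:.3f}, precision {precision:.3f}, free energy {F:.3f}")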
Generalized filtering is based on the following lemma: the self-consistent solution to

$$\dot{\tilde{\mu}} = D\tilde{\mu} - \partial_{\tilde{\mu}} F(\tilde{s},\tilde{\mu})$$

satisfies the variational principle of stationary action, where action is the path integral of variational free energy

$$S = \int dt\, F\big(\tilde{s}(t),\tilde{\mu}(t)\big)$$

Here $D$ is a derivative (shift) operator, such that $D\tilde{\mu} = (\mu', \mu'', \ldots)$.
Proof: self-consistency requires the motion of the mean to be the mean of the motion, $\dot{\tilde{\mu}} = D\tilde{\mu}$, so that $\partial_{\tilde{\mu}} F(\tilde{s},\tilde{\mu}) = 0$ and (by the fundamental lemma of variational calculus) $\delta_{\tilde{\mu}} S = 0$. Put simply, small perturbations to the path of the mean do not change variational free energy and it has the least action of all possible (local) paths.
Remarks: Heuristically, generalized filtering performs a gradient descent on variational free energy in a moving frame of reference:

$$\dot{\tilde{\mu}} - D\tilde{\mu} = -\partial_{\tilde{\mu}} F(\tilde{s},\tilde{\mu})$$

where the frame of reference itself moves with the expected generalized motion $D\tilde{\mu}$.
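The sketch below (an illustration under assumed values, not the original implementation) integrates this descent for a single variable represented to four orders of generalized motion; the free energy is taken to be a simple quadratic so its gradient is explicit, and D is the shift (derivative) operator:

    import numpy as np

    n = 4                                    # number of orders of generalized motion
    D = np.diag(np.ones(n - 1), k=1)         # shift operator: D @ (u, u', u'', ...) = (u', u'', ..., 0)

    # Stand-in quadratic free energy F(mu) = 0.5 (mu - target)' Pi (mu - target),
    # so that grad F = Pi @ (mu - target); 'target' plays the role of the generalized
    # data/prediction and is static (zero higher-order motion).
    target = np.array([1.0, 0.0, 0.0, 0.0])
    Pi = np.diag([4.0, 2.0, 1.0, 0.5])

    mu = np.zeros(n)
    dt = 0.01
    for _ in range(2000):
        grad_F = Pi @ (mu - target)
        mu_dot = D @ mu - grad_F             # gradient descent in a moving frame of reference
        mu += dt * mu_dot

    # At the self-consistent solution the gradient vanishes and the motion of the mean
    # equals the mean of the motion (both are zero here, because the target is static).
    print(np.round(mu, 3))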
For a related example in statistical physics, see Kerr and Graham [8] who use ensemble dynamics in generalized coordinates to provide a generalized phase-space version of Langevin and associated Fokker-Planck equations.
In practice, generalized filtering uses local linearization[9] over intervals $\Delta t$ to recover discrete updates to the conditional means:

$$\Delta\tilde{\mu} = \big(\exp(\Delta t\, J) - I\big)\, J^{-1}\, \dot{\tilde{\mu}}, \qquad J = \partial_{\tilde{\mu}}\dot{\tilde{\mu}} = D - \partial_{\tilde{\mu}\tilde{\mu}} F$$

This updates the means of unknown quantities at each interval (usually the interval between observations).
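A sketch of this discrete update follows (illustrative; the quadratic free energy, and hence constant Jacobian, is an assumption made to keep the example short):

    import numpy as np
    from scipy.linalg import expm

    n = 3
    D = np.diag(np.ones(n - 1), k=1)           # derivative (shift) operator
    Hess = np.diag([4.0, 2.0, 1.0])            # assumed curvature of free energy, d2F/dmu2
    target = np.array([1.0, 0.0, 0.0])         # assumed free-energy minimum (static target)

    mu = np.zeros(n)
    dt = 0.5
    for _ in range(20):
        grad = Hess @ (mu - target)            # dF/dmu for the assumed quadratic free energy
        mu_dot = D @ mu - grad
        J = D - Hess                           # Jacobian of mu_dot with respect to mu
        # local linearization: delta_mu = (expm(dt J) - I) J^{-1} mu_dot
        mu = mu + (expm(dt * J) - np.eye(n)) @ np.linalg.solve(J, mu_dot)

    print(np.round(mu, 3))                     # approaches the free-energy minimum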
Usually, the generative density or model is specified in terms of a nonlinear input-state-output model with continuous nonlinear functions:

$$s = g(x,u) + \omega_s$$
$$\dot{x} = f(x,u) + \omega_x$$

The corresponding generalized model (under local linearity assumptions) obtains from the chain rule:

$$\tilde{s} = \tilde{g}(\tilde{x},\tilde{u}) + \tilde{\omega}_s$$
$$D\tilde{x} = \tilde{f}(\tilde{x},\tilde{u}) + \tilde{\omega}_x$$

Gaussian assumptions about the random fluctuations $\tilde{\omega}$ then prescribe the likelihood and empirical priors on the motion of hidden states:

$$p(\tilde{s},\tilde{x},\tilde{u}\mid m) = p(\tilde{s}\mid\tilde{x},\tilde{u})\, p(D\tilde{x}\mid\tilde{x},\tilde{u})\, p(\tilde{u}\mid m)$$
$$p(\tilde{s}\mid\tilde{x},\tilde{u}) = \mathcal{N}(\tilde{g},\tilde{\Sigma}_s) \qquad p(D\tilde{x}\mid\tilde{x},\tilde{u}) = \mathcal{N}(\tilde{f},\tilde{\Sigma}_x)$$

The covariances $\tilde{\Sigma} = V \otimes \Sigma$ factorize into a covariance $\Sigma$ among the variables and a matrix $V$ encoding correlations among different orders of generalized motion, whose entries are derivatives of the autocorrelation function $\rho(t)$ of the fluctuations evaluated at zero lag:

$$V = \begin{pmatrix} 1 & 0 & \ddot{\rho}(0) & \cdots \\ 0 & -\ddot{\rho}(0) & 0 & \\ \ddot{\rho}(0) & 0 & \rho^{(4)}(0) & \\ \vdots & & & \ddots \end{pmatrix}$$

Here, $\ddot{\rho}(0)$ is the second derivative of the autocorrelation function evaluated at zero.
This is a ubiquitous measure of roughness in the theory of stochastic processes.[10] Crucially, the precision (inverse variance) of high-order derivatives falls to zero fairly quickly, which means it is only necessary to model relatively low-order generalized motion (usually between two and eight orders) for any given or parameterized autocorrelation function.
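The following sketch (my own illustration, assuming a Gaussian autocorrelation function with smoothness parameter s) constructs the covariance among generalized fluctuations from derivatives of the autocorrelation at zero lag, using Cov(ω^(i), ω^(j)) = (−1)^i ρ^(i+j)(0), and shows the marginal precision of successive orders of motion falling away:

    import numpy as np
    from math import factorial

    def rho_deriv_at_zero(k, s):
        """k-th derivative at zero of the Gaussian autocorrelation rho(h) = exp(-h^2 / (2 s^2))."""
        if k % 2:
            return 0.0                                     # odd derivatives vanish at zero lag
        m = k // 2
        double_fact = factorial(2 * m) / (2 ** m * factorial(m))   # (2m - 1)!!
        return (-1) ** m * double_fact / s ** (2 * m)

    def generalized_covariance(n, s):
        """Covariance among n orders of generalized motion of a fluctuation with smoothness s."""
        V = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                V[i, j] = (-1) ** i * rho_deriv_at_zero(i + j, s)
        return V

    V = generalized_covariance(6, s=0.5)
    marginal_precision = 1.0 / np.diag(V)
    print(np.round(marginal_precision, 6))     # precisions of higher-order motion fall quickly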
When time series are sampled discretely, to give a sequence of observations, the implicit sampling is treated as part of the generative process, where (using Taylor's theorem) a local window of samples is mapped onto the generalized sensory states:

$$\begin{pmatrix} \vdots \\ s(t - \Delta t) \\ s(t) \\ s(t + \Delta t) \\ \vdots \end{pmatrix} = E\,\tilde{s}(t), \qquad E_{kj} = \frac{(k\,\Delta t)^{\,j}}{j!}$$

In principle, the entire sequence could be used to estimate hidden variables at each point in time. However, the precision of samples far in the past or future falls away quickly, so they can be ignored. This allows the scheme to assimilate data online, using local observations around each time point (typically between two and eight samples).
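A sketch of this embedding follows (illustrative; the window size, interval and test signal are arbitrary choices): the matrix E maps the generalized sensory state (s, s', s'', …) to a local window of discrete samples via Taylor's theorem, and its pseudoinverse recovers generalized motion from samples:

    import numpy as np
    from math import factorial

    def embedding_operator(n_samples, n_orders, dt):
        """E[k, j] = (k dt)^j / j!, so that samples = E @ generalized_state (Taylor's theorem)."""
        offsets = (np.arange(n_samples) - (n_samples - 1) / 2) * dt   # window centred on t
        return np.array([[t ** j / factorial(j) for j in range(n_orders)] for t in offsets])

    E = embedding_operator(n_samples=5, n_orders=3, dt=0.1)

    # Recover generalized motion of a known signal s(t) = sin(t) from local samples around t = 1.
    t0 = 1.0
    samples = np.sin(t0 + (np.arange(5) - 2) * 0.1)
    s_tilde = np.linalg.pinv(E) @ samples
    print(np.round(s_tilde, 3))     # approximately [sin(1), cos(1), -sin(1)]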
When the motion of the mean is small, the corresponding update

$$\Delta\tilde{\mu} = -\big(\partial_{\tilde{\mu}\tilde{\mu}} F\big)^{-1}\, \partial_{\tilde{\mu}} F$$

minimizes variational free energy. It is straightforward to show that this solution corresponds to a classical Newton update.[11]
Classical filtering under Markovian or Wiener assumptions is equivalent to assuming the precision of the motion of random fluctuations is zero.[12]
Particle filtering is a sampling-based scheme that relaxes assumptions about the form of the variational or approximate posterior density.[3]
In variational filtering, an ensemble of particles diffuses over the free energy landscape in a frame of reference that moves with the expected (generalized) motion of the ensemble.
This provides a relatively simple scheme that eschews Gaussian (unimodal) assumptions.
Variational Bayes rests on a mean-field partition of the variational density over unknown states, parameters and precisions, $q(\tilde{x},\tilde{u},\theta,\gamma\mid\tilde{\mu}) = q(\tilde{x},\tilde{u}\mid\tilde{\mu}_u)\,q(\theta\mid\tilde{\mu}_\theta)\,q(\gamma\mid\tilde{\mu}_\gamma)$. In generalized filtering, this partition leads to dynamic expectation maximisation,[4] which comprises a D-step that optimizes the sufficient statistics of unknown states, an E-step for parameters and an M-step for precisions.
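The following sketch is a deliberately simplified, static analogue of this alternation (it is not the DEM implementation and omits generalized coordinates): it cycles closed-form mean-field updates for states, a parameter and a precision in a toy linear model, purely to show the shape of the D-, E- and M-steps:

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy model: observations y_t = x_t + noise (unknown precision gamma),
    # hidden states x_t ~ N(theta * u_t, 1) with known causes u_t and unknown parameter theta.
    T = 200
    u = rng.normal(size=T)
    theta_true, gamma_true = 2.0, 4.0
    x = theta_true * u + rng.normal(size=T)
    y = x + rng.normal(scale=gamma_true ** -0.5, size=T)

    # Factorized posteriors q(x) q(theta) q(gamma), updated in turn
    m_th, v_th = 0.0, 1.0
    a0, b0 = 1.0, 1.0                          # Gamma prior on the noise precision
    a, b = a0, b0

    for _ in range(50):
        E_gamma = a / b
        # D-step: sufficient statistics of the unknown states
        v_x = 1.0 / (E_gamma + 1.0)
        m_x = v_x * (E_gamma * y + m_th * u)
        # E-step: parameters
        v_th = 1.0 / (1.0 + np.sum(u ** 2))
        m_th = v_th * np.sum(u * m_x)
        # M-step: precisions (Gamma posterior over gamma)
        a = a0 + T / 2.0
        b = b0 + 0.5 * np.sum((y - m_x) ** 2 + v_x)

    print(f"theta estimate {m_th:.2f} (true 2.0), precision estimate {a / b:.2f} (true 4.0)")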
Generalized filtering is usually used to invert hierarchical models of the following form, in which the outputs of one level become the causes (control states) of the level below:

$$\tilde{s} = \tilde{g}^{(1)}\!\big(\tilde{x}^{(1)},\tilde{u}^{(1)}\big) + \tilde{\omega}_s^{(1)} \qquad D\tilde{x}^{(1)} = \tilde{f}^{(1)}\!\big(\tilde{x}^{(1)},\tilde{u}^{(1)}\big) + \tilde{\omega}_x^{(1)}$$
$$\vdots$$
$$\tilde{u}^{(i-1)} = \tilde{g}^{(i)}\!\big(\tilde{x}^{(i)},\tilde{u}^{(i)}\big) + \tilde{\omega}_s^{(i)} \qquad D\tilde{x}^{(i)} = \tilde{f}^{(i)}\!\big(\tilde{x}^{(i)},\tilde{u}^{(i)}\big) + \tilde{\omega}_x^{(i)}$$

The ensuing generalized gradient descent on free energy can then be expressed compactly in terms of prediction errors, where (omitting high-order terms):

$$\dot{\tilde{\mu}} = D\tilde{\mu} - \big(\partial_{\tilde{\mu}}\tilde{\varepsilon}\big)^{\mathsf T}\,\tilde{\Pi}\,\tilde{\varepsilon}$$
$$\tilde{\varepsilon}_s^{(i)} = \tilde{\mu}_u^{(i-1)} - \tilde{g}^{(i)}\!\big(\tilde{\mu}_x^{(i)},\tilde{\mu}_u^{(i)}\big) \qquad \tilde{\varepsilon}_x^{(i)} = D\tilde{\mu}_x^{(i)} - \tilde{f}^{(i)}\!\big(\tilde{\mu}_x^{(i)},\tilde{\mu}_u^{(i)}\big)$$

Here, $\tilde{\varepsilon}$ are generalized prediction errors on the causes and on the motion of hidden states at each level (with $\tilde{\mu}_u^{(0)} \equiv \tilde{s}$ at the sensory level), and $\tilde{\Pi} = \tilde{\Sigma}^{-1}$ are their precisions.
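To make the prediction-error form concrete, here is a minimal sketch (illustrative; the single-level linear model, precisions and numerical values are assumptions) that evaluates generalized prediction errors and the resulting free-energy gradient and update at one time point:

    import numpy as np

    # Single-level linear model in generalized coordinates: s = x + noise, dx/dt = a x + noise.
    n = 3                                       # orders of generalized motion
    a = -0.5
    D = np.diag(np.ones(n - 1), k=1)            # derivative (shift) operator
    Pi_s = 4.0 * np.eye(n)                      # assumed precisions of sensory fluctuations
    Pi_x = 2.0 * np.eye(n)                      # assumed precisions of state fluctuations

    s_tilde = np.array([1.0, -0.4, 0.1])        # generalized sensory data (s, s', s'')
    mu = np.array([0.8, -0.3, 0.0])             # current conditional mean of hidden states

    # Generalized prediction errors (omitting higher-order terms)
    eps_s = s_tilde - mu                        # sensory error:   s_tilde - g(mu),  with g = identity
    eps_x = D @ mu - a * mu                     # dynamical error: D mu - f(mu),     with f = a x

    # Free-energy gradient expressed with prediction errors, and the descent in a moving frame
    dFdmu = -Pi_s @ eps_s + (D - a * np.eye(n)).T @ Pi_x @ eps_x
    mu_dot = D @ mu - dFdmu
    print(np.round(mu_dot, 3))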
Generalized filtering has been primarily applied to biological timeseries—in particular functional magnetic resonance imaging and electrophysiological data.
This is usually in the context of dynamic causal modelling to make inferences about the underlying architectures of (neuronal) systems generating data.[13] It is also used to simulate inference in terms of generalized (hierarchical) predictive coding in the brain.