It differs from the Taylor series in its ability to capture "memory" effects.
[1] It is also used in electrical engineering to model intermodulation distortion in many devices, including power amplifiers[2] and frequency mixers.
Its main advantage lies in its generalizability: it can represent a wide range of systems.
[3][4] Norbert Wiener became interested in this theory in the 1920s due to his contact with Volterra's student Paul Lévy.
Wiener applied his theory of Brownian motion to the integration of Volterra analytic functionals.
The use of the Volterra series for system analysis originated from a restricted 1942 wartime report[5] of Wiener's, who was then a professor of mathematics at MIT.
He used the series to make an approximate analysis of the effect of radar noise in a nonlinear receiver circuit.
A continuous time-invariant system with x(t) as input and y(t) as output can be expanded in the Volterra series as

$$ y(t) = h_0 + \sum_{n=1}^{N} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} h_n(\tau_1, \ldots, \tau_n)\, x(t - \tau_1) \cdots x(t - \tau_n)\, d\tau_1 \cdots d\tau_n. $$

Here the constant term $h_0$ is usually taken to be zero by a suitable choice of the output level, and the function $h_n(\tau_1, \ldots, \tau_n)$ is called the n-th-order Volterra kernel; it can be regarded as a higher-order impulse response of the system.
Sometimes the n-th-order term is divided by n!, a convention which is convenient when taking the output of one Volterra system as the input of another ("cascading").
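In discrete time the same expansion becomes a finite sum over lagged products of the input samples. The following sketch (NumPy; the kernel and signal names are illustrative, not part of any standard library) evaluates a series truncated at second order:

```python
import numpy as np

def volterra_output(x, h0, h1, h2):
    """Evaluate a discrete-time Volterra series truncated at second order.

    x  : 1-D input signal
    h0 : constant (zeroth-order) term
    h1 : first-order kernel of length M (the ordinary impulse response)
    h2 : second-order kernel of shape (M, M), assumed symmetric
    """
    M = len(h1)
    y = np.full(len(x), h0, dtype=float)
    for t in range(len(x)):
        # Past samples x[t], x[t-1], ..., x[t-M+1]; zero before t = 0 (causality).
        past = np.array([x[t - k] if t >= k else 0.0 for k in range(M)])
        y[t] += h1 @ past            # first-order (linear convolution) term
        y[t] += past @ h2 @ past     # second-order term
    return y

# Toy example with made-up kernels.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
h1 = np.array([1.0, 0.5, 0.25])
h2 = 0.1 * np.eye(3)
y = volterra_output(x, 0.0, h1, h2)
```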
The causality condition: since in any physically realizable system the output can only depend on previous values of the input, the kernels $h_n(\tau_1, \ldots, \tau_n)$ must be identically zero if any of the variables $\tau_j$ is negative.
This theorem states that a time-invariant functional relation (satisfying certain very general conditions) can be approximated uniformly and to an arbitrary degree of precision by a sufficiently high finite-order Volterra series.
The set of admissible input functions is usually taken to be an equicontinuous, uniformly bounded set of functions, which is compact by the Arzelà–Ascoli theorem.
The theorem, however, gives no indication as to how many terms are needed for a good approximation, which is an essential question in applications.
For a causal system with symmetrical kernels we can rewrite the n-th-order term approximately in triangular form,

$$ y_n(t) = n! \int_{0}^{\infty} d\tau_1 \int_{0}^{\tau_1} d\tau_2 \cdots \int_{0}^{\tau_{n-1}} d\tau_n \; h_n(\tau_1, \ldots, \tau_n)\, x(t - \tau_1) \cdots x(t - \tau_n), $$

in which the integration runs only over the ordered region $\tau_1 \ge \tau_2 \ge \cdots \ge \tau_n \ge 0$.

Estimating the Volterra coefficients individually is complicated, since the basis functionals of the Volterra series are correlated.
This leads to the problem of simultaneously solving a set of integral equations for the coefficients.
An important aspect, with respect to which the following methods differ, is whether the orthogonalization of the basis functionals is performed over the idealized specification of the input signal (e.g. Gaussian white noise) or over the actual realization of the input (i.e. the pseudo-random, bounded, almost-white version of Gaussian white noise, or any other stimulus).
The latter methods, despite their lack of mathematical elegance, have been shown to be more flexible (arbitrary inputs can be easily accommodated) and more precise (because the idealized version of the input signal is not always realizable).
This method, developed by Lee and Schetzen, orthogonalizes with respect to the actual mathematical description of the signal, i.e. the projection onto the new basis functionals is based on the knowledge of the moments of the random signal.
Using an input of lower variance in the identification process can lead to a better estimation of the lower-order kernels, but may be insufficient to stimulate the high-order nonlinearities.[10]
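As a rough illustration of the crosscorrelation idea (a sketch, not Lee and Schetzen's exact procedure), the first- and second-order kernels of a system driven by zero-mean Gaussian white noise of variance sigma2 can be estimated from input–output crosscorrelation averages; the function and variable names below are illustrative, and the simple second-order formula is only valid off the diagonal of the kernel:

```python
import numpy as np

def crosscorrelation_kernels(x, y, M, sigma2):
    """Estimate 1st- and 2nd-order kernels by crosscorrelation against a
    zero-mean Gaussian white-noise input x (variance sigma2), memory M lags.
    The diagonal of h2 is left at zero, since the off-diagonal formula
    used below does not apply there."""
    N = len(x)
    y0 = y - y.mean()                      # remove the mean (zeroth-order) part
    h1 = np.zeros(M)
    h2 = np.zeros((M, M))
    for t1 in range(M):
        # h1(t1) ~ E[y(t) x(t - t1)] / sigma^2
        h1[t1] = np.mean(y[M:] * x[M - t1 : N - t1]) / sigma2
        for t2 in range(M):
            if t1 == t2:
                continue
            # h2(t1, t2) ~ E[(y - <y>) x(t - t1) x(t - t2)] / (2 sigma^4)
            h2[t1, t2] = np.mean(
                y0[M:] * x[M - t1 : N - t1] * x[M - t2 : N - t2]
            ) / (2 * sigma2**2)
    return h1, h2
```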
This method was developed by Wray and Green (1994) and exploits the fact that a simple neural network with two fully connected layers (i.e., a multilayer perceptron) is computationally equivalent to the Volterra series and therefore contains the kernels hidden in its architecture.
After such a network has been trained, the Volterra kernels can be expressed in terms of the weights connecting the input layer to the hidden layer, the coefficients of the polynomial expansion of the output function of the hidden nodes, and the weights connecting the hidden layer to the output.
Note that kernel extraction with this method is limited by the number of input delays in the architecture of the network.
Furthermore, the size of the network's input layer must be chosen carefully so that it represents the effective memory of the system.
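A minimal sketch of this extraction step, assuming a one-hidden-layer tanh network y = a·tanh(W x_past + b) + c has already been trained (the parameter names W, b, a, c are hypothetical): expanding tanh to second order about the biases makes the kernels explicit in terms of the network weights.

```python
import numpy as np

def kernels_from_mlp(W, b, a, c):
    """Read first- and second-order Volterra kernels out of a trained
    one-hidden-layer tanh network y = a . tanh(W x_past + b) + c.
    Sketch only: it uses a second-order Taylor expansion of tanh about the
    biases, which is what makes the kernels explicit.

    W : (H, M) input-to-hidden weights (M = number of input delays)
    b : (H,)   hidden biases
    a : (H,)   hidden-to-output weights
    c : scalar output bias
    """
    t = np.tanh(b)
    g1 = 1.0 - t**2               # tanh'(b)
    g2 = -2.0 * t * (1.0 - t**2)  # tanh''(b)

    h0 = c + a @ t                                      # constant term
    h1 = (a * g1) @ W                                   # h1[j]   = sum_i a_i g'(b_i)  W_ij
    h2 = 0.5 * np.einsum("i,ij,ik->jk", a * g2, W, W)   # h2[j,k] = sum_i a_i g''(b_i)/2 W_ij W_ik
    return h0, h1, h2
```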
This method and its more efficient version (fast orthogonal algorithm) were invented by Korenberg.
Another advantage is that arbitrary inputs can be used for the orthogonalization and that fewer data points suffice to reach a desired level of accuracy.
Linear regression is a standard tool from linear analysis; hence, one of its main advantages is the widespread existence of standard tools for solving linear regressions efficiently.
It also has some educational value, since it highlights the basic property of the Volterra series: it is a linear combination of nonlinear basis functionals.
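A minimal sketch of this approach for a second-order model, using only NumPy (the helper name and the absence of regularization are illustrative choices): each column of the design matrix is one basis functional evaluated on the recorded input, and the kernels are read off from the least-squares coefficients.

```python
import numpy as np

def fit_volterra_lstsq(x, y, M):
    """Fit a second-order Volterra model by ordinary least squares.
    Columns of the design matrix: a constant, the delayed samples x(t-i),
    and the products x(t-i) x(t-j) for i <= j (symmetry makes the other
    half redundant). Sketch only; no regularization."""
    N = len(x)
    lags = np.column_stack([x[M - i : N - i] for i in range(M)])   # x(t-i), t >= M
    cols = [np.ones(N - M)]                   # constant term h0
    cols += [lags[:, i] for i in range(M)]    # first-order terms
    idx = [(i, j) for i in range(M) for j in range(i, M)]
    cols += [lags[:, i] * lags[:, j] for i, j in idx]   # second-order terms
    X = np.column_stack(cols)
    theta, *_ = np.linalg.lstsq(X, y[M:], rcond=None)

    # Unpack the coefficient vector back into kernels.
    h0 = theta[0]
    h1 = theta[1 : 1 + M]
    h2 = np.zeros((M, M))
    for coef, (i, j) in zip(theta[1 + M :], idx):
        h2[i, j] = h2[j, i] = coef / (1 if i == j else 2)   # split off-diagonal symmetrically
    return h0, h1, h2
```

The correlation between the basis functionals mentioned above shows up here as (near-)collinear columns of the design matrix, which is why orthogonalized or regularized variants are often preferred in practice.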
This method was invented by Franz and Schölkopf[12] and is based on statistical learning theory.
Franz and Schölkopf proposed that the kernel method could essentially replace the Volterra series representation, although noting that the latter is more intuitive.[13]
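The flavour of this approach can be sketched with kernel ridge regression and an inhomogeneous polynomial kernel on delay-embedded inputs (the function name and hyperparameters below are illustrative, not Franz and Schölkopf's implementation): the kernel (1 + u·v)^p implicitly spans all Volterra basis functionals up to order p without ever constructing them explicitly.

```python
import numpy as np

def poly_kernel_ridge(x, y, M, degree=2, lam=1e-3):
    """Implicit Volterra-style regression via kernel ridge regression with an
    inhomogeneous polynomial kernel on delay-embedded inputs.
    Sketch only; lam and degree are illustrative hyperparameters."""
    N = len(x)
    U = np.column_stack([x[M - i : N - i] for i in range(M)])   # delay embedding
    targets = y[M:]
    K = (1.0 + U @ U.T) ** degree                               # Gram matrix
    alpha = np.linalg.solve(K + lam * np.eye(len(targets)), targets)

    def predict(x_new):
        V = np.column_stack([x_new[M - i : len(x_new) - i] for i in range(M)])
        return ((1.0 + V @ U.T) ** degree) @ alpha
    return predict
```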
This method was developed by van Hemmen and coworkers[14] and utilizes Dirac delta functions to sample the Volterra coefficients.
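A discrete-time caricature of the idea (not the authors' actual procedure), assuming the black box is causal and exactly second order, with illustrative helper names and pulse amplitudes: single pulses of two amplitudes separate the first-order kernel from the diagonal of the second-order kernel, and pulse pairs isolate the off-diagonal values.

```python
import numpy as np

def sample_kernels_with_pulses(system, M, A=1.0):
    """Probe an (assumed second-order, causal) black-box `system` with unit
    pulses and recover its kernels from combinations of the responses.
    `system` maps an input array to an output array of the same length."""
    T = 2 * M
    zero = np.zeros(T)
    y0 = system(zero)                       # constant term only
    h0 = y0[0]

    def pulse(*times):
        u = np.zeros(T)
        for t in times:
            u[t] += A
        return u

    # Single pulse at 0: y(t) = h0 + A h1(t) + A^2 h2(t, t).
    ya = system(pulse(0))
    yb = system(2 * pulse(0))               # amplitude 2A separates h1 from diag(h2)
    h1 = (4 * (ya - y0) - (yb - y0))[:M] / (2 * A)
    h2 = np.zeros((M, M))
    h2[np.arange(M), np.arange(M)] = ((yb - y0) - 2 * (ya - y0))[:M] / (2 * A**2)

    # Pulses at 0 and d: the cross term isolates h2(t, t - d).
    for d in range(1, M):
        ypair = system(pulse(0, d))
        ysd = system(pulse(d))
        cross = (ypair - ya - ysd + y0) / (2 * A**2)    # = h2(t, t - d)
        for t in range(d, M):
            h2[t, t - d] = h2[t - d, t] = cross[t]
    return h0, h1, h2
```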