Data gathering is the first and essential step in system identification, since the gathered data serve as the input for the model that is estimated later.
In particular, parameter estimation and model validation are integral parts of system identification.[3]
The early work was dominated by methods based on the Volterra series, which in the discrete-time case can be expressed as

$$
y(k) = h_0 + \sum_{m_1} h_1(m_1)\, u(k-m_1) + \sum_{m_1}\sum_{m_2} h_2(m_1,m_2)\, u(k-m_1)\, u(k-m_2) + \sum_{m_1}\sum_{m_2}\sum_{m_3} h_3(m_1,m_2,m_3)\, u(k-m_1)\, u(k-m_2)\, u(k-m_3) + \cdots
$$

where $u(k)$, $y(k)$; $k = 1, 2, 3, \ldots$ are the measured input and output respectively and $h_\ell(m_1, \ldots, m_\ell)$ is the $\ell$th-order Volterra kernel, or $\ell$th-order nonlinear impulse response.
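As a rough illustration of this expansion, the sketch below evaluates a Volterra series truncated at second order for an arbitrary input sequence; the kernel values used here are invented purely for illustration and do not correspond to any real system.

```python
import numpy as np

def volterra_output(u, h0, h1, h2):
    """Evaluate a discrete Volterra series truncated at second order.

    u  : input sequence u(k)
    h0 : constant (zeroth-order) term
    h1 : first-order kernel, h1[m1]
    h2 : second-order kernel, h2[m1, m2]
    """
    N = len(u)
    M = len(h1)                      # memory length (maximum lag)
    y = np.full(N, h0, dtype=float)
    for k in range(N):
        for m1 in range(M):
            if k - m1 < 0:
                continue
            y[k] += h1[m1] * u[k - m1]
            for m2 in range(M):
                if k - m2 < 0:
                    continue
                y[k] += h2[m1, m2] * u[k - m1] * u[k - m2]
    return y

# Illustrative kernels (not taken from any real process)
rng = np.random.default_rng(0)
u = rng.standard_normal(200)
h1 = np.array([0.5, 0.3, 0.1])
h2 = 0.05 * np.outer(h1, h1)
y = volterra_output(u, h0=0.0, h1=h1, h2=h2)
```

Even in this small second-order example the number of kernel values grows quickly with the memory length, which hints at the data and estimation burden discussed next.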
In most of these methods the input has to be Gaussian and white, which is a severe restriction for many real processes.[6] The number of Volterra kernel coefficients that have to be estimated can be reduced by exploiting certain symmetries, but the requirements are still excessive irrespective of what algorithm is used for the identification.
The Hammerstein model consists of a static single-valued nonlinear element followed by a linear dynamic element.[8] The Wiener model is the reverse of this combination, so that the linear element occurs before the static nonlinear characteristic.
The correlation methods exploit certain properties of these systems, which means that if specific inputs are used, often white Gaussian noise, the individual elements can be identified one at a time.
This results in manageable data requirements and the individual blocks can sometimes be related to components in the system under study.
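A minimal sketch contrasting the two block structures is given below; the filter coefficients and the choice of nonlinearity are assumptions made only for illustration.

```python
import numpy as np
from scipy.signal import lfilter

# Illustrative components (chosen arbitrarily for this sketch)
b, a = [0.2, 0.1], [1.0, -0.7]           # linear dynamic block
nonlinearity = lambda x: np.tanh(x)       # static nonlinear block

rng = np.random.default_rng(1)
u = rng.standard_normal(500)

# Hammerstein: static nonlinearity followed by linear dynamics
y_hammerstein = lfilter(b, a, nonlinearity(u))

# Wiener: linear dynamics followed by static nonlinearity
y_wiener = nonlinearity(lfilter(b, a, u))
```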
This process is called training, or learning, because the network adjusts the weights until the desired output pattern is reproduced.
Dynamic problems involve lagged variables and are more appropriate for system identification and related applications.
Depending on the architecture of the network, the training problem can be either nonlinear in the parameters, which requires iterative optimisation, or linear in the parameters, which can be solved using classical least-squares approaches.
Neural networks have been applied extensively to system identification problems which involve nonlinear and dynamic relationships.
The training procedure then produces the best static approximation that relates the lagged variables assigned to the input nodes to the output.
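As a minimal sketch of this idea, the example below builds a regressor matrix from lagged inputs and outputs and trains a small feedforward network as a static map from those lagged variables to the current output; the data-generating equation, lag orders and network size are all assumptions made only for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def lagged_regressors(u, y, nu, ny):
    """Stack lagged outputs y(k-1..k-ny) and inputs u(k-1..k-nu) as features for y(k)."""
    start = max(nu, ny)
    X, target = [], []
    for k in range(start, len(y)):
        row = ([y[k - i] for i in range(1, ny + 1)]
               + [u[k - i] for i in range(1, nu + 1)])
        X.append(row)
        target.append(y[k])
    return np.array(X), np.array(target)

# Synthetic data from an arbitrary nonlinear difference equation (illustrative only)
rng = np.random.default_rng(2)
u = rng.uniform(-1, 1, 1000)
y = np.zeros(1000)
for k in range(2, 1000):
    y[k] = 0.5 * y[k - 1] - 0.1 * y[k - 2] ** 2 + 0.8 * u[k - 1] + 0.01 * rng.standard_normal()

X, t = lagged_regressors(u, y, nu=2, ny=2)
net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
net.fit(X, t)                 # static approximation of the dynamic map
print(net.score(X, t))        # in-sample fit, shown only for illustration
```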
Neural networks have several advantages: they are conceptually simple, easy to train and to use, and have excellent approximation properties; in addition, the concept of local and parallel processing is important because it provides integrity and fault-tolerant behaviour.
The nonlinear autoregressive moving average model with exogenous inputs (NARMAX model) can represent a wide class of nonlinear systems,[2] and is defined as

$$
y(k) = F\big[y(k-1), y(k-2), \ldots, y(k-n_y),\; u(k-d), u(k-d-1), \ldots, u(k-d-n_u),\; e(k-1), e(k-2), \ldots, e(k-n_e)\big] + e(k)
$$

where $y(k)$, $u(k)$ and $e(k)$ are the system output, input, and noise sequences respectively; $n_y$, $n_u$ and $n_e$ are the maximum lags for the system output, input and noise; $F[\bullet]$ is some nonlinear function; and $d$ is a time delay, typically set to $d = 1$. The model is essentially an expansion of past inputs, outputs and noise terms.
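As a concrete illustration (the lags and coefficients here are arbitrary and chosen only for this example), one simple polynomial NARMAX model of this form is

$$
y(k) = 0.5\,y(k-1) + 0.3\,u(k-1) + 0.2\,y(k-1)\,u(k-1) + 0.1\,u(k-1)^2 + 0.05\,e(k-1) + e(k).
$$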
Since NARMAX was introduced, by proving which class of nonlinear systems can be represented by this model, many results and algorithms have been derived based around this description.
While NARMAX started as the name of a model, it has now developed into a philosophy of nonlinear system identification.
Naively estimating a model that includes all the candidate terms and then pruning will cause numerical and computational problems and should always be avoided.
Structure detection, which aims to select terms one at a time, is therefore critically important.
These objectives can easily be achieved by using the Orthogonal Least Squares algorithm[2] and its derivatives to select the NARMAX model terms one at a time.
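A minimal sketch of this kind of forward orthogonal term selection is given below; it ranks candidate regressors by their error reduction ratio and is intended only as an illustration of the idea, not as the published OLS/FROLS algorithm.

```python
import numpy as np

def forward_ols(P, y, n_terms):
    """Forward selection of model terms by error reduction ratio (ERR).

    P : (N, M) matrix whose columns are candidate regressors
        (e.g. lagged inputs, outputs and their products)
    y : (N,) output vector
    """
    P = P.astype(float)
    y = y.astype(float)
    N, M = P.shape
    selected, Q = [], np.empty((N, 0))
    sigma = y @ y
    for _ in range(n_terms):
        best = (-1.0, None, None)            # (ERR, column index, orthogonalized column)
        for j in range(M):
            if j in selected:
                continue
            q = P[:, j].copy()
            for i in range(Q.shape[1]):      # orthogonalize against already chosen terms
                q -= (Q[:, i] @ P[:, j]) / (Q[:, i] @ Q[:, i]) * Q[:, i]
            d = q @ q
            if d < 1e-12:
                continue                     # candidate is (nearly) redundant
            err = (q @ y) ** 2 / (d * sigma) # error reduction ratio of this term
            if err > best[0]:
                best = (err, j, q)
        selected.append(best[1])
        Q = np.column_stack([Q, best[2]])
    # final parameters by least squares on the selected terms only
    theta, *_ = np.linalg.lstsq(P[:, selected], y, rcond=None)
    return selected, theta
```

Selecting terms one at a time in this way keeps the regression well conditioned and stops as soon as enough of the output variance has been explained, which is exactly why the naive estimate-everything-then-prune approach is avoided.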
These ideas can also be adapted for pattern recognition and feature selection and provide an alternative to principal component analysis but with the advantage that the features are revealed as basis functions that are easily related back to the original problem.
There are many applications where this approach is appropriate, for example in time series prediction of the weather, stock prices, speech, target tracking, pattern classification etc.
This second aim is why the NARMAX philosophy was developed and is linked to the idea of finding the simplest model structure.
The core aim of this second approach to identification is therefore to identify and reveal the rule that represents the system.
Here the aim is to identify models, often nonlinear, that can be used to understand the basic mechanisms of how these systems operate and behave so that we can manipulate and utilise these.
In a general situation, it might be the case that some exogenous uncertain disturbance passes through the nonlinear dynamics and influences the outputs.
Unfortunately, due to the nonlinear transformation of unobserved random variables, the likelihood function of the outputs is analytically intractable; it is given in terms of a multidimensional marginalization integral.
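Schematically, writing $w$ for the unobserved disturbance sequence and $\theta$ for the model parameters (notation introduced here only for illustration), the likelihood takes the form

$$
p_\theta(y_1, \ldots, y_N) = \int p_\theta(y_1, \ldots, y_N \mid w_1, \ldots, w_N)\, p_\theta(w_1, \ldots, w_N)\, \mathrm{d}w_1 \cdots \mathrm{d}w_N ,
$$

an integral whose dimension grows with the number of data points and which in general has no closed-form solution.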
An alternative solution is to apply the prediction error method using a sub-optimal predictor.[18][19][20] The resulting estimator can be shown to be strongly consistent and asymptotically normal and can be evaluated using relatively simple algorithms.
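As a rough sketch of the prediction error idea under these assumptions (the model structure, predictor and data below are invented purely for illustration), one can minimise the sum of squared one-step prediction errors of a simple sub-optimal predictor numerically:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Illustrative data from an arbitrary nonlinear system with true parameters (0.6, 0.4)
u = rng.uniform(-1, 1, 500)
y = np.zeros(500)
for k in range(1, 500):
    y[k] = 0.6 * y[k - 1] + 0.4 * np.tanh(u[k - 1]) + 0.05 * rng.standard_normal()

def prediction_errors(theta):
    """One-step-ahead errors of a simple (sub-optimal) predictor that ignores the noise terms."""
    a, b = theta
    y_hat = a * y[:-1] + b * np.tanh(u[:-1])   # predict y(k) from y(k-1) and u(k-1)
    return y[1:] - y_hat

cost = lambda theta: np.sum(prediction_errors(theta) ** 2)
result = minimize(cost, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
print(result.x)   # estimates should be close to (0.6, 0.4) for this synthetic example
```

The predictor here deliberately ignores the unobserved disturbance, which is what makes it sub-optimal; minimising the prediction error cost nevertheless yields a usable parameter estimate without evaluating the intractable likelihood.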