For example, a researcher working with a very limited data set, but with strong prior assumptions about the data, may consider validating the fit of their model by adopting a Bayesian framework and testing the fit under several different prior distributions.
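A minimal sketch of this prior-sensitivity check, assuming a hypothetical Beta-Binomial setting (the conjugate prior makes the posterior available in closed form; the data and prior parameters below are illustrative, not from the text):

```python
import numpy as np

def beta_binomial_posterior(successes, trials, alpha_prior, beta_prior):
    """Conjugate update: Beta(a, b) prior + Binomial data -> Beta posterior."""
    return alpha_prior + successes, beta_prior + trials - successes

# Hypothetical small data set: 7 successes in 10 trials.
successes, trials = 7, 10

# Refit under several priors to see how strongly the prior drives the fit.
priors = {"uniform": (1, 1), "sceptical": (2, 8), "optimistic": (8, 2)}
for name, (a, b) in priors.items():
    a_post, b_post = beta_binomial_posterior(successes, trials, a, b)
    mean = a_post / (a_post + b_post)
    print(f"{name:10s} prior -> posterior mean {mean:.3f}")
```

If the posterior summaries change drastically across reasonable priors, the limited data are not informative enough to overcome the prior, which is itself a useful validation finding.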
One example of this method is in Figure 1, which shows a polynomial function fit to some data.
If the model fits well on the initial data but has a large error on the validation set, this is a sign of overfitting.
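The polynomial case can be sketched as follows, with synthetic data standing in for Figure 1's data set (the quadratic ground truth, noise level, and validation split below are assumptions): a high-degree polynomial drives the training error toward zero while the validation error stays large, which is the overfitting signature described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from a noisy quadratic (an assumption; Figure 1's data are not given).
x = np.linspace(-1, 1, 20)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0, 0.2, size=x.size)

# Hold out every third point as a validation set.
val_mask = np.arange(x.size) % 3 == 0
x_tr, y_tr = x[~val_mask], y[~val_mask]
x_va, y_va = x[val_mask], y[val_mask]

for degree in (2, 12):
    coeffs = np.polyfit(x_tr, y_tr, degree)
    err_tr = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    err_va = np.mean((np.polyval(coeffs, x_va) - y_va) ** 2)
    print(f"degree {degree:2d}: train MSE {err_tr:.4f}, validation MSE {err_va:.4f}")
```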
With this in mind, a modern approach to validating a neural network is to test its performance on domain-shifted data.
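A toy illustration of this check, using a one-layer logistic-regression "network" trained in numpy (the Gaussian class structure, shift magnitude, and training details are all assumptions for illustration): the model scores well on data drawn from the training domain but degrades sharply when the inputs are shifted.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-class data: class means at (-1, -1) and (+1, +1).
def make_data(n, shift=0.0):
    half = n // 2
    x0 = rng.normal(-1, 0.7, size=(half, 2)) + shift
    x1 = rng.normal(+1, 0.7, size=(half, 2)) + shift
    X = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(half), np.ones(half)])
    return X, y

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Train a one-layer model (logistic regression) by gradient descent.
X, y = make_data(400)
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= 0.5 * (X.T @ (p - y) / len(y))
    b -= 0.5 * np.mean(p - y)

def accuracy(X, y):
    return np.mean((sigmoid(X @ w + b) > 0.5) == y)

X_in, y_in = make_data(400)           # same domain as training
X_sh, y_sh = make_data(400, shift=2)  # domain-shifted inputs
print("in-domain accuracy:    ", accuracy(X_in, y_in))
print("domain-shift accuracy: ", accuracy(X_sh, y_sh))
```

A large gap between the two accuracies indicates the model has learned features specific to the training domain rather than a robust decision rule.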
When performing a validation, there are three notable causes of potential difficulty, according to the Encyclopedia of Statistical Sciences.
The usual methods for dealing with difficulties in validation include the following: checking the assumptions made in constructing the model; examining the available data and related model outputs; applying expert judgment.[2] Note that expert judgment commonly requires expertise in the application area.[5] For some classes of statistical models, specialized methods of performing validation are available.
Estimates of the residuals' distributions can often be obtained by repeatedly running the model, i.e. by using repeated stochastic simulations (employing a pseudorandom number generator for random variables in the model).
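This can be sketched as follows, assuming a hypothetical linear model with additive Gaussian noise (the model form, noise level, and number of runs are illustrative): each simulation redraws the model's random variables from a pseudorandom number generator, and pooling the residuals across runs gives an empirical estimate of their distribution.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stochastic model: y = 2x + 1 plus Gaussian noise redrawn each run.
def run_model(x):
    return 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.size)

x = np.linspace(0, 5, 50)
observed = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.size)

# Repeated stochastic simulations: each run yields one set of residuals.
residuals = np.concatenate([observed - run_model(x) for _ in range(1000)])

# Empirical summary of the estimated residual distribution.
print("mean:", residuals.mean())
print("std: ", residuals.std())
print("95% interval:", np.quantile(residuals, [0.025, 0.975]))
```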