Model selection

In the simplest cases, model selection is performed on a pre-existing set of data. However, the task can also involve the design of experiments so that the data collected are well suited to the problem of model selection.

Relatedly, Cox (2006, p. 197) has said, "How [the] translation from subject-matter problem to statistical model is done is often the most critical part of an analysis".

For example, when Galileo performed his inclined plane experiments, he demonstrated that the motion of the balls fitted the parabola predicted by his model.

Burnham & Anderson (2002) emphasize throughout their book the importance of choosing models based on sound scientific principles, such as understanding of the phenomenological processes or mechanisms (e.g., chemical reactions) underlying the data.

Goodness of fit is generally determined using a likelihood ratio approach, or an approximation of this, leading to a chi-squared test.
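As a minimal sketch of the likelihood ratio approach, assume two nested Bernoulli models: a null model with the success probability fixed at a hypothetical p0 = 0.5, and a larger model that estimates it by maximum likelihood. The statistic 2(ℓ_alt - ℓ_null) is compared with a chi-squared distribution whose degrees of freedom equal the number of extra parameters, here one. The function names and data are illustrative only; with one degree of freedom the chi-squared upper-tail probability has the closed form erfc(√(x/2)), so no statistics library is needed.

```python
import math
from typing import Tuple


def bernoulli_loglik(k: int, n: int, p: float) -> float:
    """Log-likelihood of k successes in n Bernoulli trials with success probability p.

    The binomial coefficient is omitted because it cancels in the likelihood ratio.
    """
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def likelihood_ratio_test(k: int, n: int, p0: float = 0.5) -> Tuple[float, float]:
    """Compare a null model (p fixed at p0) against a larger model (p estimated by MLE)."""
    p_hat = k / n  # maximum-likelihood estimate under the larger model
    ll_null = bernoulli_loglik(k, n, p0)
    ll_alt = bernoulli_loglik(k, n, p_hat)
    # By Wilks' theorem the statistic is asymptotically chi-squared with
    # df = (number of free parameters in larger model) - (in null model) = 1.
    stat = 2.0 * (ll_alt - ll_null)
    return stat, chi2_sf_df1(stat)


stat, p_value = likelihood_ratio_test(60, 100)
print(round(stat, 2), round(p_value, 3))
```

For 60 successes in 100 trials, the statistic is about 4.03 and the p-value about 0.045, so at the 5% level the larger model fits significantly better than the null model.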

Model selection serves two main objectives in learning from data. One is scientific discovery, also called statistical inference: understanding the underlying data-generating mechanism and interpreting the nature of the data.[3] The other is prediction of future or unseen observations.

For scientific discovery, the aim is to identify the best model for the data, ideally one that provides a reliable characterization of the sources of uncertainty for scientific interpretation.

When the goal is prediction, however, the selected model may simply be the lucky winner among a few close competitors, yet its predictive performance can still be the best possible.

Among the commonly used model selection criteria, such as information criteria and cross-validation, cross-validation is typically the most accurate, and computationally the most expensive, for supervised learning problems.
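A minimal k-fold cross-validation sketch, assuming two hypothetical candidate models (a constant-mean model and a simple least-squares line) and mean squared error as the loss. Each fold is held out in turn, the model is refit on the remaining points, and the candidate with the lowest average held-out error is selected. The refitting inside the loop is where the computational cost comes from: k folds mean k fits per candidate. Folds here are assigned by taking every k-th point, a simplification; in practice the data are usually shuffled first.

```python
from typing import Callable, Dict, Sequence, Tuple

# A "fit" function takes training inputs/targets and returns a predictor.
Predictor = Callable[[float], float]
FitFn = Callable[[Sequence[float], Sequence[float]], Predictor]


def fit_constant(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Hypothetical candidate 1: always predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Hypothetical candidate 2: ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def k_fold_cv_mse(fit: FitFn, xs: Sequence[float], ys: Sequence[float], k: int = 5) -> float:
    """Mean squared error on held-out points, averaged over k folds."""
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= len(xs)")
    total_sq_err = 0.0
    for fold in range(k):
        # Every k-th point (offset by `fold`) is held out; no shuffling here.
        test_idx = range(fold, n, k)
        held_out = set(test_idx)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        predict = fit(train_x, train_y)  # refit on the training folds only
        total_sq_err += sum((ys[i] - predict(xs[i])) ** 2 for i in test_idx)
    return total_sq_err / n  # each point is held out exactly once


def select_model(
    candidates: Dict[str, FitFn], xs: Sequence[float], ys: Sequence[float], k: int = 5
) -> Tuple[str, Dict[str, float]]:
    """Return the name of the candidate with the lowest CV error, plus all scores."""
    scores = {name: k_fold_cv_mse(fit, xs, ys, k) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Usage: noisy data generated from a straight line.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
best, scores = select_model({"constant": fit_constant, "linear": fit_linear}, xs, ys, k=5)
print(best)  # linear
```

Passing the fitting procedure rather than a fitted model keeps the held-out points out of training in every fold, which is what makes the held-out error an honest estimate of predictive performance.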

Figure: The scientific observation cycle.