Solomonoff's theory of inductive inference proves that, under its common-sense assumptions (axioms), the best possible scientific model is the shortest algorithm that generates the empirical data under consideration.
Besides the choice of data, its other assumptions are that, to avoid the post-hoc fallacy, the programming language must be fixed before the data are seen,[1] and that the environment being observed is generated by an unknown algorithm.
Because Algorithmic Information Theory describes data-generating processes in dynamical (state-space) terms, the theory encompasses statistical as well as dynamical information criteria for model selection.
It was introduced by Ray Solomonoff, based on probability theory and theoretical computer science.[2][3][4] In essence, Solomonoff's induction derives the posterior probability of any computable theory, given a sequence of observed data.
Solomonoff proved that this induction is incomputable (or more precisely, lower semi-computable), but noted that "this incomputability is of a very benign kind", and that it "in no way inhibits its use for practical prediction" (as it can be approximated from below more accurately with more computational resources).
Solomonoff's induction naturally formalizes Occam's razor[5][6][7][8][9] by assigning larger prior credences to theories that require a shorter algorithmic description.
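Schematically, and under the assumptions above, this prior and the resulting posterior can be written as follows (a sketch in standard notation, where $K(T)$ denotes the length in bits of the shortest program that computes the theory $T$, defined more precisely below):

$$P(T) \;\propto\; 2^{-K(T)}, \qquad P(T \mid D) \;=\; \frac{P(D \mid T)\,2^{-K(T)}}{\sum_{T'} P(D \mid T')\,2^{-K(T')}},$$

where the sum ranges over all computable theories $T'$ and $P(D \mid T)$ is the probability that the theory $T$ assigns to the observed data $D$.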
The theory is grounded in philosophical foundations and was founded by Ray Solomonoff around 1960.[10] It is a mathematically formalized combination of Occam's razor[5][6][7][8][9] and the Principle of Multiple Explanations.
Marcus Hutter's universal artificial intelligence builds upon this to calculate the expected value of an action.
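In that framework, the value of an action is obtained by weighting every possible future observation-and-reward sequence by the algorithmic probability of an environment program that would produce it; one common presentation of the resulting agent (Hutter's AIXI) is, as a sketch,

$$a_t \;:=\; \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m} \big[\,r_t + \cdots + r_m\,\big] \sum_{q\,:\,U(q,\,a_1 \ldots a_m)\,=\,o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)},$$

where $a_k$, $o_k$ and $r_k$ denote actions, observations and rewards, $m$ is a planning horizon, $U$ is a fixed universal Turing machine and $\ell(q)$ is the length of the program $q$.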
Solomonoff's induction has been argued to be the computational formalization of pure Bayesianism.
For the Bayesian posterior over theories to be well defined, any theory must define a probability distribution over observable data.
Solomonoff's induction essentially boils down to demanding that all such probability distributions be computable.
Similarly, the sets of observable data considered by Solomonoff were finite.
Without loss of generality, we can thus consider that any observable data is a finite bit string.
As a result, Solomonoff's induction can be defined by only invoking discrete probability distributions.
Solomonoff's induction then allows one to make probabilistic predictions of future data from the data observed so far.
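Concretely, if the data observed so far form a finite bit string $x$ and a possible continuation is the string $y$, the prediction can be written (in one standard formulation) as a ratio of universal prior probabilities,

$$M(y \mid x) \;=\; \frac{M(xy)}{M(x)},$$

where $xy$ is the concatenation of $x$ and $y$, and $M$ is Solomonoff's universal (algorithmic) prior, defined below.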
The proof of the "razor" is based on the known mathematical properties of a probability distribution over a countable set: because the probabilities assigned to the countably many candidate programs must sum to at most one, they must roughly decrease as the programs are enumerated, so longer descriptions necessarily receive less weight.
Fundamental ingredients of the theory are the concepts of algorithmic probability and Kolmogorov complexity.
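For a universal Turing machine $U$ fixed before the data are seen, these can be stated schematically as

$$K(x) \;=\; \min\{\,\ell(p) : U(p) = x\,\}, \qquad M(x) \;=\; \sum_{p\,:\,U(p) = x*} 2^{-\ell(p)},$$

where $\ell(p)$ is the length in bits of the program $p$ and $U(p) = x*$ means that $U$, run on $p$, produces an output beginning with $x$; in the precise theory $U$ is taken to be a prefix-free or monotone machine so that the sum converges.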
In essence, the completeness theorem guarantees that the expected cumulative errors made by the predictions based on Solomonoff's induction are upper-bounded by the Kolmogorov complexity of the (stochastic) data generating process.
The errors can be measured using the Kullback–Leibler divergence or the square of the difference between the induction's prediction and the probability assigned by the (stochastic) data generating process.[citation needed]
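One common statement of this bound, for binary data generated by a computable stochastic process $\mu$, is

$$\sum_{t=1}^{\infty} \mathbb{E}_{\mu}\!\left[\big(M(x_t = 1 \mid x_{<t}) - \mu(x_t = 1 \mid x_{<t})\big)^{2}\right] \;\le\; \frac{\ln 2}{2}\,K(\mu),$$

so the total expected squared prediction error is finite and controlled by the Kolmogorov complexity $K(\mu)$ of the generating process; the analogous sum of expected Kullback–Leibler divergences is bounded by $K(\mu)\ln 2$.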
Many related models have been considered; in particular, the learning of classes of recursively enumerable sets from positive data has been studied from Gold's pioneering paper in 1967 onwards.
A far-reaching extension of Gold's approach is developed by Schmidhuber's theory of generalized Kolmogorov complexities,[16] which are kinds of super-recursive algorithms.