Inductive probability

Bayesian inference broadened the application of probability to many situations where a population was not well defined.

Ray Solomonoff developed algorithmic probability circa 1964, giving an explanation of what randomness is and of how patterns in the data may be represented by computer programs that give shorter representations of the data.

Chris Wallace and D. M. Boulton developed minimum message length circa 1968.

Marcus Hutter combined decision theory with the work of Ray Solomonoff and Andrey Kolmogorov to give a theory of the Pareto optimal behavior of an intelligent agent, circa 1998.

At first sight Bayes' theorem appears different from the minimum message/description length principle, but on closer inspection the two turn out to be closely related.[3][4]

Overfitting occurs when the model matches the random noise and not the pattern in the data.
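
As a rough illustration (my own, not from the source), the sketch below fits polynomials of two degrees to noisy samples of a straight line; the flexible fit tracks the noise and does worse against the underlying pattern. The data, degrees, and noise level are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple linear pattern: y = 2x + noise.
x_train = np.linspace(0, 1, 12)
y_train = 2 * x_train + rng.normal(0, 0.2, size=x_train.size)
x_test = np.linspace(0, 1, 100)
y_test = 2 * x_test  # the true pattern, without noise

for degree in (1, 7):
    coeffs = np.polyfit(x_train, y_train, degree)      # least-squares fit
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")

# The degree-7 polynomial drives its training error toward zero by matching
# the random noise, but its error against the underlying line is typically
# larger than that of the simple degree-1 model.
```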

Cast in the form of inductive inference, the programs are theories that imply the observation of the bit string x.
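
In the spirit of that idea, here is a minimal sketch (my own illustration): each program that prints the observed bit string x is treated as a theory, and shorter programs receive exponentially more weight, roughly 2 to the minus program length. The "programs" and their bit lengths below are hypothetical.

```python
# Hypothetical programs that all print the observed string x, with assumed
# lengths in bits.  Shorter programs (simpler theories) get weight 2**-length.
x = "0101010101010101"

candidate_programs = {
    "print '01' eight times": 30,        # short: captures the repeating pattern
    "print the literal string": 16 * 8,  # long: stores x verbatim, no pattern
}

weights = {name: 2.0 ** -length for name, length in candidate_programs.items()}
total = sum(weights.values())

for name, w in weights.items():
    print(f"{name}: relative weight {w / total:.6f}")
```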

A problem arises where an intelligent agent's prior expectations interact with the environment to form a self-reinforcing feedback loop.

Processing speed and combinatorial explosion remain the primary limiting factors for artificial intelligence.

Probabilities are subjective and personal estimates of likely outcomes based on past experience and inferences made from the data.

If the intelligent agent does not interact with the environment then the probability will converge over time to the frequency of the event.
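
As an illustrative sketch of that convergence (assumptions mine: a passively observed Bernoulli event and a uniform Beta(1, 1) prior), the posterior estimate of the event's probability approaches its observed frequency as data accumulate.

```python
import numpy as np

rng = np.random.default_rng(1)
true_frequency = 0.3                     # assumed long-run frequency of the event
observations = rng.random(10_000) < true_frequency

successes = 0
for n, hit in enumerate(observations, start=1):
    successes += int(hit)
    # Posterior mean under a uniform Beta(1, 1) prior (Laplace's rule).
    estimate = (successes + 1) / (n + 2)
    if n in (10, 100, 1_000, 10_000):
        print(f"after {n:5d} observations: estimate {estimate:.3f}, "
              f"frequency {successes / n:.3f}")
```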

The prior probability of any statement is calculated from the number of bits needed to state it.

The primary use of the information approach to probability is to provide estimates of the complexity of statements.
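
A small sketch of that rule (illustrative; the bit lengths are assumed, not computed from any real encoding): a statement that takes n bits to state receives prior probability 2^-n, so more complex statements start out less probable.

```python
# Assumed bit lengths for three statements; the values are only illustrative.
statement_lengths = {
    "all ravens are black": 20,
    "all ravens are black except on Tuesdays": 35,
    "ravens are black, with exceptions listed per raven": 120,
}

for statement, bits in statement_lengths.items():
    prior = 2.0 ** -bits                 # prior probability from length in bits
    print(f"{bits:4d} bits -> prior 2^-{bits} = {prior:.3e}   ({statement})")
```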

Recall that Occam's razor states that "All things being equal, the simplest theory is the most likely to be correct".

In order to apply this rule, first there needs to be a definition of what "simplest" means.

Later, constants may be assigned a probability using a Huffman code based on the number of uses of each function identifier in all expressions recorded so far.
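
A sketch of how such a code might be assigned (my own construction, with made-up identifier counts): a Huffman code built from usage counts gives frequently used identifiers shorter codes, and the code length in bits then fixes the probability as 2^-length.

```python
import heapq

def huffman_code_lengths(counts):
    """Return a Huffman code length (in bits) for each symbol, given counts."""
    heap = [(count, i, {symbol: 0}) for i, (symbol, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, depths1 = heapq.heappop(heap)
        c2, _, depths2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]

# Made-up usage counts of function identifiers in the expressions seen so far.
usage = {"add": 50, "mul": 30, "if": 15, "gcd": 5}

for ident, bits in sorted(huffman_code_lengths(usage).items(), key=lambda kv: kv[1]):
    print(f"{ident}: {bits} bits, probability 2^-{bits} = {2.0 ** -bits}")
```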

Bayes' theorem describes the relationship between prior and posterior probabilities when new facts are learnt,

P(A|B) = P(B|A) P(A) / P(B)

Written as quantities of information, where the information in a statement X is I(X) = -log2 P(X) bits, Bayes' theorem becomes

I(A|B) = I(B|A) + I(A) - I(B)

Two statements A and B are said to be independent if knowing the truth of A does not change the probability of B. Mathematically this is

P(B|A) = P(B)

and Bayes' theorem then reduces to

P(A|B) = P(A)

For a set of mutually exclusive possibilities A_i, the probability of a statement B may be built up from the conditional probabilities,

P(B) = ∑_i P(A_i) P(B|A_i)
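
A quick numeric check of the information form (the probabilities below are arbitrary illustrative values):

```python
import math

def bits(p):
    """Information content, in bits, of a statement with probability p."""
    return -math.log2(p)

# Arbitrary example probabilities for two statements A and B.
p_a, p_b, p_b_given_a = 0.25, 0.4, 0.8
p_a_given_b = p_b_given_a * p_a / p_b          # Bayes' theorem

lhs = bits(p_a_given_b)
rhs = bits(p_b_given_a) + bits(p_a) - bits(p_b)
print(f"I(A|B) = {lhs:.4f} bits, I(B|A) + I(A) - I(B) = {rhs:.4f} bits")
```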

Here I(F) is the amount of information needed to represent F without the hypothesis H, and I(F|H) is the amount needed once H is assumed. The difference between the two is how much the representation of the facts has been compressed by assuming that H is true.

If a full set of mutually exclusive hypotheses H_i that provide evidence for F is known, a proper estimate may be given for the prior probability P(F),

P(F) = ∑_i P(H_i) P(F|H_i)
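
As a numeric sketch (the hypotheses and probabilities are invented for illustration), the prior probability of the facts is built up from a full set of mutually exclusive hypotheses, and Bayes' theorem then gives each hypothesis a posterior:

```python
# Invented example: three mutually exclusive hypotheses with assumed priors
# and assumed probabilities of producing the observed facts F.
hypotheses = {
    "H1": {"prior": 0.6, "p_f_given_h": 0.1},
    "H2": {"prior": 0.3, "p_f_given_h": 0.5},
    "H3": {"prior": 0.1, "p_f_given_h": 0.9},
}

# Law of total probability: P(F) = sum_i P(H_i) * P(F|H_i).
p_f = sum(h["prior"] * h["p_f_given_h"] for h in hypotheses.values())
print(f"P(F) = {p_f:.3f}")

# Bayes' theorem for each hypothesis: P(H_i|F) = P(H_i) P(F|H_i) / P(F).
for name, h in hypotheses.items():
    posterior = h["prior"] * h["p_f_given_h"] / p_f
    print(f"P({name}|F) = {posterior:.3f}")
```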

Abductive inference[11][12][13][14] starts with a set of facts F which is a statement (Boolean expression).

Abductive reasoning is of the form: given the facts F, find a theory T such that T implies F, and put T forward as a possible explanation. The theory T, also called an explanation of the condition F, is an answer to the ubiquitous factual "why" question.

In inductive logic, generalization is a powerful method of generating new theories that may be true.

The Linnaean classification of living things and objects forms the basis for generalization and specialization.

Perceiving the world as a collection of objects appears to be a key aspect of human intelligence.

Inductive logic programming is a means of constructing a theory that implies a condition.

Isaac Newton used inductive arguments in constructing his law of universal gravitation.

A theory is a simpler condition that explains (or implies) C. The set of all such theories is called T. An extended form of Bayes' theorem may be applied, where the theories t in T play the role of the hypotheses. To apply Bayes' theorem the following must hold: the theories in T are mutually exclusive, and together they cover every way in which the condition C can arise.

So Bayes' theorem may be applied as specified, giving

P(t|C) = P(t) P(C|t) / ∑_{s in T} P(s) P(C|s)

Using the implication and conditional probability law, each theory t implies C, so that P(C|t) = 1 and the expression reduces to

P(t|C) = P(t) / ∑_{s in T} P(s)

with the prior probability P(t) of each theory calculated from the number of bits needed to state it.
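
A sketch tying these pieces together (the theory names and code lengths are invented): when every candidate theory implies the condition C, P(C|t) = 1, so the posterior weight of each theory is just its length-based prior 2^-length, normalized over the candidate set.

```python
# Invented candidate theories, each assumed to imply the condition C,
# with assumed encoding lengths in bits.
theory_lengths = {"T_simple": 12, "T_medium": 20, "T_baroque": 45}

# Prior of each theory from its length; since every theory implies C,
# P(C|t) = 1 and the posterior is the normalized prior.
priors = {name: 2.0 ** -bits for name, bits in theory_lengths.items()}
total = sum(priors.values())

for name, p in priors.items():
    print(f"P({name}|C) = {p / total:.6f}")
```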