AIXI

AIXI /ˈaɪksi/ is a theoretical mathematical formalism for artificial general intelligence.

It combines Solomonoff induction with sequential decision theory.

It maximizes the expected total rewards received from the environment.

Intuitively, it simultaneously considers every computable hypothesis (or environment).

At each time step, it looks at every possible program and evaluates how many rewards that program generates depending on the action taken.

The promised rewards are then weighted by the subjective belief that this program constitutes the true environment.

This belief is computed from the length of the program: longer programs are considered less likely, in line with Occam's razor.

AIXI then selects the action that has the highest expected total reward in the weighted sum of all these programs.
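
To make this weighting concrete, the snippet below (a minimal sketch; the program names and bit-lengths are invented for illustration and are not part of the formalism) shows how belief decays exponentially with program length, the weighting that the formal definition later writes as $2^{-\mathrm{length}(q)}$:

```python
# Illustrative only: the Occam-style prior AIXI assigns to candidate
# environment programs.  Program names and bit-lengths are invented.
candidate_program_lengths = {
    "always_reward": 10,           # a very short hypothetical program
    "reward_if_action_is_0": 25,
    "elaborate_world_model": 120,
}

# Each program q gets weight 2^(-length(q)): shorter programs are believed more.
prior = {name: 2.0 ** -length for name, length in candidate_program_lengths.items()}

for name, weight in sorted(prior.items(), key=lambda kv: -kv[1]):
    print(f"{name}: prior weight {weight:.3e}")
```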

AIXI can stand for AI based on Solomonoff's distribution, denoted by $\xi$ (which is the Greek letter xi), or e.g. it can stand for AI "crossed" (X) with induction (I).[3]

AIXI is a reinforcement learning agent that interacts with some stochastic and unknown but computable environment $\mu$.

At each time step $t$, the agent outputs an action $a_t$, and the environment responds with a percept consisting of an observation $o_t$ and a reward $r_t$, which may depend on the entire interaction history.

The environment $\mu$ is thus mathematically represented as a probability distribution over "percepts" (observations and rewards) which depend on the full history, so there is no Markov assumption (as opposed to other RL algorithms).

Note again that this probability distribution is unknown to the AIXI agent.

Furthermore, note that $\mu$ is computable, that is, the observations and rewards received by the agent from the environment $\mu$ can be computed by some program (which runs on a Turing machine), given the past actions of the AIXI agent.
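
As a rough sketch of what "a probability distribution over percepts that depends on the full history" can look like in code (the type names and the toy environment below are assumptions for illustration, not part of Hutter's formalism), an environment can be represented as a function from the entire action/percept history plus the current action to a distribution over (observation, reward) pairs:

```python
from typing import Callable, Dict, List, Tuple

# A percept is an (observation, reward) pair.
Percept = Tuple[int, float]
# A history is the full sequence of (action, percept) pairs so far:
# no Markov assumption, the whole past may matter.
History = List[Tuple[int, Percept]]
# An environment maps (full history, current action) to a probability
# distribution over the next percept.  Purely an illustrative signature.
Environment = Callable[[History, int], Dict[Percept, float]]

def toy_environment(history: History, action: int) -> Dict[Percept, float]:
    """A made-up computable environment: reward 1 if the agent repeats its
    previous action, otherwise a coin-flip observation with no reward."""
    if history and history[-1][0] == action:
        return {(1, 1.0): 1.0}
    return {(0, 0.0): 0.5, (1, 0.0): 0.5}
```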

The goal of the AIXI agent is to maximise $\sum_{t=1}^{m} r_t$, that is, the sum of rewards from time step 1 to $m$.

The AIXI agent is associated with a stochastic policy $\pi \colon (\mathcal{A} \times \mathcal{E})^* \to \mathcal{A}$ (where $\mathcal{A}$ is the action space and $\mathcal{E}$ is the percept space), defined as follows:[3]

$$a_t := \arg\max_{a_t} \sum_{o_t r_t} \ldots \max_{a_m} \sum_{o_m r_m} \big[ r_t + \ldots + r_m \big] \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\mathrm{length}(q)}$$

or, using parentheses to disambiguate the precedences,

$$a_t := \arg\max_{a_t} \left( \sum_{o_t r_t} \ldots \left( \max_{a_m} \sum_{o_m r_m} \left( \big[ r_t + \ldots + r_m \big] \left( \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\mathrm{length}(q)} \right) \right) \right) \right)$$

Intuitively, in the definition above, AIXI considers the sum of the total reward over all possible "futures" up to $m - t$ time steps ahead (that is, from $t$ to $m$), weighs each of them by the complexity of the programs $q$ (that is, by $2^{-\mathrm{length}(q)}$) consistent with the agent's past (that is, the previously executed actions $a_1 \ldots a_{t-1}$ and the received percepts $o_1 r_1 \ldots o_{t-1} r_{t-1}$) that can generate that future, and then picks the action that maximises expected future rewards.[4]

Let us break this definition down in order to attempt to fully understand it.

$o_t r_t$ is the "percept" (consisting of the observation $o_t$ and the reward $r_t$) received by the AIXI agent at time step $t$ from the environment, and $o_m r_m$ is the percept received at the last time step $m$.

$r_t + \ldots + r_m$ is the sum of rewards from time step $t$ to time step $m$, so AIXI needs to look into the future to choose its action at time step $t$.

$U$ denotes a universal Turing machine, and $q$ ranges over all (deterministic) programs on $U$; given a program $q$ and the sequence of all actions $a_1 \ldots a_m$ as input, $U$ produces the sequence of percepts $o_1 r_1 \ldots o_m r_m$.

The universal Turing machine $U$ is thus used to "simulate" or compute the environment responses or percepts, given the program $q$ (which "models" the environment) and all the actions of the AIXI agent: in this sense, the environment is "computable" (as stated above).

Note that, in general, the program which "models" the current and actual environment (where AIXI needs to act) is unknown because the current environment is also unknown.

$\mathrm{length}(q)$ is the length of the program $q$ in bits, so $2^{-\mathrm{length}(q)}$ is the complexity weight of $q$: the longer the program, the smaller its weight.

Hence, $\sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\mathrm{length}(q)}$ should be interpreted as a mixture (in this case, a sum) over all computable environments (which are consistent with the agent's past), each weighted by its complexity $2^{-\mathrm{length}(q)}$.

Note that $a_1 \ldots a_m$ can also be written as $a_1 \ldots a_{t-1} a_t \ldots a_m$, where $a_1 \ldots a_{t-1}$ is the sequence of actions already executed in the environment by the AIXI agent; similarly, $o_1 r_1 \ldots o_{t-1} r_{t-1}$ is the sequence of percepts produced by the environment so far.

Let us now put all these components together in order to understand this equation or definition.

At time step $t$, AIXI chooses the action $a_t$ at which the weighted expectimax expression above attains its maximum: it considers every possible future up to time step $m$, weighs each by the total complexity weight of the programs consistent with its past that could generate it, and selects the action promising the largest expected total reward.
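
To make the definition more tangible, here is a small, purely illustrative Python sketch of the same expectimax idea over a finite, hand-picked set of deterministic "environment programs" (the real definition ranges over all programs on a universal Turing machine and is incomputable). The programs, their bit-lengths, the action set and the horizon are invented assumptions; the sketch also normalises the complexity weights into conditional probabilities, which does not change which action is selected:

```python
# Toy, finite stand-in for AIXI's expectimax (illustrative only).

ACTIONS = [0, 1]
HORIZON = 3  # stands in for the lifetime m: how far ahead we look

# Hypothetical deterministic "environment programs": each maps the full action
# sequence a_1...a_k to the percept (observation, reward) emitted at step k.
def prog_copy(actions):
    """Rewards repeating the previous action."""
    if len(actions) >= 2 and actions[-1] == actions[-2]:
        return (1, 1.0)
    return (0, 0.0)

def prog_prefers_zero(actions):
    """Rewards action 0."""
    return (1, 1.0) if actions[-1] == 0 else (0, 0.0)

def prog_no_reward(actions):
    """Never rewards anything."""
    return (0, 0.0)

# (program, invented length in bits): each gets prior weight 2^(-length).
PROGRAMS = [(prog_copy, 30), (prog_prefers_zero, 20), (prog_no_reward, 10)]

def consistent(program, past_actions, past_percepts):
    """Does this program reproduce the percepts actually observed so far?"""
    return all(program(past_actions[:i + 1]) == past_percepts[i]
               for i in range(len(past_percepts)))

def action_values(actions, hypotheses, steps_left):
    """For each next action: mixture-expected total reward over the remaining
    steps (max over actions, sum over percepts, as in the definition above)."""
    values = {}
    total_w = sum(w for _, w in hypotheses)
    for a in ACTIONS:
        new_actions = actions + [a]
        # Group the surviving hypotheses by the percept they predict next.
        groups = {}
        for prog, w in hypotheses:
            groups.setdefault(prog(new_actions), []).append((prog, w))
        expected = 0.0
        for (_, reward), group in groups.items():
            p = sum(w for _, w in group) / total_w   # mixture probability
            future = (max(action_values(new_actions, group, steps_left - 1).values())
                      if steps_left > 1 else 0.0)
            expected += p * (reward + future)
        values[a] = expected
    return values

def aixi_like_action(past_actions, past_percepts):
    """Choose the action with the highest complexity-weighted expected return."""
    hypotheses = [(prog, 2.0 ** -length)             # Occam prior 2^(-length(q))
                  for prog, length in PROGRAMS
                  if consistent(prog, past_actions, past_percepts)]
    if not hypotheses:
        raise ValueError("no candidate program is consistent with the history")
    vals = action_values(past_actions, hypotheses, HORIZON)
    return max(vals, key=vals.get)

# After being rewarded once for action 0, the agent picks action 0 again.
print(aixi_like_action([0], [(1, 1.0)]))
```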

The parameters to AIXI are the universal Turing machine U and the agent's lifetime m, which need to be chosen.

AIXI's performance is measured by the expected total number of rewards it receives.

It is restricted to maximizing rewards based on percepts as opposed to external states.

It also assumes it interacts with the environment solely through action and percept channels, preventing it from considering the possibility of being damaged or modified.

Colloquially, this means that it doesn't consider itself to be contained by the environment it interacts with.

Like Solomonoff induction, AIXI itself is incomputable, but there are computable approximations of it.

One such approximation is AIXItl, which performs at least as well as the provably best time $t$ and space $l$ limited agent.[2]

Another approximation to AIXI with a restricted environment class is MC-AIXI (FAC-CTW) (which stands for Monte Carlo AIXI FAC-Context-Tree Weighting), which has had some success playing simple games such as partially observable Pac-Man.
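
MC-AIXI (FAC-CTW) combines a Monte Carlo tree search over possible futures with a context-tree-weighting model of the environment; as a loose, much-simplified illustration of the Monte Carlo idea only (this is not the published algorithm, and the model, horizon and rollout counts below are invented assumptions), one can estimate each action's return by sampling rollouts from an environment model instead of enumerating every future:

```python
import random

ACTIONS = [0, 1]
HORIZON = 10     # illustrative planning depth
ROLLOUTS = 200   # illustrative number of sampled futures per action

def sample_model(history, action):
    """Stand-in for a learned environment model (the real algorithm uses a
    context-tree-weighting mixture): sample one (observation, reward)."""
    # Invented dynamics: repeating the previous action pays off 80% of the time.
    if history and history[-1][0] == action and random.random() < 0.8:
        return (1, 1.0)
    return (0, 0.0)

def rollout_return(history, first_action):
    """Sample one future of length HORIZON, acting randomly after the first step."""
    total, h, action = 0.0, list(history), first_action
    for _ in range(HORIZON):
        obs, reward = sample_model(h, action)
        total += reward
        h.append((action, (obs, reward)))
        action = random.choice(ACTIONS)   # crude rollout policy
    return total

def monte_carlo_action(history):
    """Pick the action whose sampled average return is highest."""
    estimates = {a: sum(rollout_return(history, a) for _ in range(ROLLOUTS)) / ROLLOUTS
                 for a in ACTIONS}
    return max(estimates, key=estimates.get)

# After one rewarded step with action 0, sampling favours repeating action 0.
print(monte_carlo_action([(0, (1, 1.0))]))
```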