Learning automaton

A learning automaton falls within the scope of reinforcement learning if the environment is stochastic and is modelled as a Markov decision process (MDP).

Research in learning automata can be traced back to the work of Michael Lvovitch Tsetlin in the early 1960s in the Soviet Union.

[1] At each time step t = 0, 1, 2, 3, ..., the automaton reads an input from its environment, updates the state probabilities p(t) to p(t+1) using the update scheme A, randomly chooses a successor state according to the probabilities p(t+1), and outputs the corresponding action.
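
The time-step loop described above can be sketched in Python. This is only an illustration under stated assumptions, not the scheme of any particular source: the update scheme A is taken to be a linear reward-inaction rule, the input is binary (0 for reward, 1 for penalty), and the output map is the identity, so the chosen state index doubles as the action. The names reward_inaction_update, LearningAutomaton, rate and beta are hypothetical.

```python
import random


def reward_inaction_update(p, chosen, beta, rate=0.1):
    """Hypothetical update scheme A (linear reward-inaction).

    beta is the input read from the environment: 0 = reward, 1 = penalty.
    On reward, probability mass moves toward the state chosen last step;
    on penalty, the probabilities are left unchanged.
    """
    if beta == 1:
        return list(p)
    return [
        pi + rate * (1.0 - pi) if i == chosen else (1.0 - rate) * pi
        for i, pi in enumerate(p)
    ]


class LearningAutomaton:
    def __init__(self, n_states, update=reward_inaction_update, seed=None):
        self.p = [1.0 / n_states] * n_states  # p(0): uniform (assumption)
        self.update = update                   # plays the role of A
        self.rng = random.Random(seed)
        self.state = self.rng.randrange(n_states)

    def step(self, beta):
        """One time step: read input, update p, sample a state, emit action."""
        # p(t) -> p(t+1), based on the input and the current state
        self.p = self.update(self.p, self.state, beta)
        # choose the successor state according to p(t+1)
        self.state = self.rng.choices(range(len(self.p)), weights=self.p)[0]
        # assumption: identity output map, so the state index is the action
        return self.state
```

A caller would alternate between the automaton and the environment: pass the environment's latest response to step(), apply the returned action, and feed the next response back in. Because the reward branch adds rate * (1 - p_i) to the chosen entry and scales every other entry by (1 - rate), the entries of p continue to sum to 1 after each update.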

More generally, a "Q-model" allows an arbitrary finite input set X, and an "S-model" uses the interval [0,1] of real numbers as X.

[2] A visualised demo[3][4] (art work) of a single learning automaton has been developed by the μSystems (microSystems) Research Group at Newcastle University.