It falls within the scope of reinforcement learning if the environment is stochastic and a Markov decision process (MDP) is used.
Research in learning automata can be traced back to the work of Michael Lvovitch Tsetlin in the early 1960s in the Soviet Union.
[1] At each time step t = 0, 1, 2, 3, ..., the automaton reads an input from its environment, updates p(t) to p(t+1) by A, randomly chooses a successor state according to the probabilities p(t+1), and outputs the corresponding action.
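The time-step loop described above can be illustrated with a minimal Python sketch. The text does not specify the update function A, so the sketch assumes a linear reward-inaction rule for it. It also assumes a binary input where 0 means reward and 1 means penalty, a one-to-one mapping between states and actions, and a toy two-action environment. The names `update_lri`, `step`, `environment`, `run`, `rate` and `reward_prob` are hypothetical and chosen only for illustration.

```python
import random


def update_lri(p, state, x, rate=0.1):
    """Assumed update rule A (linear reward-inaction), not taken from the text.

    p     -- current state probability vector p(t)
    state -- index of the state the automaton is currently in
    x     -- input read from the environment (assumed: 0 = reward, 1 = penalty)
    """
    if x != 0:
        # Penalty: reward-inaction leaves the probabilities unchanged.
        return list(p)
    # Reward: shift probability mass towards the current state.
    return [
        pi + rate * (1.0 - pi) if i == state else (1.0 - rate) * pi
        for i, pi in enumerate(p)
    ]


def step(p, state, x, rng, rate=0.1):
    """One time step: update p(t) to p(t+1), then sample the successor state."""
    p_next = update_lri(p, state, x, rate)
    next_state = rng.choices(range(len(p_next)), weights=p_next, k=1)[0]
    return p_next, next_state


def environment(action, rng, reward_prob=(0.2, 0.8)):
    """Toy stochastic environment (assumption): rewards action i with
    probability reward_prob[i]; returns 0 for reward and 1 for penalty."""
    return 0 if rng.random() < reward_prob[action] else 1


def run(steps=2000, seed=0):
    rng = random.Random(seed)
    p = [0.5, 0.5]  # initial state probability vector p(0)
    state = rng.choices(range(len(p)), weights=p, k=1)[0]
    for _ in range(steps):
        x = environment(state, rng)  # state index doubles as the action here
        p, state = step(p, state, x, rng)
    return p


if __name__ == "__main__":
    print(run())
```

Under this assumed rule the probability vector keeps summing to 1 after every update, because the mass added to the rewarded state equals the mass removed from the others. Other choices of A, such as reward-penalty schemes, would slot into `step` in the same way.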
More generally, a "Q-model" allows an arbitrary finite input set X, and an "S-model" uses the interval [0,1] of real numbers as X.[2]
A visualised demo[3][4] and artwork of a single learning automaton were developed by the μSystems (microSystems) Research Group at Newcastle University.