Its study combines the pursuit of ideal algorithms that maximize rewards with a more sociological set of concepts. While research in single-agent reinforcement learning is concerned with finding the algorithm that obtains the highest reward for a single agent, research in multi-agent reinforcement learning evaluates and quantifies social metrics such as cooperation,[2] reciprocity,[3] equity,[4] social influence,[5] language[6] and discrimination.
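As one hedged illustration of how such a metric might be quantified (the choice of the Gini coefficient here is an assumption for illustration, not a method prescribed by the source), equity across agents can be summarized from their per-agent episode returns:

```python
def gini(returns: list[float]) -> float:
    """Gini coefficient over per-agent returns: 0 means perfectly
    equal outcomes, values near 1 mean highly unequal ones.
    Assumes non-negative returns."""
    n = len(returns)
    total = sum(returns)
    if n == 0 or total == 0:
        return 0.0
    # Mean absolute difference between all pairs, normalized.
    diff_sum = sum(abs(x - y) for x in returns for y in returns)
    return diff_sum / (2 * n * total)

print(gini([1.0, 1.0, 1.0]))  # 0.0  (perfectly equitable)
print(gini([3.0, 0.0, 0.0]))  # ~0.67 (one agent captures everything)
```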
In a zero-sum game, agents are in pure competition: there is no prospect of communication or social dilemmas, as neither agent is incentivized to take actions that benefit its opponent. The Deep Blue[8] and AlphaGo projects demonstrate how to optimize the performance of agents in such pure competition settings.[10] In pure cooperation settings, by contrast, all agents receive identical rewards, which means that social dilemmas do not occur.[24]
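A minimal sketch of the two reward structures (the function names and payoff values below are illustrative, not from the source): in pure competition one agent's reward is the exact negative of the other's, while in pure cooperation every agent receives the same reward.

```python
def zero_sum_rewards(payoff: float) -> tuple[float, float]:
    """Pure competition: agent 2's reward is the exact negative of
    agent 1's, so the two rewards always sum to zero."""
    return payoff, -payoff

def shared_rewards(payoff: float, n_agents: int) -> list[float]:
    """Pure cooperation: every agent receives the identical reward,
    so no agent can gain at another's expense."""
    return [payoff] * n_agents

print(zero_sum_rewards(1.0))   # (1.0, -1.0)
print(shared_rewards(1.0, 3))  # [1.0, 1.0, 1.0]
```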
Various techniques have been explored to induce cooperation in agents: modifying the environment rules,[25] adding intrinsic rewards,[4] and more.
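One hedged illustration of the intrinsic-reward idea (the shaping term and coefficient below are hypothetical, not taken from the cited work): an agent's learning signal can be the environment's extrinsic reward plus a bonus that rewards its teammates' welfare.

```python
def shaped_reward(extrinsic: float, teammate_rewards: list[float],
                  beta: float = 0.1) -> float:
    """Hypothetical intrinsic-reward shaping: add a fraction (beta)
    of the teammates' mean reward to the agent's own extrinsic
    reward, nudging it toward actions that also benefit others."""
    mean_teammate = sum(teammate_rewards) / len(teammate_rewards)
    return extrinsic + beta * mean_teammate

# An action worth 1.0 to this agent and 2.0 on average to teammates
print(shaped_reward(1.0, [2.0, 2.0]))  # 1.2
```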
In sequential social dilemmas (SSDs), agents take multiple actions over time, and the distinction between cooperating and defecting is not as clear-cut as in matrix games. There is ongoing research into defining different kinds of SSDs and demonstrating cooperative behavior in the agents that act in them.
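For contrast, a matrix game makes the cooperate/defect distinction explicit in a single action. A minimal sketch, using the standard textbook prisoner's dilemma payoffs purely for illustration:

```python
# One-shot prisoner's dilemma as a matrix game: each agent picks a
# single discrete action, so "cooperate" vs "defect" is unambiguous.
PAYOFFS = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("C", "D"): (0, 5),  # agent 1 is exploited
    ("D", "C"): (5, 0),  # agent 2 is exploited
    ("D", "D"): (1, 1),  # mutual defection
}

def play(action_1: str, action_2: str) -> tuple[int, int]:
    return PAYOFFS[(action_1, action_2)]

print(play("C", "D"))  # (0, 5)
```

In an SSD, by contrast, cooperativeness emerges from a whole trajectory of actions rather than from one labelled move.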
Autocurricula in reinforcement learning experiments are compared to the stages of the evolution of life on Earth and the development of human culture.
A major stage in evolution happened 2–3 billion years ago, when photosynthesizing life forms started to produce massive amounts of oxygen, changing the balance of gases in the atmosphere.[30] In the next stages of evolution, oxygen-breathing life forms evolved, eventually leading to land mammals and human beings.
The environment is no longer stationary, and thus the Markov property is violated: transitions and rewards do not depend only on the current state of an agent.[47]
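A toy sketch of why the property breaks (the environment and policies below are hypothetical stand-ins): from one agent's perspective, the next state also depends on the other agent's action, drawn from a policy that changes as that agent learns, so the same state-action pair can lead to different outcomes at different times.

```python
def step(state: int, a1: int, a2: int) -> int:
    """Toy joint transition: next state depends on both actions."""
    return (state + a1 + a2) % 4

def pi_2(state: int, t: int) -> int:
    """Agent 2's policy changes as it 'learns' (here: drifts with t)."""
    return (state + t) % 2

def effective_transition(state: int, a1: int, t: int) -> int:
    """What agent 1 experiences: the same (state, a1) pair can lead
    to different next states at different times, because agent 2's
    policy depends on t. The single-agent Markov property fails."""
    return step(state, a1, pi_2(state, t))

print(effective_transition(0, 1, t=0))  # 1: one next state early on
print(effective_transition(0, 1, t=1))  # 2: a different one later
```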