[1][2] It can be viewed as a form of supervised learning, where the training dataset consists of task executions by a demonstration teacher.
[2] Inverse reinforcement learning (IRL) is the process of deriving a reward function from observed behavior.
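Under the supervised-learning view, the simplest approach (often called behavioral cloning) maps each observed state directly to the teacher's action. The sketch below is purely illustrative and not part of the cited works; the nearest-neighbour policy and the driving states are invented for the example.

```python
import numpy as np

class NearestNeighbourPolicy:
    """Imitation as supervised learning: store the teacher's (state, action)
    pairs and, at run time, return the action whose recorded state is closest
    to the current one. A deliberately simple, non-parametric choice."""
    def __init__(self, demo_states, demo_actions):
        self.states = np.asarray(demo_states, dtype=float)
        self.actions = list(demo_actions)

    def act(self, state):
        dists = np.linalg.norm(self.states - np.asarray(state, dtype=float), axis=1)
        return self.actions[int(np.argmin(dists))]

# Hypothetical demonstrations: states are (distance_to_lead_car, speed), actions are labels.
demo_states = [(30.0, 25.0), (10.0, 25.0), (30.0, 15.0)]
demo_actions = ["keep_lane", "brake", "accelerate"]

policy = NearestNeighbourPolicy(demo_states, demo_actions)
print(policy.act((12.0, 24.0)))  # -> "brake", imitating the closest demonstration
```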
[8][9] Apprenticeship via inverse reinforcement learning (AIRP) was developed in 2004 by Pieter Abbeel, Professor in Berkeley's EECS department, and Andrew Ng, Associate Professor in Stanford University's Computer Science Department.
AIRP deals with a "Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we want to learn to perform".
Take the task of driving, for example: many different objectives operate simultaneously, such as maintaining a safe following distance, keeping a reasonable speed, and not changing lanes too often. The task may seem easy at first glance, but a trivially hand-crafted reward function may not converge to the desired policy.
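One way to handle these competing objectives, following the feature-based formulation of Abbeel and Ng (2004), is to write the reward as a weighted sum of features such as following distance, speed, and lane changes, and to choose the weights so that the expert's feature expectations are matched. The sketch below shows a single, simplified weight-update step; the feature definitions, thresholds, and trajectories are invented for illustration, and a full implementation would alternate this step with solving the MDP for the current reward.

```python
import numpy as np

GAMMA = 0.9  # discount factor (illustrative)

def feature_vector(state):
    """Map a driving state (distance to lead car in m, speed in m/s, lane-change flag)
    to features; the thresholds and target speed are made up for this example."""
    distance, speed, changed_lane = state
    return np.array([
        1.0 if distance > 20.0 else 0.0,   # keeping a safe following distance
        1.0 - abs(speed - 25.0) / 25.0,    # staying near a target speed
        -1.0 if changed_lane else 0.0,     # penalising lane changes
    ])

def feature_expectations(trajectories):
    """Estimate discounted feature expectations mu = E[sum_t gamma^t * phi(s_t)]
    from sampled trajectories (each a list of states)."""
    mu = np.zeros(3)
    for traj in trajectories:
        for t, state in enumerate(traj):
            mu += (GAMMA ** t) * feature_vector(state)
    return mu / len(trajectories)

# Expert demonstrations: each trajectory is a list of (distance, speed, changed_lane) states.
expert_trajs = [
    [(30.0, 24.0, False), (28.0, 25.0, False), (27.0, 25.0, False)],
    [(25.0, 26.0, False), (24.0, 25.0, True), (26.0, 25.0, False)],
]
# Trajectories from the learner's current policy (e.g. a naive "drive fast" policy).
learner_trajs = [
    [(10.0, 35.0, True), (8.0, 36.0, True), (12.0, 34.0, False)],
]

mu_expert = feature_expectations(expert_trajs)
mu_learner = feature_expectations(learner_trajs)

# One simplified update in the spirit of Abbeel & Ng (2004): pick reward weights w
# pointing from the learner's feature expectations toward the expert's, so that the
# reward R(s) = w . phi(s) makes the expert look better than the current policy.
w = mu_expert - mu_learner
w /= np.linalg.norm(w)
print("reward weights:", w)
print("expert advantage:", w @ (mu_expert - mu_learner))
```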
While simple trajectories can be derived intuitively, the approach has also been applied successfully to complicated tasks, such as aerobatic maneuvers flown for shows.
When the software works as intended, the human operator guides the robot arm through a motion, and the robot reproduces the action later.
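A minimal sketch of that record-and-replay loop (kinesthetic teaching) might look like the following; the `arm` object and its `read_joint_positions()`/`move_to()` methods are hypothetical stand-ins for whatever robot API is actually used.

```python
import time

class DemonstrationRecorder:
    """Records joint positions while a human guides the arm, then replays them."""
    def __init__(self, arm, rate_hz=50):
        self.arm = arm
        self.dt = 1.0 / rate_hz
        self.trajectory = []

    def record(self, duration_s):
        # Sample joint positions while the operator moves the arm by hand.
        for _ in range(int(duration_s / self.dt)):
            self.trajectory.append(self.arm.read_joint_positions())
            time.sleep(self.dt)

    def replay(self):
        # Reproduce the demonstrated motion by tracking the recorded waypoints.
        for joints in self.trajectory:
            self.arm.move_to(joints)
            time.sleep(self.dt)

# Usage (with a real or simulated arm driver):
#   rec = DemonstrationRecorder(arm)
#   rec.record(duration_s=5.0)  # operator moves the arm by hand
#   rec.replay()                # robot reproduces the demonstrated motion
```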
Other authors call the principle “steering behavior”,[14] because the aim is to steer the robot along a given line.