LipNet

LipNet is a deep neural network for visual speech recognition (lipreading).

It was created by University of Oxford researchers Yannis Assael, Brendan Shillingford, Shimon Whiteson, and Nando de Freitas.

The technique, described in a November 2016 paper,[1] decodes text from the movement of a speaker's mouth.

Traditional visual speech recognition approaches separated the problem into two stages: designing or learning visual features, and prediction.[2] In contrast, LipNet is an end-to-end sentence-level model that learns spatiotemporal visual features and a sequence model simultaneously.
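
As a rough sketch of how such an end-to-end pipeline can be wired together, the following PyTorch snippet joins a learned spatiotemporal convolutional frontend to a recurrent sequence model trained with a CTC loss, which is the general shape of the LipNet design; the crop size, layer widths, and 28-character vocabulary here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class LipReadingModel(nn.Module):
    """Sketch of an end-to-end sentence-level lipreading network.

    Loosely follows the LipNet recipe (spatiotemporal convolutions,
    a recurrent sequence model, CTC training); all sizes below are
    illustrative assumptions.
    """

    def __init__(self, vocab_size: int):
        super().__init__()
        # Stage 1 of the traditional pipeline (hand-designed visual
        # features) becomes learned 3D convolutions over (time, H, W).
        self.frontend = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.AdaptiveAvgPool3d((None, 4, 4)),  # keep time, shrink space
        )
        # Stage 2 (prediction) becomes a recurrent sequence model whose
        # per-frame outputs are trained jointly with the frontend.
        self.gru = nn.GRU(input_size=32 * 4 * 4, hidden_size=256,
                          bidirectional=True, batch_first=True)
        # One extra output class serves as the CTC blank symbol.
        self.classifier = nn.Linear(2 * 256, vocab_size + 1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, channels, time, height, width)
        x = self.frontend(frames)
        b, c, t, h, w = x.shape
        x = x.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
        x, _ = self.gru(x)
        return self.classifier(x)  # per-frame character logits


# Toy forward/loss pass: the CTC loss aligns per-frame predictions
# with the transcript, so no frame-level labels are required.
vocab_size = 28                          # hypothetical character set
model = LipReadingModel(vocab_size)
video = torch.randn(2, 3, 75, 64, 64)    # two 75-frame mouth-crop clips
log_probs = model(video).log_softmax(-1).permute(1, 0, 2)  # (T, B, C)
targets = torch.randint(0, vocab_size, (2, 20))
loss = nn.CTCLoss(blank=vocab_size)(
    log_probs, targets,
    input_lengths=torch.full((2,), 75, dtype=torch.long),
    target_lengths=torch.full((2,), 20, dtype=torch.long),
)
```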

Machine lipreading has enormous practical potential, with applications such as improved hearing aids, support for the recovery and wellbeing of critically ill patients,[3] and speech recognition in noisy environments,[4] implemented, for example, in Nvidia's autonomous vehicles.