Audio-to-video synchronization

Audio-to-video synchronization (AV synchronization, also known as lip sync; its absence is called lip-sync error or lip flap) refers to the relative timing of the audio (sound) and video (image) parts during creation, post-production (mixing), transmission, reception and playback processing.

In television systems, audio and video signal-processing circuitry can introduce significant (and potentially non-constant) delays.

Some video monitors contain internal user-adjustable audio delays to aid in correction of errors.
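A user-adjustable audio delay of this kind can be pictured as a simple delay line: audio samples are held in a buffer and released a fixed number of milliseconds later, so that the sound lines up with the slower video path. The following is a minimal sketch under stated assumptions, not a description of how any particular monitor implements it; the class name `AudioDelayLine`, the 48 kHz default sample rate and the 40 ms setting are illustrative choices.

```python
from collections import deque


class AudioDelayLine:
    """Hypothetical fixed audio delay, pre-filled with silence."""

    def __init__(self, delay_ms: float, sample_rate_hz: int = 48000):
        # Convert the user-facing delay in milliseconds to whole samples.
        self.delay_samples = round(delay_ms * sample_rate_hz / 1000)
        # Start with silence so the first outputs are zeros.
        self._buffer = deque([0.0] * self.delay_samples)

    def process(self, samples):
        """Push new samples in and return the same number of delayed samples."""
        delayed = []
        for sample in samples:
            self._buffer.append(sample)
            delayed.append(self._buffer.popleft())
        return delayed


# Example: a 40 ms delay at 48 kHz holds back 1920 samples.
line = AudioDelayLine(delay_ms=40)
print(line.delay_samples)
```

A real device would typically expose the delay as a menu setting and would also have to handle changes to the delay while audio is playing, which this sketch does not attempt.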

Some transmission protocols, such as the Real-time Transport Protocol (RTP), require an out-of-band method for synchronizing media streams.

In some RTP systems, each media stream has its own timestamp using an independent clock rate and per-stream randomized starting value.
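Because each stream's timestamps use their own clock rate and random starting value, they cannot be compared directly; a receiver first needs a reference pair that ties one timestamp in each stream to a shared wall clock. In RTP this pairing is commonly taken from RTCP sender reports, which carry an NTP time alongside the matching RTP timestamp. The sketch below illustrates the arithmetic only; `ClockReference`, `rtp_to_wallclock` and `av_offset_s` are hypothetical names, and the clock rates used in the example (48000 Hz audio, 90000 Hz video) are assumed typical values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockReference:
    """Hypothetical reference pair for one media stream."""
    rtp_timestamp: int   # 32-bit stream timestamp at the reference instant
    wallclock_s: float   # shared wall-clock time (seconds) at that instant
    clock_rate_hz: int   # timestamp ticks per second for this stream


def rtp_to_wallclock(rtp_timestamp: int, ref: ClockReference) -> float:
    """Map a stream timestamp onto the shared wall-clock timeline."""
    # Signed 32-bit difference, so wraparound of the counter is handled.
    diff = (rtp_timestamp - ref.rtp_timestamp) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return ref.wallclock_s + diff / ref.clock_rate_hz


def av_offset_s(audio_ts: int, audio_ref: ClockReference,
                video_ts: int, video_ref: ClockReference) -> float:
    """Positive result: this audio sample belongs later than this video frame."""
    return (rtp_to_wallclock(audio_ts, audio_ref)
            - rtp_to_wallclock(video_ts, video_ref))


# Example with assumed clock rates and arbitrary starting values.
audio_ref = ClockReference(rtp_timestamp=1000, wallclock_s=100.0, clock_rate_hz=48000)
video_ref = ClockReference(rtp_timestamp=4294967000, wallclock_s=100.0, clock_rate_hz=90000)
video_ts = (4294967000 + 90000) & 0xFFFFFFFF  # one second later, after wraparound
print(av_offset_s(1000 + 48000, audio_ref, video_ts, video_ref))
```

Once both streams are expressed on the same timeline, the player can schedule audio and video for presentation at matching instants.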

Unfortunately, the advent of high-definition flat-panel display technologies (LCD, DLP and plasma), which can delay video more than audio, has moved the problem into the viewer's home and beyond the control of the television programming industry alone.

Consumer product companies now offer audio-delay adjustments to compensate for video-delay changes in TVs, soundbars and A/V receivers,[7] and several companies manufacture dedicated digital audio delays made exclusively for lip-sync error correction.[5][8]

The Consumer Electronics Association has published a set of recommendations for how digital television receivers should implement A/V sync.[9]

SMPTE standard ST 2064, published in 2015,[10] provides technology to reduce or eliminate lip-sync errors in digital television.

When fingerprints have been generated for a TV program, and the required technology is incorporated, the viewer's television set has the ability to continuously measure and correct lip-sync errors.[13][14][15][16]

The Real-time Transport Protocol clocks media using origination timestamps on an arbitrary timeline.