Voice activity detection

Various VAD algorithms have been developed, offering different features and different compromises between latency, sensitivity, accuracy and computational cost.
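These compromises are easiest to see in one of the simplest possible detectors, a frame-energy threshold. The sketch below is an illustrative stand-in, not any standardized VAD. The function names, the use of mean-square energy in dB, and the parameter values are assumptions. The frame length sets the decision latency, because no decision is available until a frame is full. The threshold sets the sensitivity. The computational cost is a single pass over the samples.

```python
import math

def frame_energy_db(frame):
    """Mean-square energy of one frame in dB (a tiny floor avoids log(0))."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_sq + 1e-12)

def energy_vad(samples, frame_len, threshold_db):
    """Illustrative energy-threshold VAD: one True/False decision per full frame.

    frame_len trades latency against the stability of the energy estimate;
    threshold_db trades missed quiet speech against false alarms on noise.
    Trailing samples that do not fill a whole frame are ignored.
    """
    decisions = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        decisions.append(frame_energy_db(frame) > threshold_db)
    return decisions

# Quiet frame, loud frame, quiet frame (hypothetical 160-sample frames).
samples = [0.001] * 160 + [0.5] * 160 + [0.001] * 160
print(energy_vad(samples, 160, -30.0))  # [False, True, False]
```

Such a detector is cheap and has low latency, but it breaks down when the background noise is about as loud as the speech.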

A VAD operating in a mobile phone must be able to detect speech in the presence of many very different types of acoustic background noise.

The main difficulty in detecting speech in this environment is the very low signal-to-noise ratio (SNR) that is often encountered.
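The SNR is the ratio of signal power to noise power, usually expressed in decibels as SNR_dB = 10 log10(P_speech / P_noise). The sketch below computes it from separately available speech and noise samples. This is an assumption made for illustration, because a real VAD only sees their sum. The function names and test signals are hypothetical. The sketch also shows why low SNR is hard. At 0 dB, adding speech to uncorrelated noise only doubles the power, a rise of about 3 dB, which leaves an energy-based decision very little margin.

```python
import math

def mean_power(samples):
    """Mean-square power of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)

def snr_db(speech, noise):
    """SNR in dB: 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))

# Hypothetical test signals over whole periods: a sine and a cosine of the
# same frequency are orthogonal, so their powers add when summed.
N = 800
speech = [math.sin(2 * math.pi * 5 * n / N) for n in range(N)]
noise = [2.0 * math.cos(2 * math.pi * 5 * n / N) for n in range(N)]
print(round(snr_db(speech, noise), 2))  # -6.02 (noise power 4x speech power)

# At 0 dB SNR, speech plus noise is only ~3 dB above the noise alone.
noise_0db = [math.cos(2 * math.pi * 5 * n / N) for n in range(N)]
mixture = [s + v for s, v in zip(speech, noise_0db)]
print(round(10 * math.log10(mean_power(mixture) / mean_power(noise_0db)), 2))  # 3.01
```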

The gain from using a VAD, for example the transmission capacity saved by not sending silent intervals, depends mainly on the percentage of pauses in the speech and on the reliability of the VAD in detecting these intervals.

At the same time, clipping, that is, the loss of milliseconds of active speech, should be minimized to preserve quality.
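These two opposing requirements can be measured frame by frame. The sketch below assumes that a reference speech/silence label is available for each frame, for example from hand labeling. The function name vad_stats and the fixed frame duration are hypothetical. The function reports two values. The first is the activity factor, which is the fraction of frames the VAD marks as active and would therefore transmit, so a lower value means a larger saving. The second is the amount of reference speech the VAD clipped, in milliseconds.

```python
def vad_stats(reference, decisions, frame_ms):
    """Compare per-frame VAD decisions with reference labels (True = speech).

    Returns (activity, clipped_ms):
      activity: fraction of frames the VAD marks as active (would transmit)
      clipped_ms: milliseconds of reference speech the VAD marked as silence
    """
    if len(reference) != len(decisions):
        raise ValueError("reference and decisions must have the same length")
    active = sum(1 for d in decisions if d)
    clipped = sum(1 for r, d in zip(reference, decisions) if r and not d)
    return active / len(decisions), clipped * frame_ms

# Hypothetical 20 ms frames: the VAD detects the talkspurt one frame late
# (clipping its first frame) and holds on one frame too long at the end.
reference = [False, False, True, True, True, True, False, False, False, False]
decisions = [False, False, False, True, True, True, True, False, False, False]
print(vad_stats(reference, decisions, 20))  # (0.4, 20)
```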

It is therefore important to carry out subjective tests on VADs, mainly to ensure that the perceived clipping is acceptable.

In VoIP applications, front-end clipping (the loss of the beginning of a speech burst) can be reduced by buffering the audio, rewinding to shortly before the point where speech was detected, and sending the data with a very slight delay.
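One way to implement this rewinding is a fixed delay line with look-ahead. The sketch below is an illustrative design, not the method of any particular VoIP stack. The class name DelayedSender and its frame-based interface are assumptions. Each frame is held back for preroll_frames frames. It is then sent if it, or any later frame still in the window, was flagged as speech. Each talkspurt therefore gains preroll_frames frames of lead-in, at the cost of a constant delay of the same length.

```python
from collections import deque

class DelayedSender:
    """Hypothetical pre-roll sender: delays output by `preroll_frames`
    frames so frames just before a VAD onset can still be transmitted."""

    def __init__(self, preroll_frames):
        # Holds (frame, is_speech) for the oldest pending frame plus the
        # `preroll_frames` frames that arrived after it.
        self.window = deque(maxlen=preroll_frames + 1)

    def push(self, frame, is_speech):
        """Feed one frame with its VAD flag; return the frame to send, or None."""
        self.window.append((frame, is_speech))
        if len(self.window) < self.window.maxlen:
            return None  # delay line still filling
        oldest_frame, _ = self.window[0]
        # Send the oldest frame if it or any later buffered frame is speech.
        if any(flag for _, flag in self.window):
            return oldest_frame
        return None

sender = DelayedSender(preroll_frames=2)
flags = [False] * 4 + [True] * 2 + [False] * 4
sent = [out for out in (sender.push(i, f) for i, f in enumerate(flags)) if out is not None]
print(sent)  # [2, 3, 4, 5]
```

With speech flagged only in frames 4 and 5, frames 2 and 3 are sent as pre-roll. Frames still in the window when the stream ends would need a separate flush step, which is omitted here.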

Because subjective tests require the participation of several people over a few days, which increases their cost, they are generally used only when a proposal is about to be standardized.