Speaker recognition

Speaker recognition has a history dating back some four decades as of 2019 and uses the acoustic features of speech that have been found to differ between individuals.

In the verification phase, a speech sample or "utterance" is compared against a previously created voice print.

In addition, shared secrets (e.g., passwords and PINs) or knowledge-based information can be used to create a multi-factor authentication scenario.
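As an illustration of such a multi-factor scenario, the following is a minimal sketch in which a voice match score is combined with a PIN check. The function names (`hash_pin`, `authenticate`), the 0.7 threshold and the iteration count are assumptions made for this example and do not describe any particular product.

```python
import hashlib
import hmac


def hash_pin(pin: str, salt: bytes) -> bytes:
    """Derive a salted hash of the PIN so the PIN itself is never stored."""
    # The iteration count is an illustrative assumption, not a recommendation.
    return hashlib.pbkdf2_hmac("sha256", pin.encode("utf-8"), salt, 100_000)


def authenticate(
    voice_score: float,
    entered_pin: str,
    salt: bytes,
    stored_pin_hash: bytes,
    threshold: float = 0.7,  # hypothetical acceptance threshold
) -> bool:
    """Accept only if BOTH the voice factor and the PIN factor pass."""
    voice_ok = voice_score >= threshold
    # compare_digest compares in constant time to avoid timing side channels.
    pin_ok = hmac.compare_digest(hash_pin(entered_pin, salt), stored_pin_hash)
    return voice_ok and pin_ok
```

Requiring both factors means that a replayed recording alone, or a stolen PIN alone, is not sufficient to pass this check.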

The various technologies used to process and store voice prints include frequency estimation, hidden Markov models, Gaussian mixture models, pattern matching algorithms, neural networks, matrix representation, vector quantization and decision trees.
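To make one of the listed techniques concrete, the sketch below shows how vector quantization might be used for scoring. It assumes that a speaker's voice print is stored as a small codebook of feature vectors and that an utterance is a sequence of feature frames; the name `vq_distortion` and the toy two-dimensional data are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def vq_distortion(frames: Sequence[Vector], codebook: Sequence[Vector]) -> float:
    """Average squared distance from each frame to its nearest codeword.

    A lower value means the utterance is closer to the speaker's codebook.
    """
    if not frames or not codebook:
        raise ValueError("frames and codebook must be non-empty")
    total = 0.0
    for frame in frames:
        # Quantize the frame: find the codeword with the smallest squared distance.
        total += min(
            sum((x - c) ** 2 for x, c in zip(frame, codeword))
            for codeword in codebook
        )
    return total / len(frames)
```

In a setup like this, an utterance would be scored against each enrolled speaker's codebook and the lowest distortion taken as the best match. How the codebook is trained (for example by clustering enrollment frames) is outside this sketch.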

For comparing utterances against voice prints, more basic methods like cosine similarity are traditionally used for their simplicity and performance.
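A comparison of this kind can be sketched as follows, assuming that both the utterance and the voice print have already been reduced to fixed-length feature vectors. The `verify` helper and its 0.7 threshold are illustrative assumptions; a real system would tune the threshold on evaluation data.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def verify(
    utterance_vec: Sequence[float],
    voice_print_vec: Sequence[float],
    threshold: float = 0.7,  # hypothetical decision threshold
) -> bool:
    """Accept the claimed identity if the similarity reaches the threshold."""
    return cosine_similarity(utterance_vec, voice_print_vec) >= threshold
```

Because cosine similarity depends only on the direction of the vectors, it is insensitive to overall scaling of the features, which is part of why it is cheap and simple to apply.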

Ambient noise levels can impede the collection of both the initial and subsequent voice samples.

Noise reduction algorithms can be employed to improve accuracy, but incorrect application can have the opposite effect.

Some systems adapt the speaker models after each successful verification to capture long-term changes in the voice, though there is debate regarding the overall security impact of automated adaptation.

Due to the introduction of legislation like the General Data Protection Regulation in the European Union and the California Consumer Privacy Act in the United States, there has been much discussion about the use of speaker recognition in the workplace.

In September 2019, Irish speech recognition developer Soapbox Labs warned about the legal implications that may be involved.[14]

The first international patent was filed in 1983. It came out of telecommunication research at CSELT[15] (Italy) by Michele Cavazza and Alberto Ciaramella, and was intended both as a basis for future telco services to end customers and as a way to improve noise-reduction techniques across the network.

The system used passive speaker recognition to verify the identity of telephone customers within 30 seconds of normal conversation.[18]

Speaker recognition may also be used in criminal investigations, such as those of the 2014 executions of James Foley and Steven Sotloff, among others.