Word error rate

Word error rate (WER) is a common metric of the performance of a speech recognition or machine translation system.

The WER metric typically ranges from 0 to 1, where 0 indicates that the compared pieces of text are identical, and 1 (or larger, since the number of errors can exceed the number of words in the reference) indicates that they share no similarity.
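WER is conventionally computed as (S + D + I) / N, where S, D and I are the numbers of substitutions, deletions and insertions needed to turn the reference transcript into the hypothesis, and N is the number of words in the reference. The following is a minimal sketch of this computation using word-level Levenshtein distance; the function name `wer` and the whitespace tokenisation are illustrative choices, not part of any standard API.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by the
    number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

Note that insertions alone can drive the result above 1: for example, a one-word reference against a three-word hypothesis yields a WER of 2.0.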

One line of work on this issue appeals to an empirical power law that relates a language model's perplexity to the word error rate of the recogniser that uses it.

A Microsoft Research experiment (Wang, Acero and Chelba, 2003) argued for training under an objective "that matches the optimization objective for understanding": systems trained this way achieved higher language-understanding accuracy than systems tuned purely for low word error rate, suggesting that true understanding of spoken language relies on more than high word recognition accuracy.

Whichever metric is used, however, one major theoretical problem in assessing the performance of a system is deciding whether a word has been “mis-pronounced”, i.e. whether the fault lies with the user or with the recogniser.

This may be particularly relevant in a system which is designed to cope with non-native speakers of a given language or with strong regional accents.