Acoustic model

The acoustic model and the language model are combined to get the top-ranked word sequences corresponding to a given audio segment.
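One common way to combine the scores is to add log-probabilities from each model and sort the candidates. The sketch below assumes each candidate word sequence already carries an acoustic log-score and a language-model log-score. The function name `rank_hypotheses`, the `lm_weight` parameter, and all numbers are hypothetical, chosen only for illustration; real decoders search a much larger hypothesis space.

```python
def rank_hypotheses(hypotheses, lm_weight=1.0):
    """Rank candidate word sequences by a combined log score.

    hypotheses: list of (words, acoustic_logprob, lm_logprob) tuples.
    lm_weight:  assumed scaling factor applied to the language-model score.
    """
    scored = [
        (acoustic + lm_weight * lm, words)
        for words, acoustic, lm in hypotheses
    ]
    # Higher (less negative) combined log score ranks first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [words for _, words in scored]


# Hypothetical scores, invented for illustration only.
candidates = [
    ("recognize speech", -12.0, -4.0),
    ("wreck a nice beach", -11.5, -9.0),
    ("recognise peach", -14.0, -8.0),
]
ranked = rank_hypotheses(candidates)
print(ranked)
```

With `lm_weight=0.0` the ranking depends on the acoustic score alone, which shows how the language model can change which candidate comes out on top.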

The raw audio signal from each frame can be transformed by applying the mel-frequency cepstrum.
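As a rough illustration of that transformation, the sketch below computes mel-frequency cepstral coefficients for a single frame using only the Python standard library. It assumes a Hamming window, the widely used mel mapping 2595 * log10(1 + f / 700), 20 triangular filters, and 13 output coefficients. These choices, and every function name, are assumptions made for this example rather than settings taken from any particular recognizer. The naive DFT is slow and is used only to keep the example self-contained.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum of a Hamming-windowed frame via a naive O(n^2) DFT."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    max_mel = hz_to_mel(sample_rate / 2)
    points = [mel_to_hz(max_mel * i / (num_filters + 1))
              for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * p / sample_rate)) for p in points]
    filters = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge
            row[k] = (right - k) / (right - center)
        filters.append(row)
    return filters


def mfcc(frame, sample_rate, num_filters=20, num_coeffs=13):
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Log of each filter's energy; the floor avoids log(0).
    log_energies = [
        math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
        for row in bank
    ]
    # A DCT-II of the log energies gives the cepstral coefficients.
    return [
        sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters)
            for m, e in enumerate(log_energies))
        for c in range(num_coeffs)
    ]


# Synthetic 440 Hz tone standing in for one 16 ms frame of speech.
sample_rate = 16000
frame = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]
coeffs = mfcc(frame, sample_rate)
print(len(coeffs))
```

In practice a recognizer would compute such a vector for every overlapping frame of the recording and would typically rely on an FFT-based signal-processing library rather than hand-written loops.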

Recently, the use of convolutional neural networks has led to substantial improvements in acoustic modeling.

For speech recognition on a standard desktop PC, the limiting factor is the sound card.

As a general rule, a speech recognition engine works better with acoustic models trained on speech audio recorded at higher sampling rates and with more bits per sample.
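To make the effect of these two recording parameters concrete, the sketch below computes three textbook quantities for uncompressed PCM audio: the Nyquist frequency (the highest frequency a given sampling rate can represent), the theoretical signal-to-quantization-noise ratio of an ideal uniform quantizer driven by a full-scale sine wave (about 6.02 dB per bit plus 1.76 dB), and the raw data rate. The function names and the example settings are assumptions chosen for illustration, and real hardware falls short of the ideal figures.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at the given sampling rate."""
    return sample_rate_hz / 2


def quantization_snr_db(bits_per_sample):
    """Ideal SNR of a uniform quantizer for a full-scale sine wave."""
    return 6.02 * bits_per_sample + 1.76


def pcm_bitrate(sample_rate_hz, bits_per_sample, channels=1):
    """Raw data rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Illustrative settings only: (sampling rate in Hz, bits per sample).
for rate, bits in [(8000, 8), (16000, 16), (44100, 16)]:
    print(rate, bits, nyquist_hz(rate),
          round(quantization_snr_db(bits), 2), pcm_bitrate(rate, bits))
```

A higher sampling rate widens the band of speech frequencies captured, while more bits per sample lowers the quantization noise floor; both also raise the amount of data to store and process.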