The Baum–Welch algorithm, the primary method for estimating the parameters of hidden Markov models, is numerically unstable due to its recursive calculation of joint probabilities.
As the length of the observation sequence grows, these joint probabilities become vanishingly small, so the values computed by the forward recursion rapidly fall below machine precision.
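A minimal sketch of the problem (the per-step likelihood of 0.1 below is an invented illustrative value): multiplying many per-step probabilities drives the joint probability below the smallest representable double, while the equivalent sum of log-probabilities stays finite.

```python
import math

# Joint probability of 400 observations, each with per-step likelihood 0.1:
# the direct product underflows to exactly 0.0.
p = 1.0
for _ in range(400):
    p *= 0.1
print(p)  # 0.0 -- below the smallest positive double (~4.9e-324)

# Working in log-space avoids the underflow entirely.
log_p = sum(math.log(0.1) for _ in range(400))
print(log_p)  # about -921.03
```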
The Baum–Welch algorithm and hidden Markov models were first described in a series of articles by Baum and his peers at the IDA Center for Communications Research, Princeton, in the late 1960s and early 1970s.
We assume that the transition probability P(X_t = j | X_{t−1} = i) is independent of time t, which leads to the definition of the time-independent stochastic transition matrix

A = {a_ij} = {P(X_t = j | X_{t−1} = i)}.

The initial state distribution (i.e. when t = 1) is given by π_i = P(X_1 = i). The observation variables Y_t take one of K possible values, and the probability of observing y_k at time t in state j is b_j(y_k) = P(Y_t = y_k | X_t = j), collected in the emission matrix B = {b_j(y_k)}. A hidden Markov model is thus described by the parameters θ = (A, B, π).
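As a concrete sketch of this parameterization (the two-state, two-symbol model and all numeric values below are invented purely for illustration):

```python
import numpy as np

# Hypothetical 2-state, 2-symbol HMM used purely for illustration.
# A[i, j] = P(X_t = j | X_{t-1} = i): time-independent transition matrix.
A = np.array([[0.5, 0.5],
              [0.3, 0.7]])

# B[i, k] = P(Y_t = k | X_t = i): emission probabilities.
B = np.array([[0.3, 0.7],
              [0.8, 0.2]])

# pi[i] = P(X_1 = i): initial state distribution.
pi = np.array([0.2, 0.8])

# Each row of A and B, and pi itself, must sum to 1 (stochastic).
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```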
The initial parameters can also be set using prior information, if it is available; this can speed up the algorithm and also steer it toward the desired local maximum.
The forward procedure computes α_i(t) = P(Y_1 = y_1, …, Y_t = y_t, X_t = i | θ), the probability of seeing the observations y_1, …, y_t and being in state i at time t. This is found recursively:

α_i(1) = π_i b_i(y_1),
α_i(t+1) = b_i(y_{t+1}) Σ_{j=1}^{N} α_j(t) a_ji.

Since this series converges exponentially to zero, the algorithm will numerically underflow for longer sequences. This can be avoided in a slightly modified algorithm by scaling α in the forward procedure and β in the backward procedure.
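One standard remedy for the underflow is to normalize the forward variables at every step and accumulate the normalizing constants, whose logs sum to the log-likelihood. A minimal NumPy sketch under that scaling scheme (the matrices A, B, pi and the observation sequence are invented illustrative values):

```python
import numpy as np

def forward_scaled(obs, A, B, pi):
    """Scaled forward pass: returns normalized alphas and per-step scale
    factors c[t], so that log P(Y) = sum(log c[t])."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    c = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        # alpha[t, i] = b_i(y_t) * sum_j alpha[t-1, j] * a_ji, then rescale
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]          # keeps alpha away from underflow
    return alpha, c

# Illustrative model and a long observation sequence.
A = np.array([[0.5, 0.5], [0.3, 0.7]])
B = np.array([[0.3, 0.7], [0.8, 0.2]])
pi = np.array([0.2, 0.8])
obs = np.array([0, 1] * 500)            # 1000 observations
alpha, c = forward_scaled(obs, A, B, pi)
print(np.log(c).sum())                  # finite log P(Y) instead of an underflowed 0.0
```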
The backward procedure computes β_i(t) = P(Y_{t+1} = y_{t+1}, …, Y_T = y_T | X_t = i, θ), the probability of the ending partial sequence y_{t+1}, …, y_T given starting state i at time t. It is found recursively as

β_i(T) = 1,
β_i(t) = Σ_{j=1}^{N} β_j(t+1) a_ij b_j(y_{t+1}).

We can now calculate the temporary variables, according to Bayes' theorem:

γ_i(t) = P(X_t = i | Y, θ) = α_i(t) β_i(t) / Σ_{j=1}^{N} α_j(t) β_j(t),

which is the probability of being in state i at time t given the observed sequence Y and the parameters θ, and

ξ_ij(t) = P(X_t = i, X_{t+1} = j | Y, θ) = α_i(t) a_ij β_j(t+1) b_j(y_{t+1}) / Σ_{k=1}^{N} Σ_{w=1}^{N} α_k(t) a_kw β_w(t+1) b_w(y_{t+1}),

which is the probability of being in state i at time t and in state j at time t + 1, given the observed sequence and parameters.

Using γ and ξ, the parameters can now be updated:

π*_i = γ_i(1), the expected frequency spent in state i at time 1;

a*_ij = Σ_{t=1}^{T−1} ξ_ij(t) / Σ_{t=1}^{T−1} γ_i(t), the expected number of transitions from state i to state j relative to the expected total number of transitions away from state i. The denominator is equivalent to the expected number of times state i is observed in the sequence from t = 1 to t = T − 1;

b*_i(v_k) = Σ_{t=1}^{T} 1_{y_t = v_k} γ_i(t) / Σ_{t=1}^{T} γ_i(t), where 1_{y_t = v_k} is an indicator function equal to 1 when y_t = v_k and 0 otherwise. Thus b*_i(v_k) is the expected number of times the output observations have been equal to v_k while in state i, relative to the expected total number of times in state i.

These steps are repeated iteratively until a desired level of convergence.

Suppose we have a chicken from which we collect eggs at noon every day.
Now whether or not the chicken has laid eggs for collection depends on some unknown factors that are hidden.
This allows us to calculate the emission matrix as described above in the algorithm, by adding up the probabilities for the respective observed sequences.
To estimate the initial probabilities, we assume all sequences start with a given hidden state and calculate the highest probability, then repeat for the other hidden state, and normalize to obtain an updated initial vector.
Finally we repeat these steps until the resulting probabilities converge satisfactorily.
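The iteration described above can be sketched end-to-end in NumPy. The two-state model, the initial parameter guesses, and the short observation sequence (encoded with 1 = eggs collected, 0 = no eggs) are all illustrative assumptions, not values from the original:

```python
import numpy as np

def baum_welch_step(obs, A, B, pi):
    """One EM update of (A, B, pi) for a single observation sequence.
    Unscaled alphas/betas are acceptable here because the sequence is short."""
    T, N, K = len(obs), A.shape[0], B.shape[1]

    # Forward: alpha[t, i] = P(y_1..y_t, X_t = i)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # Backward: beta[t, i] = P(y_{t+1}..y_T | X_t = i)
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    # Temporary variables gamma and xi (Bayes' theorem)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        # xi[t, i, j] proportional to alpha_i(t) a_ij b_j(y_{t+1}) beta_j(t+1)
        xi[t] = alpha[t, :, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi[t] /= xi[t].sum()

    # Re-estimation of pi, A and B
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros((N, K))
    for k in range(K):
        B_new[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    return A_new, B_new, pi_new

# Illustrative starting guesses; obs: 1 = eggs collected, 0 = no eggs.
A = np.array([[0.5, 0.5], [0.3, 0.7]])
B = np.array([[0.3, 0.7], [0.8, 0.2]])
pi = np.array([0.2, 0.8])
obs = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])

for _ in range(20):        # in practice, repeat until convergence
    A, B, pi = baum_welch_step(obs, A, B, pi)
```

After each step the re-estimated rows of A and B remain stochastic, since Σ_j ξ_ij(t) = γ_i(t) by construction.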
Hidden Markov Models were first applied to speech recognition by James K. Baker in 1975.
[10] Continuous speech recognition occurs by the following steps, modeled by an HMM. Feature analysis is first undertaken on temporal and/or spectral features of the speech signal, producing an observation vector. The features are then compared against all sequences of the speech-recognition units, which can be phonemes, syllables, or whole words. A lexicon decoding system is applied to constrain the paths investigated, so that only words in the system's lexicon are considered.
Similar to the lexicon decoding, the system path is further constrained by the rules of grammar and syntax.
Finally, semantic analysis is applied and the system outputs the recognized utterance.
A limitation of many HMM applications to speech recognition is that the current state only depends on the state at the previous time-step, which is unrealistic for speech as dependencies are often several time-steps in duration.
[11] The Baum–Welch algorithm also has extensive applications in solving HMMs used in the field of speech synthesis.
[12] The Baum–Welch algorithm is often used to estimate the parameters of HMMs in deciphering hidden or noisy information and consequently is often used in cryptanalysis.
[13] HMMs, and as a consequence the Baum–Welch algorithm, have also been used to identify spoken phrases in encrypted VoIP calls.
[14] In addition, HMM cryptanalysis is an important tool for automated investigations of cache-timing data.
It allows for the automatic discovery of critical algorithm state, for example key values.
[15] The GLIMMER (Gene Locator and Interpolated Markov ModelER) software was an early gene-finding program used for the identification of coding regions in prokaryotic DNA.
[16][17] GLIMMER uses Interpolated Markov Models (IMMs) to identify the coding regions and distinguish them from the noncoding DNA.
The latest release (GLIMMER3) has been shown to have increased specificity and accuracy compared with its predecessors with regard to predicting translation initiation sites, demonstrating on average 99% accuracy in locating the 3' location relative to confirmed genes in prokaryotes.
[18] The GENSCAN webserver is a gene locator capable of analyzing eukaryotic sequences up to one million base-pairs (1 Mbp) long.
[19] GENSCAN utilizes a general inhomogeneous, three-periodic, fifth-order Markov model of DNA coding regions.
[20] GENSCAN was shown to exactly predict exon location with 90% accuracy and 80% specificity compared to an annotated database.
Solving a hidden Markov model of copy-number variation (CNV) using Baum–Welch demonstrated the ability to predict the location of CNV breakpoints to within approximately 300 bp from micro-array experiments.