Statistical machine translation

Since the mid-2010s, the statistical approach has been gradually superseded by deep learning-based neural machine translation.

One approach which lends itself well to computer implementation is to apply Bayes' theorem. The probability of a target-language string $e$ given a source-language string $f$ decomposes as

$P(e \mid f) \propto P(f \mid e)\,P(e),$

where the translation model $P(f \mid e)$ is the probability that the source string is the translation of the target string, and the language model $P(e)$ is the probability of seeing that target-language string.

Finding the best translation $\tilde{e}$ is done by picking the one that gives the highest probability:

$\tilde{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\,P(e).$

For a rigorous implementation one would have to perform an exhaustive search, going through all strings $e$ in the target language.
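To make the decomposition concrete, the following sketch scores candidate translations by $P(f \mid e)\,P(e)$ in log space. All tables, words and probability values here are invented for illustration; a real system estimates them from parallel corpora.

```python
import math

# Toy noisy-channel scorer: pick the target string e that maximizes
# P(f | e) * P(e), computed in log space. All tables are invented for
# illustration; a real system estimates them from parallel corpora.
translation_model = {                    # P(f | e), the channel model
    ("la casa", "the house"): 0.9,
    ("la casa", "house the"): 0.9,       # channel model alone cannot rank these
}
language_model = {                       # P(e)
    "the house": 0.1,
    "house the": 0.001,                  # language model penalizes bad word order
}

def score(f, e):
    """Return log P(f | e) + log P(e) for candidate translation e."""
    tm = translation_model.get((f, e), 1e-12)
    lm = language_model.get(e, 1e-12)
    return math.log(tm) + math.log(lm)

f = "la casa"
candidates = ["the house", "house the"]
print(max(candidates, key=lambda e: score(f, e)))   # -> the house
```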

Performing the search efficiently is the work of a machine translation decoder that uses the foreign string, heuristics and other methods to limit the search space while keeping acceptable quality.
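One common decoder strategy is beam search, which expands the translation left to right and keeps only a fixed number of the best partial hypotheses at each step. The sketch below assumes an invented toy lexicon and bigram scores; changing `beam_size` trades quality against time in exactly the sense described above.

```python
import math

# Schematic beam-search decoder over invented toy models: translate the
# foreign sentence left to right, keeping only the beam_size best partial
# hypotheses at each step instead of searching all target strings.
lexicon = {                              # toy P(e | f) for each source word
    "la":   {"the": 0.7, "it": 0.3},
    "casa": {"house": 0.8, "home": 0.2},
}
bigram_lm = {                            # toy log P(word | prev)
    ("<s>", "the"): -0.5, ("<s>", "it"): -1.5,
    ("the", "house"): -0.3, ("the", "home"): -1.0,
}

def beam_decode(src_words, beam_size=2):
    beams = [([], 0.0)]                  # (partial translation, log score)
    for f in src_words:
        expanded = []
        for hyp, sc in beams:
            prev = hyp[-1] if hyp else "<s>"
            for e, p in lexicon.get(f, {}).items():
                lm = bigram_lm.get((prev, e), -5.0)   # crude back-off penalty
                expanded.append((hyp + [e], sc + math.log(p) + lm))
        # pruning step: this is where quality is traded against time
        beams = sorted(expanded, key=lambda x: x[1], reverse=True)[:beam_size]
    return " ".join(beams[0][0])

print(beam_decode(["la", "casa"]))       # -> the house
```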

This trade-off between quality and time usage can also be found in speech recognition.

For example, the English word corner can be translated into Spanish as either rincón or esquina, depending on whether it denotes an internal or an external angle.[7]

Word-based translation is not widely used today; phrase-based systems are more common.[11]

Matching words in bi-text remains a problem actively discussed in the community.

The phrases in such systems are typically not linguistic phrases, but phrasemes found in corpora using statistical methods.

The resulting phrase table can be learned from word alignments, or directly from a parallel corpus.[15]
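As a sketch of the word-alignment route, here is a simplified version of the standard consistency-based phrase-pair extraction heuristic. The sentence pair and alignment points are invented, and the usual extension over adjacent unaligned words is omitted.

```python
# Simplified consistency-based phrase-pair extraction from one word-aligned
# sentence pair. Sentences and alignment points are invented; the usual
# extension over adjacent unaligned words is omitted.
src = "la casa verde".split()
tgt = "the green house".split()
alignment = {(0, 0), (1, 2), (2, 1)}     # (source index, target index)

def extract_phrases(src, tgt, alignment, max_len=3):
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # target positions linked to the source span [i1, i2]
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            # consistency: nothing inside the target span may align
            # outside the source span
            if any(j1 <= j <= j2 and not i1 <= i <= i2 for (i, j) in alignment):
                continue
            if j2 - j1 < max_len:        # target span no longer than max_len
                pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs

for s, t in sorted(extract_phrases(src, tgt, alignment)):
    print(f"{s} ||| {t}")               # e.g. casa verde ||| green house
```

Counting such pairs over a whole corpus and normalizing the counts is one way to populate the phrase table with translation probabilities.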

The statistical counterpart of the old idea of syntax-based translation did not take off until the advent of strong stochastic parsers in the 1990s.

Examples of this approach include DOP-based MT and later synchronous context-free grammars.
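As a minimal illustration of the synchronous-grammar idea, the toy grammar below (rules invented for this example) expands the source and target sides in lockstep, so the differing adjective-noun order of Spanish and English is captured by a single rule.

```python
# A minimal sketch of a synchronous context-free grammar (rules invented
# for illustration): each rule expands a nonterminal on the source and
# target side simultaneously, so the Spanish noun-adjective order and the
# English adjective-noun order come from a single NP rule.
scfg = {
    "S":  (["NP"], ["NP"]),
    "NP": (["la", "N", "A"], ["the", "A", "N"]),   # N and A swap across sides
    "N":  (["casa"], ["house"]),
    "A":  (["verde"], ["green"]),
}

def expand(side, rhs):
    """Recursively expand one side (0 = source, 1 = target) of a derivation."""
    out = []
    for sym in rhs:
        if sym in scfg:                  # nonterminal: apply its rule
            out.extend(expand(side, scfg[sym][side]))
        else:                            # terminal word: emit as-is
            out.append(sym)
    return out

print(" ".join(expand(0, scfg["S"][0])))   # la casa verde
print(" ".join(expand(1, scfg["S"][1])))   # the green house
```

Because every nonterminal here has exactly one rule, the two sides can be expanded independently; with an ambiguous grammar the rule choices must be shared between the sides, which is precisely what makes the grammar synchronous.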

A language model is a function that takes a translated sentence and returns the probability of it being said by a native speaker.
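A minimal sketch of such a function is an n-gram model. The bigram model below, trained with add-one smoothing on an invented two-sentence corpus, assigns a higher probability to a fluent word order than to a scrambled one; real systems use far larger corpora and stronger smoothing such as Kneser-Ney.

```python
from collections import defaultdict

# Minimal bigram language model with add-one smoothing, trained on an
# invented two-sentence corpus.
corpus = [
    "<s> the house is small </s>".split(),
    "<s> the house is green </s>".split(),
]
counts, context, vocab = defaultdict(int), defaultdict(int), set()
for sent in corpus:
    for prev, word in zip(sent, sent[1:]):
        counts[(prev, word)] += 1
        context[prev] += 1
        vocab.update((prev, word))

def sentence_prob(words):
    """Probability that a native speaker would produce this sentence."""
    p = 1.0
    tokens = ["<s>"] + words + ["</s>"]
    for prev, word in zip(tokens, tokens[1:]):
        # P(word | prev) with add-one smoothing
        p *= (counts[(prev, word)] + 1) / (context[prev] + len(vocab))
    return p

print(sentence_prob("the house is small".split()))   # relatively high
print(sentence_prob("house the small is".split()))   # much lower
```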

There are even languages, such as Thai, whose writing systems give no clear indication of where a sentence ends.

Efficient search and retrieval of the highest-scoring sentence alignment is possible through models such as the Gale-Church alignment algorithm.
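The sketch below illustrates one such model: a simplified length-based aligner in the spirit of Gale-Church, using dynamic programming over 1-1, 1-0 and 0-1 correspondences. The cost function and skip penalty are crude stand-ins chosen for illustration.

```python
import math

# Simplified length-based sentence aligner: dynamic programming over
# 1-1, 1-0 and 0-1 correspondences, scoring paired sentences by how far
# their character lengths diverge.

def pair_cost(s, t):
    return abs(math.log((len(s) + 1) / (len(t) + 1)))

SKIP_COST = 3.0                          # assumed penalty for an unpaired sentence

def align(src_sents, tgt_sents):
    n, m = len(src_sents), len(tgt_sents)
    # best[(i, j)] = (cost, backpointer) for aligning the first i source
    # sentences with the first j target sentences
    best = {(0, 0): (0.0, None)}
    for i in range(n + 1):
        for j in range(m + 1):
            if (i, j) == (0, 0):
                continue
            moves = []
            if i and j:                  # pair src[i-1] with tgt[j-1]
                moves.append((best[i - 1, j - 1][0]
                              + pair_cost(src_sents[i - 1], tgt_sents[j - 1]),
                              (i - 1, j - 1)))
            if i:                        # leave src[i-1] unpaired
                moves.append((best[i - 1, j][0] + SKIP_COST, (i - 1, j)))
            if j:                        # leave tgt[j-1] unpaired
                moves.append((best[i, j - 1][0] + SKIP_COST, (i, j - 1)))
            best[i, j] = min(moves)
    path, cell = [], (n, m)              # trace back the lowest-cost path
    while best[cell][1] is not None:
        path.append(cell)
        cell = best[cell][1]
    return list(reversed(path))

print(align(["La casa es verde.", "Es grande."],
            ["The house is green.", "It is big."]))   # [(1, 1), (2, 2)]
```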

To learn the translation model, however, we need to know which words align in a source-target sentence pair.
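Such alignments are classically learned without supervision by expectation-maximization. The sketch below runs the EM loop of IBM Model 1 on a three-pair invented corpus; over the iterations, the translation probabilities concentrate on the correct word pairings.

```python
from collections import defaultdict

# EM training loop of IBM Model 1, the classic way to learn word alignments
# from a parallel corpus with no alignment annotation. The three sentence
# pairs are invented for illustration.
corpus = [
    ("la".split(), "the".split()),
    ("la casa".split(), "the house".split()),
    ("la casa verde".split(), "the green house".split()),
]

t = defaultdict(lambda: 0.25)            # t(f | e), uniform initialization

for _ in range(10):                      # EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for f_sent, e_sent in corpus:
        for f in f_sent:
            # E-step: expected alignment counts under the current t
            norm = sum(t[f, e] for e in e_sent)
            for e in e_sent:
                c = t[f, e] / norm
                count[f, e] += c
                total[e] += c
    # M-step: re-estimate the translation probabilities
    for f, e in count:
        t[f, e] = count[f, e] / total[e]

print(round(t["casa", "house"], 2))      # high, approaching 1.0
print(round(t["casa", "the"], 2))        # low, approaching 0.0
```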

Function words that have no clear equivalent in the target language are a further issue for statistical models.

Depending on the corpora used, idioms and linguistic register might not receive a translation that accurately represents the original intent.[19]

This problem is connected with word alignment, as only in very specific contexts does an idiomatic expression align with words that yield an idiomatic expression of the same meaning in the target language.

For that reason, idioms can only be subjected to phrasal alignment, as they cannot be decomposed further without losing their meaning.

Languages can be roughly classified by the typical order of subject (S), verb (V) and object (O) in a sentence; one speaks, for instance, of SVO or VSO languages.

Attempted solutions have included re-ordering models, where a distribution of location changes for each item of translation is estimated from aligned bi-text.
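A minimal sketch of such a model, under the assumption that the relevant statistic is the signed distance an item moves, estimates a jump distribution from invented alignment data and uses it to score candidate reorderings.

```python
from collections import Counter

# Sketch of a distance-based re-ordering model: estimate, from aligned
# bi-text, how far each translated item tends to move, then score candidate
# reorderings with that jump distribution. Alignments are invented here;
# real models condition on far more context.
aligned_bitext = [                       # (source position, target position)
    [(0, 0), (1, 2), (2, 1)],
    [(0, 0), (1, 1), (2, 2)],
    [(0, 1), (1, 0), (2, 2)],
]

jumps = Counter(j - i for sent in aligned_bitext for i, j in sent)
total = sum(jumps.values())
jump_prob = {d: c / total for d, c in jumps.items()}

def reordering_score(alignment):
    """Probability of a candidate alignment under the jump distribution."""
    score = 1.0
    for i, j in alignment:
        score *= jump_prob.get(j - i, 1e-6)   # tiny floor for unseen jumps
    return score

print(jump_prob)                              # e.g. {0: 0.56, 1: 0.22, -1: 0.22}
print(reordering_score([(0, 0), (1, 2), (2, 1)]))   # score for one candidate
```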

A system may also be unable to translate words it has not seen. This might be because of the lack of training data, changes in the domain where the system is used, or differences in morphology.