The use of TM correlates with text types characterised by technical terms and simple sentence structure (technical texts and, to a lesser degree, marketing and financial texts), with users' computing skills, and with the repetitiveness of the content.
Typical translation memory systems search for text only in the source segment.
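As a rough illustration of this behaviour (all data and names below are invented), a minimal lookup might compare a new source segment against the stored source segments only, never the targets, falling back from exact to fuzzy matching:

```python
import difflib

# Hypothetical in-memory translation memory: source segment -> target segment.
TM = {
    "Press the power button.": "Appuyez sur le bouton d'alimentation.",
    "Press the reset button.": "Appuyez sur le bouton de réinitialisation.",
}

def lookup(source: str, threshold: float = 0.75):
    """Search the source side of the TM only; the target side is never matched."""
    if source in TM:                       # exact match
        return TM[source], 1.0
    best, best_score = None, 0.0
    for stored_source, target in TM.items():
        score = difflib.SequenceMatcher(None, source, stored_source).ratio()
        if score > best_score:
            best, best_score = target, score
    return (best, best_score) if best_score >= threshold else (None, best_score)

print(lookup("Press the power button."))   # exact match, score 1.0
print(lookup("Press the menu button."))    # fuzzy match against a similar source
```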
Traditionally, translation memories have not been considered appropriate for literary or creative texts, for the simple reason that there is so little repetition in the language used.
However, others find them valuable even for non-repetitive texts, because the database resources created are useful for concordance searches to determine appropriate usage of terms, for quality assurance (e.g., detecting empty segments), and for simplifying the review process: source and target segments are always displayed together, whereas translators must work with two separate documents in a traditional review environment.
On the other hand, translators may adapt their style to the use of a TM system, avoiding intratextual references so that segments can be better reused in future texts, which affects the cohesion and readability of the translation (O'Hagan 2009).
Consistent empirical evidence (Martín-Mor 2011) shows that translators are more likely to modify the structure of a multi-clause sentence when working with a word processor than with a TM system.
"Text memory" is the basis of the proposed Lisa OSCAR xml:tm standard.
The unique identifiers are remembered during translation so that the target language document is 'exactly' aligned at the text unit level.
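A toy sketch of the idea (identifiers and data invented for illustration; this is not the actual xml:tm schema): translations are stored against stable unit identifiers rather than by position, so the target document can always be reassembled in unit-level alignment with the source.

```python
# Text units keyed by stable unique identifiers, in the spirit of xml:tm.
source_units = {
    "tu-0001": "The quick brown fox.",
    "tu-0002": "It jumps over the lazy dog.",
}

# Translations are stored against the same identifiers, never by position.
target_units = {
    "tu-0001": "Le rapide renard brun.",
    "tu-0002": "Il saute par-dessus le chien paresseux.",
}

# Reassembling the target document: every unit is aligned with its source
# counterpart by identifier, so alignment survives reordering or editing.
for uid, src in source_units.items():
    print(uid, "|", src, "=>", target_units.get(uid, "<untranslated>"))
```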
The 1970s were the infancy stage of TM systems, during which scholars carried out a preliminary round of exploratory discussion.
The paper set out the basic concept of such a storage system: "The translator might start by issuing a command causing the system to display anything in the store that might be relevant to .... Before going on, he can examine past and future fragments of text that contain similar material".
The idea was incorporated into the ALPS (Automated Language Processing Systems) tools, first developed by researchers at Brigham Young University; at that time the idea of TM systems was conflated with a tool called "Repetitions Processing", which aimed only to find matched strings.
One of the first implementations of a TM system appeared in Sadler and Vendelmans' Bilingual Knowledge Bank. Brian Harris defined the "bi-text" as "a single text in two dimensions" (1988), the source and target texts related through the activity of the translator by translation units, an idea that echoes Sadler's Bilingual Knowledge Bank. Harris also proposed something like a TM system without using that name: a database of paired translations, searchable either by individual word or by whole translation unit, with the latter search permitted to retrieve similar rather than identical units.
TM technology became commercially available on a wide scale only in the late 1990s, through the efforts of several engineers and translators.
Second-generation TM systems are much more powerful than first-generation ones: they include a linguistic analysis engine, use chunk technology to break segments down into intelligent terminological groups, and automatically generate specific glossaries.
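Commercial engines perform genuine linguistic analysis; the toy sketch below (data invented for illustration) only hints at the chunking idea by promoting recurring word n-grams to candidate glossary terms.

```python
from collections import Counter
from itertools import islice

segments = [
    "replace the toner cartridge before printing",
    "remove the toner cartridge and shake it",
    "insert the toner cartridge firmly",
]

def ngrams(tokens, n):
    # Yield successive n-grams from a token list.
    return zip(*(islice(tokens, i, None) for i in range(n)))

# Count recurring bigrams across segments as crude candidate terms.
counts = Counter()
for seg in segments:
    counts.update(" ".join(g) for g in ngrams(seg.split(), 2))

glossary = [term for term, freq in counts.items() if freq >= 2]
print(glossary)   # ['the toner', 'toner cartridge']; naive, needs linguistic filtering
```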
The current version of the Translation Memory eXchange (TMX) standard is 1.4b; it allows for the recreation of the original source and target documents from the TMX data.
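As an illustration of the format, a minimal TMX fragment (content invented; real files carry fuller headers) can be read with Python's standard library:

```python
import xml.etree.ElementTree as ET

# A minimal TMX 1.4 document (content invented for illustration).
TMX_SAMPLE = """<?xml version="1.0"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          segtype="sentence" o-tmf="none" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="fr"><seg>Enregistrez le fichier.</seg></tuv>
    </tu>
  </body>
</tmx>
"""

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # expanded name of xml:lang

root = ET.fromstring(TMX_SAMPLE)
for tu in root.iter("tu"):
    # Collect each language variant of the translation unit.
    pair = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
    print(pair)   # {'en': 'Save the file.', 'fr': 'Enregistrez le fichier.'}
```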
TermBase eXchange (TBX), a LISA standard which was revised and republished as ISO 30042, allows for the interchange of terminology data, including detailed lexical information.
The Universal Terminology eXchange (UTX) format is a standard specifically designed for user dictionaries of machine translation, but it can also be used for general, human-readable glossaries. The purpose of UTX is to accelerate dictionary sharing and reuse through an extremely simple and practical specification.
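A sketch of reading such a glossary follows; the sample rows and header layout are illustrative only, and the UTX specification remains the authoritative reference for the exact format.

```python
# UTX glossaries are plain tab-separated text with '#'-prefixed header lines
# (the sample below is invented for illustration).
UTX_SAMPLE = """#UTX 1.11; en-US/fr-FR
#src\ttgt\tsrc:pos
battery\tbatterie\tnoun
power button\tbouton d'alimentation\tnoun
"""

entries = []
for line in UTX_SAMPLE.splitlines():
    if not line or line.startswith("#"):   # skip headers and blank lines
        continue
    src, tgt, pos = line.split("\t")
    entries.append({"src": src, "tgt": tgt, "pos": pos})

print(entries[0])   # {'src': 'battery', 'tgt': 'batterie', 'pos': 'noun'}
```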
Segmentation Rules eXchange (SRX) lets tools specify the segmentation rules that were used in a previous translation, which may increase the leveraging that can be achieved.
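SRX itself is an XML vocabulary of paired "before"/"after" regular expressions marking break and no-break positions; the simplified Python sketch below (rules invented for illustration) mimics one break rule plus one no-break exception.

```python
import re

# Simplified stand-ins for SRX rules: each rule is a (before, after) regex pair.
# A matching no-break rule suppresses the split; otherwise a break rule splits.
NO_BREAK = [(r"\b(Mr|Dr|etc)\.$", r"\s")]      # abbreviation: do not split
BREAK    = [(r"[.!?]$", r"\s+[A-Z]")]           # sentence end, then a capital

def segment(text: str):
    segments, start = [], 0
    for i in range(1, len(text)):
        before, after = text[:i], text[i:]
        if any(re.search(b, before) and re.match(a, after) for b, a in NO_BREAK):
            continue                            # exception rule wins
        if any(re.search(b, before) and re.match(a, after) for b, a in BREAK):
            segments.append(text[start:i].strip())
            start = i
    segments.append(text[start:].strip())
    return segments

print(segment("Dr. Smith arrived. He sat down."))
# ['Dr. Smith arrived.', 'He sat down.']
```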
TransWS specifies the calls needed to use Web services for the submission and retrieval of files and messages relating to localization projects.
It is intended as a detailed framework for automating much of the current localization process through the use of Web services.
Typically, a translation memory system based on gettext Portable Object (PO) files consists of various separate files arranged in a directory tree structure.
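For example, a tree such as po/fr/messages.po, po/de/messages.po (paths hypothetical) can be walked and mined for msgid/msgstr pairs. The sketch below handles only simple single-line entries; real PO files also contain multi-line strings, plural forms, and comments, for which a dedicated library such as polib is better suited.

```python
import re
from pathlib import Path

# Matches simple single-line entries; real PO files are more complex.
ENTRY = re.compile(r'msgid "(.*)"\nmsgstr "(.*)"')

def load_po_tree(root: str) -> dict:
    """Collect msgid -> msgstr pairs from every .po file under root."""
    memory = {}
    for po_file in Path(root).rglob("*.po"):
        text = po_file.read_text(encoding="utf-8")
        for msgid, msgstr in ENTRY.findall(text):
            if msgid and msgstr:          # skip the header entry (empty msgid)
                memory[msgid] = msgstr
    return memory

# Hypothetical layout: po/fr/messages.po, po/fr/errors.po, ...
tm = load_po_tree("po/fr")
print(len(tm), "segments loaded")
```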