Tatoeba

Tatoeba is a free collection of example sentences with translations geared towards foreign language learners.

In 2006, Trang Ho was frustrated that, unlike some of their Japanese counterparts, German bilingual dictionaries did not offer full-text search of usage examples with translations.

Alongside her studies at the University of Technology of Compiègne, Trang Ho gradually improved her website with a few classmates.

[4] In December 2008, Trang Ho released the first version of the current codebase built around a more flexible data model.

Together with Trang Ho and other young developers, they made Tatoeba more social by adding sentence lists, user profiles, private messaging, and a Facebook-inspired Wall.

From 2018 to 2020, support from the Mozilla Foundation as part of the Common Voice project allowed Tatoeba to make its platform more open and user-friendly.

[22] Visitors can download tab-delimited sentence pairs from the Tatoeba website, ready for import into Anki and similar spaced repetition software.
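
As a rough illustration of what working with such a file might look like, the Python sketch below reads tab-delimited sentence pairs into memory. The two-column layout (source sentence, then translation), the function name `read_sentence_pairs`, and the sample data are all assumptions made for this example; the actual column layout of a Tatoeba download may differ and should be checked before use.

```python
import csv
import io


def read_sentence_pairs(stream):
    """Yield (source, translation) tuples from a tab-delimited text stream.

    Assumes the first two columns hold the sentence and its translation;
    this layout is hypothetical, not a documented Tatoeba format.
    """
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank lines and rows without a translation
        yield row[0].strip(), row[1].strip()


# Made-up sample standing in for a downloaded file.
SAMPLE = "I am hungry.\tIch habe Hunger.\nThank you.\tDanke.\n"

pairs = list(read_sentence_pairs(io.StringIO(SAMPLE)))
for source, translation in pairs:
    print(f"{source} -> {translation}")
```

For a real file, the `io.StringIO` object would be replaced by a file opened with `open(path, encoding="utf-8", newline="")`.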

[25] GoodExample tries to automatically extract a diverse set of high-quality example sentences from the English Tatoeba Corpus.

[26] Tatoeba datasets can power incidental learning experiences that blend the acquisition of a foreign language with the user's everyday activities like web browsing or book reading.

[27][28] A team at MIT Media Lab used example sentences from Tatoeba in WordSense, a mixed reality platform that enables "serendipitous language learning in the wild."[29]

More recently, Japanese researchers implemented a Tatoeba search feature in an integrated writing assistance environment.

Charles Kelly and Paul Raine, both EFL teachers in Japan, have developed language learning activities based on sentences curated from the Tatoeba Corpus.

[34][35] Clozemaster is a language self-study program that generates gamified cloze tests from Tatoeba sentence pairs.
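
To make the idea of a cloze test built from a sentence pair concrete, here is a minimal Python sketch. It is not Clozemaster's method, which is not described here: the function name `make_cloze`, the heuristic of blanking the longest word, and the sample pair are all assumptions chosen for illustration.

```python
import re


def make_cloze(sentence, translation, blank="_____"):
    """Build a simple cloze item from a sentence and its translation.

    Hypothetical heuristic: blank out the first longest word. A real
    program would likely choose the word by frequency or difficulty.
    """
    words = re.findall(r"\w+", sentence)
    if not words:
        raise ValueError("sentence contains no words")
    answer = max(words, key=len)  # max() returns the first longest word
    # Replace only the first whole-word occurrence of the answer.
    pattern = r"\b" + re.escape(answer) + r"\b"
    prompt = re.sub(pattern, blank, sentence, count=1)
    return {"prompt": prompt, "hint": translation, "answer": answer}


# Made-up sentence pair used as input.
card = make_cloze("Ich habe Hunger.", "I am hungry.")
print(card["prompt"], "|", card["hint"])
```

The learner sees the prompt with its gap and the translation as a hint, and the stored answer is used to check the response.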

[44][45] With the rise of deep learning, researchers increasingly use Tatoeba's data sets to train and evaluate their massively multilingual models in tasks like machine translation,[46] language identification,[47] semantic search,[48] and speech recognition.

A simplified diagram of Tatoeba's underlying data structure.
Research articles about machine translation that mention Tatoeba.[41]