T5 (Text-to-Text Transfer Transformer) is a series of large language models developed by Google AI and introduced in 2019.
T5 models have been employed in various applications, including chatbots, machine translation systems, text summarization tools, code generation, and robotics.[4]
The original T5 models are pre-trained on the Colossal Clean Crawled Corpus (C4), which contains text and code scraped from the internet.
This pre-training process enables the models to learn general language understanding and generation abilities.
T5 models can then be fine-tuned on specific downstream tasks, adapting their knowledge to perform well in various applications.
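Fine-tuning works because T5 casts every task, whether translation, summarization, or classification, as text-to-text: the input is plain text carrying a task prefix, and the target is plain text as well. The sketch below illustrates this framing with prefixes from the original T5 paper; the helper function name is an illustrative assumption, not part of any T5 library.

```python
# Sketch of T5's text-to-text framing: every downstream task is expressed
# as a plain-text input with a task prefix, and the target is plain text.
# The prefixes ("translate English to German:", "summarize:") are those
# used in the original T5 paper; make_t5_input is a hypothetical helper.

def make_t5_input(task_prefix: str, text: str) -> str:
    """Prepend a task prefix so one model can serve many tasks."""
    return f"{task_prefix} {text}"

translation = make_t5_input("translate English to German:",
                            "The house is wonderful.")
summary = make_t5_input("summarize:",
                        "State authorities dispatched emergency crews ...")

print(translation)  # translate English to German: The house is wonderful.
```

Because the interface is uniform, the same fine-tuning code path can be reused across tasks; only the prefix and the target text change.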
The models were trained on a mixture of English, German, French, and Romanian data from the C4 dataset, at a ratio of 10:1:1:1.
An exhaustive list of the variants released by Google Brain is available in the GitHub repository for T5X.