OCRopus is a free document analysis and optical character recognition (OCR) system released under the Apache License v2.0 with a very modular design using command-line interfaces.
OCRopus is developed under the leadership of Thomas Breuel of the German Research Centre for Artificial Intelligence (DFKI) in Kaiserslautern, Germany, and was sponsored by Google.
In recent versions, text recognition is based on recurrent neural networks (LSTM) and does not require a language model.
This makes it possible to train language-independent models; a single model has been shown to give good recognition results for English, German and French at the same time.
The extra effort of training such models is particularly worthwhile for difficult documents or for scripts that are no longer in common use, which other OCR software does not focus on.[5][6]
On 9 April 2007, OCRopus was announced as a Google-sponsored project to develop advanced OCR technologies.[1]
Funding was granted for a period of three years and covered in particular PhD and postdoctoral positions at DFKI and the University of Kaiserslautern.
From 2013 onwards, an additional recognizer based on recurrent neural networks (LSTM) was offered; since the release of version 1.0 in November 2014, it has been the only recognizer.