A combination of structural factors, namely the emergence of nation-states in Europe, the Industrial Revolution, and the expansion of colonization, led to the global use of three European national languages: French, German, and English.
Research in the Soviet Union rapidly expanded in the years following the Second World War, and access to Russian journals became a major policy issue in the United States, prompting the early development of machine translation.
It became a language of science "through its encounter with Arabic"; during the Renaissance of the 12th century, a large corpus of Arabic scholarly texts was translated into Latin so that it would be available to the emerging network of European universities and centers of knowledge.
In the 1860s and 1870s, Russian researchers in chemistry and other physical sciences ceased to publish in German in favor of local periodicals, following a major effort to adapt and create names for scientific concepts and elements (such as chemical compounds).
In 1924, the linguist Roland Grubb Kent underlined that scientific communication could be significantly disrupted in the near future by the use of as many as "twenty" languages of science: "Today with the recrudescence of certain minor linguistic units and the increased nationalistic spirit of certain larger ones, we face a time when scientific publications of value may appear in perhaps twenty languages [and] be facing an era in which important publications will appear in Finnish, Lithuanian, Hungarian, Serbian, Irish, Turkish, Hebrew, Arabic, Hindustani, Japanese, Chinese."
[25] The development of a specialized technical vocabulary was a challenging task, as Esperanto's extensive system of derivation made it difficult to directly import the words commonly used in German, French, or English scientific publications.
While it was framed as a compromise between the Esperantist and anti-Esperantist factions, this decision ultimately disappointed all proponents of an international medium for scientific communication and did lasting harm to the adoption of constructed languages in academic circles.
[30] German never recovered its privileged status as a leading language of science in the United States, and due to the lack of alternatives beyond French, American education became "increasingly monoglot" and isolationist.
[31] Unaffected by the international boycott, the use of French reached "a plateau between the 1920s and 1940s": while it did not decline in absolute terms, it did not benefit from the marginalization of German either, and it shrank relative to the expansion of English.
This ongoing anxiety became an overt crisis after the successful launch of Sputnik in 1957, as the decentralized American research system seemed for a time outpaced by the efficiency of Soviet planning.
Research in this area emerged very early: automated translation appeared as a natural extension of the original purpose of the first computers, code-breaking.
[36] Despite the initial reluctance of leading figures in computing like Norbert Wiener, several well-connected science administrators in the US, like Warren Weaver and Léon Dostert, set up a series of major conferences and experiments in the nascent field, out of a concern that "translation was vital to national security".
[36] On January 7, 1954, Dostert coordinated the Georgetown–IBM experiment, which aimed to demonstrate that the technique was sufficiently mature despite the significant shortcomings of the computing infrastructure of the time: some sentences from Russian scientific articles were automatically translated using a dictionary of 250 words and six basic syntax rules.
[38] In 1956, Léon Dostert secured substantial funding with the support of the CIA and had enough resources to overcome the technical limitations of the existing computing infrastructure: in 1957, automated translation from Russian to English could run on a vastly expanded dictionary of 24,000 words and rely on hundreds of predefined syntax rules.
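A minimal sketch of how such a dictionary-plus-rules pipeline operates is given below: each word is looked up in a small bilingual lexicon and two toy rewriting rules adjust the output. The transliterations, glosses, and rules are invented for illustration; they are not the experiment's actual 250-word dictionary or its six syntax rules.

```python
# Toy dictionary-plus-rules translator, loosely modeled on the approach of the
# Georgetown-IBM demonstration. Vocabulary and rules are invented examples.

LEXICON = {
    # transliterated Russian word: (English gloss, grammatical-case marker)
    "kachestvo": ("quality", None),
    "uglya": ("coal", "genitive"),
    "opredelyaetsya": ("is determined", None),
    "kaloriynostyu": ("calorific value", "instrumental"),
}

def translate(sentence: str) -> str:
    """Word-by-word lookup followed by two simple rewriting rules."""
    output = []
    for word in sentence.lower().split():
        gloss, case = LEXICON.get(word, (f"<{word}>", None))
        if case == "genitive":
            # Rule 1: render a genitive noun as an "of ..." phrase.
            output.append("of " + gloss)
        elif case == "instrumental":
            # Rule 2: render an instrumental noun as a "by ..." phrase.
            output.append("by " + gloss)
        else:
            output.append(gloss)
    return " ".join(output)

print(translate("Kachestvo uglya opredelyaetsya kaloriynostyu"))
# quality of coal is determined by calorific value
```

Even this toy version shows why the approach scaled poorly: every new construction requires hand-written rules, which is one reason later systems needed hundreds of them.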
On June 11, 1965, President Lyndon B. Johnson stated that the English language had become a lingua franca that opened "doors to scientific and technical knowledge" and whose promotion should be a "major policy" of the United States.
[44] In the European Union, the Bologna Declaration of 1999 "obliged universities throughout Europe and beyond to align their systems with that of the United Kingdom" and created strong incentives to publish academic results in English.
Research in this area was still pursued in a few countries where bilingualism was an important political and cultural issue: in Canada, the METEO system was successfully set up to "translate weather forecasts from English into French".
The influence that commercial databases "now wield on the international stage is considerable and works very much in favor of English", as they provide a wide range of indicators of research quality.
Actors like Elsevier or Springer are increasingly able to control "all aspects of the research lifecycle, from submission to publication and beyond".[62] Due to this vertical integration, commercial metrics are no longer restricted to journal article metadata but can include a wide range of individual and social data extracted from scientific communities.
[64] For Ulrich Ammon, the predominance of English has created a hierarchy and a "central-peripheral dimension" within the global scientific publication landscape that negatively affects the reception of research published in languages other than English.
[69] Empirical studies of the use of languages in scientific publications have long been constrained by structural bias in the most readily accessible sources: commercial databases like the Web of Science.
[76] While German has been outpaced by English even in German-speaking countries since the Second World War, it has also continued to be used marginally as a vehicular scientific language in specific disciplines or research fields (the Nischenfächer or "niche disciplines").
[83] In Portuguese research communities, commercial indexes recorded a steep rise in Portuguese-language papers during the 2007-2018 period, which is indicative both of remaining "spaces of resilience and contestation of some hegemonic practices" and of a potential new paradigm of scientific publishing "steered towards plurilingual diversity".
[85] In 2022, Bianca Kramer and Cameron Neylon led a large-scale analysis of the metadata available for 122 million Crossref objects indexed by a DOI.
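As a minimal sketch, the language recorded for a single DOI can be read from Crossref's public REST API, assuming Python's requests library; whether a record carries a language field at all depends on what the publisher deposited, and an analysis of this scale would rely on bulk metadata dumps rather than per-DOI requests like this one.

```python
# Reading the language metadata of one Crossref record by DOI.
import requests

def crossref_language(doi: str):
    """Return the 'language' value in a Crossref record, or None if absent."""
    response = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
    response.raise_for_status()
    record = response.json()["message"]
    # Many records carry no language tag at all; the completeness of this
    # metadata is itself a constraint on large-scale language studies.
    return record.get("language")

# Usage: crossref_language("10.xxxx/xxxxx")  # replace with a real DOI
```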
Developments in this area slowed after 1965 due to the increasing dominance of English, the limitations of computing infrastructure, and the shortcomings of the leading approach, rule-based machine translation.
Research in the field has largely focused on English and a few major European languages: "While we live in a multilingual world, this is paradoxically not taken into account by machine translation".
[102] A 2021 landscape study by SPARC shows that European open science infrastructures "provide access to a range of language content of local and international significance".
In the 2010s, quantitative studies began to highlight the positive impact of local languages on the reuse of open access resources in varied national contexts such as Finland,[108] Québec,[109] Croatia,[110] and Mexico.
A study of the Finnish platform Journal.fi shows that the audience of Finnish-language articles is significantly more diverse: "in case of the national language publications students (42%) are clearly the largest group, and besides researchers (25%), also private citizens (12%) and other experts (11%)".
In 2015, Juan Pablo Alperin introduced a systematic measure of social impact that highlighted the relevance of scientific content for local communities: "By looking at a broad range of indicators of impact and reach, far beyond the typical measures of one article citing another, I argue, it is possible to gain a sense of the people that are using Latin American research, thereby opening the door for others to see the ways in which it has touched those individuals and communities."