Link grammar (LG) is a theory of syntax by Davy Temperley and Daniel Sleator which builds relations between pairs of words, rather than constructing constituents in a phrase structure hierarchy.
[1] Colored Multiplanar Link Grammar (CMLG) is an extension of LG allowing crossing relations between pairs of words.
That is, in English, the subject-verb relationship is "obvious", in that the subject is almost always to the left of the verb, and thus no specific indication of dependency needs to be made.
For free word-order languages, this can no longer hold, and a link between the subject and verb must contain an explicit directional arrow to indicate which of the two words is which.
[11][12] There are rare exceptions, e.g. in Finnish, and even in English; they can be parsed by link-grammar only by introducing more complex and selective connector types to capture these situations.
The fact that the costs are local to the connectors, and are not a global property of the algorithm, makes them essentially Markovian in nature.
[13][14][15][16][17][18] The assignment of a log-likelihood to linkages allows link grammar to implement the semantic selection of predicate-argument relationships.
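As a rough illustration of how connector-local costs can rank competing parses, the sketch below sums per-link costs over each candidate linkage and sorts by the total. The `Link` class, the link labels, and every cost value are hypothetical, chosen only for the example; treating a cost as a negative log-likelihood and combining costs by simple addition is an assumption of this sketch, not a description of the actual parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One link between two words; the cost is attached to the link itself."""
    left: str
    right: str
    label: str
    cost: float  # hypothetical cost, read here as -log(probability)


def linkage_cost(links):
    """Total cost of a linkage: the sum of its local link costs."""
    return sum(link.cost for link in links)


def rank_linkages(linkages):
    """Return the candidate linkages ordered from cheapest to most costly."""
    return sorted(linkages, key=linkage_cost)


# Two made-up candidate linkages for "I saw her with binoculars":
# the prepositional phrase attaches either to the verb or to the object.
VERB_ATTACH = [
    Link("I", "saw", "S", 0.0),
    Link("saw", "her", "O", 0.0),
    Link("saw", "with", "MV", 1.0),
]
NOUN_ATTACH = [
    Link("I", "saw", "S", 0.0),
    Link("saw", "her", "O", 0.0),
    Link("her", "with", "M", 2.0),
]

RANKED = rank_linkages([NOUN_ATTACH, VERB_ATTACH])
```

Because each cost depends only on its own link, the total is a plain sum and no linkage has to be rescored when another candidate changes.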
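A basic rule file for an SVO language might give each word class the connectors it needs: a determiner links rightward to its noun, a subject noun links rightward to the verb, and the verb links rightward to its object. The English sentence "The boy painted a picture" would then be parsed with links joining "The" to "boy", "boy" to "painted", "painted" to "picture", and "a" to "picture". Similar parses apply for Chinese.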
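A minimal sketch of such an SVO rule set, assuming a simplified model of connectors, is given below. The `RULES` table, the word-class names, and the helper functions are hypothetical; they borrow the usual link grammar notation, in which `+` marks a connector that links to the right and `-` one that links to the left. The sketch only checks that a proposed linkage satisfies every word's connectors and that no two links cross; unlike a real parser, it does not search for linkages and it ignores the order of connectors within a rule.

```python
# Each rule lists (label, direction, optional) connectors for a word class.
RULES = {
    "determiner":   [("D", "+", False)],
    "noun-subject": [("D", "-", True), ("S", "+", False)],
    "noun-object":  [("D", "-", True), ("O", "-", False)],
    "verb":         [("S", "-", False), ("O", "+", False)],
}

WORDS = [
    ("The", "determiner"),
    ("boy", "noun-subject"),
    ("painted", "verb"),
    ("a", "determiner"),
    ("picture", "noun-object"),
]

# A proposed linkage: (left word index, right word index, link label).
LINKS = [(0, 1, "D"), (1, 2, "S"), (2, 4, "O"), (3, 4, "D")]


def is_planar(links):
    """True if no two links cross when drawn above the sentence."""
    for a, b, _ in links:
        for c, d, _ in links:
            if a < c < b < d:
                return False
    return True


def satisfies(words, links, rules):
    """True if every word's connectors are matched by exactly the given links."""
    for i, (_, word_class) in enumerate(words):
        used = []
        for left, right, label in links:
            if left == i:
                used.append((label, "+"))  # this word links to its right
            if right == i:
                used.append((label, "-"))  # this word links to its left
        for label, direction, optional in rules[word_class]:
            if (label, direction) in used:
                used.remove((label, direction))
            elif not optional:
                return False  # a required connector was left unmatched
        if used:
            return False  # a link was attached that no connector licenses
    return True


VALID = is_planar(LINKS) and satisfies(WORDS, LINKS, RULES)
```

Dropping the object link, for instance, leaves the verb's `O+` connector unmatched, so the same check rejects that linkage.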
[20] Conversely, a rule file for a null subject SOV language might have both the subject and the object link rightward to the verb, with the subject link optional. In a simple Persian sentence, man nAn xordam (من نان خوردم) 'I ate bread', the subject man and the object nAn would each link to the final verb xordam.[21][22][23] VSO order can be likewise accommodated, such as for Arabic.
[24] In many languages with a concatenative morphology, the stem plays no grammatical role; the grammar is determined by the suffixes.
Thus, in Russian, the sentence 'вверху плыли редкие облачка' might have the parse:[25][26] The subscripts, such as '.vnndpp', are used to indicate the grammatical category.
The primary links, Wd, EI, SIp and Api, connect the suffixes together because, in principle, other stems could appear here without altering the structure of the sentence.
The Api link indicates the adjective; SIp denotes subject-verb inversion; EI is a modifier.
The Vietnamese sentence "Bữa tiệc hôm qua là một thành công lớn" ('The party yesterday was a great success') may be parsed as follows:[27]
The link grammar syntax parser is a library for natural language processing written in C. It is available under the LGPL license.
Recent versions include improved sentence coverage; Russian, Persian and Arabic language support; prototypes for German, Hebrew, Lithuanian, Vietnamese and Turkish; and programming APIs for Python, Java, Common LISP, AutoIt and OCaml, with third-party bindings for Perl,[31] Ruby[32] and JavaScript (Node.js).
[34][35] The link-parser program along with rules and word lists for English may be found in standard Linux distributions, e.g., as a Debian package, although many of these are years out of date.
Link grammar has also been employed for information extraction from biomedical texts[40][41] and from news articles describing events,[42] as well as in experimental machine translation systems from English to German, Turkish, and Indonesian.