Categorial grammar is a family of formalisms in natural language syntax that share the central assumption that syntactic constituents combine as functions and arguments.
Categorial grammars were developed in the 1930s by Kazimierz Ajdukiewicz and in the 1950s by Yehoshua Bar-Hillel and Joachim Lambek.
A categorial grammar consists of two parts: a lexicon, which assigns a set of types (also called categories) to each basic symbol, and some type inference rules, which determine how the type of a string of symbols follows from the types of its constituent symbols. This division has the advantage that the type inference rules can be fixed once and for all, so that the specification of a particular language grammar is entirely determined by the lexicon.
A categorial grammar shares some features with the simply typed lambda calculus. Whereas the lambda calculus has only one function type A → B, a categorial grammar typically has two: a type B/A for phrases that combine with a phrase of type A on their right to form a phrase of type B, and a type A\B for phrases that combine with a phrase of type A on their left to form a phrase of type B. Think of these types as purely formal expressions freely generated from the primitive types; any semantics will be added later.
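For concreteness, the two application rules of such a basic (AB-style) categorial grammar can be written as reductions on sequences of types; this is the standard formulation, stated here for reference rather than quoted from this article:

\[
B/A,\; A \;\longrightarrow\; B
\qquad\text{and}\qquad
A,\; A\backslash B \;\longrightarrow\; B .
\]

Every derivation in a basic categorial grammar consists of repeated applications of these two reductions to adjacent types in the sequence.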
Now "the" and "that" are determiners, "boy" and "mess" are nouns, "bad" is an adjective, and "made" is a transitive verb, so the lexicon is {
means that the string is a sentence, while the sequence of reductions shows that it can be parsed as ((the (bad boy)) (made (that mess))).
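This bookkeeping is easy to mechanize. The following is a minimal Python sketch of a recognizer for a basic categorial grammar of this kind; the class names, the encoding of the lexicon and the CKY-style chart routine are illustrative choices made for this sketch, not part of the formalism or of any standard library.

    # A minimal sketch of a recognizer for a basic (AB) categorial grammar.
    from dataclasses import dataclass
    from typing import Union

    @dataclass(frozen=True)
    class Prim:
        """A primitive type such as N, NP or S."""
        name: str

    @dataclass(frozen=True)
    class Slash:
        """right=True encodes result/arg (argument expected on the right);
        right=False encodes arg\\result (argument expected on the left)."""
        result: "Type"
        arg: "Type"
        right: bool

    Type = Union[Prim, Slash]

    N, NP, S = Prim("N"), Prim("NP"), Prim("S")

    LEXICON = {
        "the":  {Slash(NP, N, True)},                    # NP/N
        "that": {Slash(NP, N, True)},                    # NP/N
        "bad":  {Slash(N, N, True)},                     # N/N
        "boy":  {N},
        "mess": {N},
        "made": {Slash(Slash(S, NP, False), NP, True)},  # (NP\S)/NP
    }

    def combine(a, b):
        """The two application rules: B/A, A -> B  and  A, A\\B -> B."""
        out = set()
        if isinstance(a, Slash) and a.right and a.arg == b:
            out.add(a.result)
        if isinstance(b, Slash) and not b.right and b.arg == a:
            out.add(b.result)
        return out

    def parses_as(words, goal):
        """CKY-style recognition: chart[i][j] holds all types of words[i..j]."""
        n = len(words)
        chart = [[set() for _ in range(n)] for _ in range(n)]
        for i, w in enumerate(words):
            chart[i][i] = set(LEXICON[w])
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span - 1
                for k in range(i, j):
                    for a in chart[i][k]:
                        for b in chart[k + 1][j]:
                            chart[i][j] |= combine(a, b)
        return goal in chart[0][n - 1]

    print(parses_as("the bad boy made that mess".split(), S))  # prints True

For the lexicon above, the final call confirms that the type sequence for "the bad boy made that mess" reduces to S; keeping back-pointers in the chart would also recover the bracketing ((the (bad boy)) (made (that mess))).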
Unlike context-free grammars (CFGs), categorial grammars are lexicalized, meaning that only a small number of (mostly language-independent) rules are employed, and all other syntactic phenomena derive from the lexical entries of specific words.
With some modifications to handle intensionality and quantification, a compositional semantics of this kind, in which each lexical item is paired with a lambda term whose semantic type matches its syntactic type, can be used to cover a wide variety of semantic phenomena.
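As an illustration of such a compositional semantics (the semantic types and lambda terms below are conventional Montague-style choices supplied for concreteness, not details taken from this article), one can interpret N as a predicate type, NP as an entity type and S as a truth-value type, and assign, for example,

\[
\begin{aligned}
\textit{boy}  &:= N : \lambda x.\,\mathrm{boy}(x)\\
\textit{bad}  &:= N/N : \lambda P.\,\lambda x.\,\mathrm{bad}(x) \wedge P(x)\\
\textit{made} &:= (NP\backslash S)/NP : \lambda x.\,\lambda y.\,\mathrm{made}(y,x) .
\end{aligned}
\]

Each syntactic application of a functor type to its argument is then interpreted as applying the corresponding lambda term to the meaning of the argument, so the parse tree fixes the order of semantic composition; determiners such as "the" require an additional device (for instance a choice operator) and are omitted from this sketch.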
A Lambek grammar is an elaboration of this idea that has a concatenation operator for types, and several other inference rules.
The Lambek calculus consists of several deduction rules, which specify how type inclusion assertions can be derived.
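The calculus can be summarized algebraically as follows (a sketch, written here in the recognition orientation in which a sequence of types on the left reduces to a single type on the right; arrow conventions vary, as discussed below). The concatenation operator · is associative, every type satisfies x → x, reductions compose transitively (the Cut rule), and the two slash types are tied to concatenation by the residuation laws

\[
x \cdot y \;\longrightarrow\; z
\quad\Longleftrightarrow\quad
x \;\longrightarrow\; z/y
\quad\Longleftrightarrow\quad
y \;\longrightarrow\; x\backslash z ,
\]

from which the application laws x/y, y → x and y, y\x → x of the basic grammar follow.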
From the point of view of categorial grammars, a context-free grammar can be seen as a calculus with a set of special-purpose axioms for each language, but with no type construction operators and no inference rules except Cut.
Of course, this is not a basic categorial grammar, since it has special axioms that depend upon the language; i.e. it is not lexicalized.
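Schematically (a restatement given here for concreteness rather than quoted from this article), each production of the grammar,

\[
B \;::=\; A_1\,A_2 \cdots A_n ,
\]

is taken as an axiom asserting that the concatenation of strings of types A1, …, An is a string of type B; deriving a terminal string is then a proof that chains such axioms together using only the Cut rule.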
Recall that every context-free language can be generated by a grammar in Greibach normal form, in which every production has the form A → a B1 … Bn; that is, the right side of the production is a single terminal symbol followed by zero or more (non-terminal) variables.
Now, given a CFG in Greibach normal form, define a basic categorial grammar with a primitive type for each non-terminal variable, and with a lexicon that, for each production A → a B1 … Bn, assigns to the terminal symbol a the type (⋯((A/Bn)/Bn-1)⋯)/B1 (or simply A when n = 0).
It is fairly easy to see that this basic categorial grammar generates the same language as the original CFG.
Note that the lexicon of this grammar will generally assign multiple types to each symbol.
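As a small worked illustration (the toy grammar is introduced here for this example, not taken from this article), consider the Greibach-normal-form productions

\[
S \;\to\; a\,S\,B, \qquad S \;\to\; a\,B, \qquad B \;\to\; b ,
\]

which generate the language a^n b^n. The construction assigns the terminal a the two types (S/B)/S and S/B, and assigns b the type B. The string aabb is then recognized by first reducing the inner ab to S via S/B, after which (S/B)/S applies to that S and then to the final b, again yielding S; the two entries for a show why such a lexicon generally assigns several types to one symbol.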
To show the converse, that every language generated by a Lambek grammar is context-free, is much more difficult.
In the literature on the Lambek calculus, a derivability assertion is usually written with the single type on the right of the arrow; in this article that convention is reversed for consistency with the notation of context-free grammars, where the single non-terminal symbol is always on the left.
Some authors use an arrow, which unfortunately may point in either direction, depending on whether the grammar is thought of as generating or recognizing the language.
The basic ideas of categorial grammar date from work by Kazimierz Ajdukiewicz (in 1935) and other scholars from the Polish tradition of mathematical logic including Stanisław Leśniewski, Emil Post and Alfred Tarski.
It represents a development in the historical idea of universal logical grammar as an underlying structure of all languages.
A core concept of the approach is the substitutability of syntactic categories—hence the name categorial grammar.
The membership of an element (e.g., word or phrase) in a syntactic category (word class, phrase type) is established by the commutation test, and the formal grammar is constructed through series of such tests.
[2] Montague's work helped to bolster interest in categorial grammar by associating it with his highly successful formal treatment of natural language semantics.
One formalism that has received considerable attention in recent years is Steedman and Szabolcsi's combinatory categorial grammar, which builds on combinatory logic invented by Moses Schönfinkel and Haskell Curry.
[3] A variety of changes to categorial grammar have been proposed to improve syntactic coverage.
Function composition is important in categorial accounts of conjunction and extraction, especially as they relate to phenomena like right node raising.
Conjunction can generally be applied to nonstandard constituents resulting from type raising or function composition.
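Stated schematically in the slash notation used above (the rule schemata are standard in combinatory categorial grammar; the example sentence is chosen here for illustration):

\[
X/Y,\; Y/Z \;\Rightarrow\; X/Z \quad (\text{forward composition}),
\qquad
X \;\Rightarrow\; T/(X\backslash T) \quad (\text{forward type raising}).
\]

In the right-node-raising sentence "John likes and Mary dislikes beans", type raising turns John (NP) into S/(NP\S), which composes with likes ((NP\S)/NP) to give the nonstandard constituent S/NP; Mary dislikes likewise becomes S/NP, the two coordinate into a single S/NP (with and given a schematic type such as (X\X)/X), and applying the result to beans (NP) yields S.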
Categorial grammars have also been extended to handle linguistic phenomena such as discontinuous idioms, gapping and extraction.