String operations

In computer science, in the area of formal language theory, frequent use is made of a variety of string functions; however, the notation used is different from that used for computer programming, and some commonly used functions in the theoretical realm are rarely used when programming.

This article defines some of these basic terms.

Concatenating with the empty string makes no difference:

A language is a finite or infinite set of strings.

Besides the usual set operations like union, intersection etc., concatenation can be applied to languages: if both

Concatenating any language with the former doesn't make any change:

, the set of all three-digit decimal numbers is obtained as

The set of all decimal numbers of arbitrary length is an example for an infinite language.

A string substitution or simply a substitution is a mapping f that maps characters in Σ to languages (possibly in a different alphabet).

String substitutions may be extended to entire languages as [1] Regular languages are closed under string substitution.

[2] Similarly, context-free languages are closed under string substitution.

[3][note 1] A simple example is the conversion fuc(.)

to uppercase, which may be defined e.g. as follows: For the extension of fuc to strings, we have e.g. For the extension of fuc to languages, we have e.g. A string homomorphism (often referred to simply as a homomorphism in formal language theory) is a string substitution such that each character is replaced by a single string.

while the inverse homomorphic image of a language

Simple single-letter substitution ciphers are examples of (ε-free) string homomorphisms.

The homomorphism guc is not ε-free, since it maps e.g. ‹0› to ε.

A very simple string homomorphism example that maps each character to just a character is the conversion of an EBCDIC-encoded string to ASCII.

It is formally defined by removal of characters from the right hand side: Here

Given a formal language L, its projection is given by The right quotient of a character a from a string s is the truncation of the character a in the string s, from the right hand side.

Thus: The quotient of the empty string may be taken: Similarly, given a subset

, one may define the quotient subset as Left quotients may be defined similarly, with operations taking place on the left of a string.

[citation needed] Hopcroft and Ullman (1979) define the quotient L1/L2 of the languages L1 and L2 over the same alphabet as L1/L2 = { s | ∃t∈L2.

[7] This is not a generalization of the above definition, since, for a string s and distinct characters a, b, Hopcroft's and Ullman's definition implies yielding {}, rather than { ε }.

The left quotient (when defined similar to Hopcroft and Ullman 1979) of a singleton language L1 and an arbitrary language L2 is known as Brzozowski derivative; if L2 is represented by a regular expression, so can be the left quotient.

In the case that M is the monoid of words over some alphabet, S is then a regular language, that is, a language that can be recognized by a finite-state automaton.

This is discussed in greater detail in the article on syntactic monoids.

[citation needed] The right cancellation of a character a from a string s is the removal of the first occurrence of the character a in the string s, starting from the right hand side.

and is recursively defined as The empty string is always cancellable: Clearly, right cancellation and projection commute: The prefixes of a string is the set of all prefixes to a string, with respect to a given language: where

The prefix closure of a language is Example:

This relation is a particular example of a prefix order.