Star height

More formally, the star height of a regular expression E over a finite alphabet A is inductively defined as follows: Here,

For our example, this can be done by an indirect proof: One proves that a language of star height 0 contains only finitely many words.

The star height of a group language is computable: for example, the star height of the language over {a,b} in which the number of occurrences of a and b are congruent modulo 2n is n.[1] In his seminal study of the star height of regular languages, Eggan (1963) established a relation between the theories of regular expressions, finite automata, and of directed graphs.

In graph theory, the cycle rank r(G) of a directed graph (digraph) G = (V, E) is inductively defined as follows: In automata theory, a nondeterministic finite automaton with ε-transitions (ε-NFA) is defined as a 5-tuple, (Q, Σ, δ, q0, F), consisting of A word w ∈ Σ* is accepted by the ε-NFA if there exists a directed path from the initial state q0 to some final state in F using edges from δ, such that the concatenation of all labels visited along the path yields the word w. The set of all words over Σ* accepted by the automaton is the language accepted by the automaton A.

The above definition assumes that regular expressions are built from the elements of the alphabet A using only the standard operators set union, concatenation, and Kleene star.

It can be shown that a language L is star-free if and only if its syntactic monoid is aperiodic (Schützenberger (1965)).

Example automaton of cycle rank 1. Kleene's algorithm transforms it into the regular expression a * b * ba (( a | b ) b * a |ε) * ( a | b ) b * | a * b * b , which has star-height 2. By Eggan's theorem, an equivalent regular expression of star-height ≤1 must exist. In fact, a * b ( b | a ( a | b )) * describes the same language.