Induction of regular languages

However, (0+1)* and 1+(1⋅0)+(1⋅0⋅0) is another regular expression, denoting the largest (assuming Σ = {0,1}) and the smallest set containing the given strings, and called the trivial overgeneralization and undergeneralization, respectively.

Mapping[note 2] each equivalence E to the corresponding quotient automaton language L(Aa,b,c,d / E) obtains the partially ordered set shown in the picture.

[1] In the picture, separation borders are shown for the negative example strings 11 (green), 1001 (blue), 101 (cyan), and 0 (red).

Coste and Nicolas present an own search method within the lattice, which they relate to Mitchell's version space paradigm.

To find the separation border, they use a graph coloring algorithm on the state inequality relation induced by the negative examples.

[13][14] Chomsky and Miller (1957)[15] used the pumping lemma: they guess a part v of an input string uvw and try to build a corresponding cycle into the automaton to be learned; using membership queries they ask, for appropriate k, which of the strings uw, uvvw, uvvvw, ..., uvkw also belongs to the language to be learned, thereby refining the structure of their automaton.

For two strings x and y, Câmpeanu et al. define x ~ y if xz ∈ F ⇔ yz ∈ F for all strings z of a length such that both xz and yz are not longer than l.[17] Based on this relation, whose lack of transitivity[note 3] causes considerable technical problems, they give an O(n4)[note 4] algorithm to construct from F a cover automaton A of minimal state count.

Moreover, they show that residual automata for regular languages cannot be learned in polynomial time, even assuming optimal sample inputs.

[27] This result was further generalised, and an algorithm that outputs an AFA (alternating finite automata) termed AL* was devised.

[29] The L* algorithm and its generalizations have significant implications in the field of automata theory and formal language learning, as they demonstrate the feasibility of efficiently learning more expressive automata models, such as NFA and AFA, which can represent languages more concisely and capture more complex patterns compared to traditional DFAs.

Brill defines a reduced regular expression to be any of Given an input set of strings, he builds step by step a tree with each branch labelled by a reduced regular expression accepting a prefix of some input strings, and each node labelled with the set of lengths of accepted prefixes.

He aims at learning correction rules for English spelling errors,[note 5] rather than at theoretical considerations about learnability of language classes.

Partial order of automata generating the strings 1, 10, and 100 (positive examples). For each of the negative example strings 11, 1001, 101, and 0, the upper set of automata generating it is shown. The only automata that generate all of {1, 10, 100} but none of {11, 1001, 101, 0} are the trivial bottom automaton and the one corresponding to the regular expression 1⋅0 * .
Illustration of the pumping lemma for regular automata
Brzozowski derivative (on red background) of a dictionary string set with respect to " con "