DFA minimization

Here, two DFAs are called equivalent if they recognize the same regular language.

Several different algorithms accomplishing this task are known and described in standard textbooks on automata theory.

[1] For each regular language, there also exists a minimal automaton that accepts it, that is, a DFA with a minimum number of states and this DFA is unique (except that states can be given different names).

There are three classes of states that can be removed or merged from the original DFA without affecting the language it accepts.

Unreachable states can be removed from the DFA without affecting the language that it accepts.

The following algorithms present various approaches to merging nondistinguishable states.

One algorithm for merging the nondistinguishable states of a DFA, due to Hopcroft (1971), is based on partition refinement, partitioning the DFA states into groups by their behavior.

[5][6] The algorithm starts with a partition that is too coarse: every pair of states that are equivalent according to the Nerode congruence belong to the same set in the partition, but pairs that are inequivalent might also belong to the same set.

It gradually refines the partition into a larger number of smaller sets, at each step splitting sets of states into pairs of subsets that are necessarily inequivalent.

The algorithm then repeatedly chooses a set A from the current partition and an input symbol c, and splits each of the sets of the partition into two (possibly empty) subsets: the subset of states that lead to A on input symbol c, and the subset of states that do not lead to A.

The purpose of the outermost if statement (if Y is in W) is to patch up W, the set of distinguishers.

If Y is in W, it has just become obsolete as a means to split classes in future iterations.

Choosing the smaller of the two splits guarantees that the new addition to W is no more than half the size of Y; this is the core of the Hopcroft algorithm: how it gets its speed, as explained in the next paragraph.

The worst case running time of this algorithm is O(ns log n), where n is the number of states and s is the size of the alphabet.

The partition refinement data structure allows each splitting step to be performed in time proportional to the number of transitions that participate in it.

[6] Once Hopcroft's algorithm has been used to group the states of the input DFA into equivalence classes, the minimum DFA can be constructed by forming one state for each equivalence class.

Like Hopcroft's algorithm, it maintains a partition that starts off separating the accepting from the rejecting states, and repeatedly refines the partition until no more refinements can be made.

The algorithm terminates when this replacement does not change the current partition.

Its worst-case time complexity is O(n2s): each step of the algorithm may be performed in time O(ns) using a variant of radix sort to reorder the states so that states in the same set of the new partition are consecutive in the ordering, and there are at most n steps since each one but the last increases the number of sets in the partition.

The instances of the DFA minimization problem that cause the worst-case behavior are the same as for Hopcroft's algorithm.

[6][9] Reversing the transitions of a non-deterministic finite automaton (NFA)

and switching initial and final states[note 1] produces an NFA

As Brzozowski (1963) observed, repeating this reversal and determinization a second time, again keeping only reachable states, produces the minimal DFA for the original language.

The intuition behind the algorithm is this: determinizing the reverse automaton merges states that are nondistinguishable in the original automaton, but may produce several accepting states.

In such case, when we reverse the automaton for the second time, these accepting states become initial, and thus the automaton will not be deterministic due to having multiple initial states.

The worst-case complexity of Brzozowski's algorithm is exponential in the number of states of the input automaton.

In the case of DFA, the exponential explosion can happen during determinization of the reversal of the input automaton;[note 4] in the case of NFA, it can also happen during the initial determinization of the input automaton.

[note 5] However, the algorithm frequently performs better than this worst case would suggest.

[10] While an exhaustive search may minimize an NFA, there is no polynomial-time algorithm to minimize general NFAs unless P = PSPACE, an unsolved conjecture in computational complexity theory that is widely believed to be false.

However, there are methods of NFA minimization that may be more efficient than brute force search.

Equivalent minimal DFA. Nondistinguishable states have been merged into a single one.