Pumping lemma for regular languages

Informally, it says that all sufficiently long strings in a regular language may be pumped—that is, have a middle section of the string repeated an arbitrary number of times—to produce a new string that is also part of the language.

Specifically, the pumping lemma says that for any regular language

{\displaystyle xz,xyz,xyyz,xyyyz,...}

Moreover, the pumping lemma guarantees that the length of

Languages with a finite number of strings vacuously satisfy the pumping lemma by having

The pumping lemma was first proven by Michael Rabin and Dana Scott in 1959,[1] and rediscovered shortly after by Yehoshua Bar-Hillel, Micha A. Perles, and Eli Shamir in 1961, as a simplification of their pumping lemma for context-free languages.

is called the "pumping length")[4] can be written as

can be divided into three substrings), satisfying the following conditions:

is the substring that can be pumped (removed or repeated any number of times, and the resulting string is always in

to be pumped must be of length at least one, that is, not an empty string; (2) means the loop must occur within the first

In simple words, for any regular language

Below is a formal expression of the pumping lemma.

The pumping lemma is often used to prove that a particular language is non-regular: a proof by contradiction may consist of exhibiting a string (of the required length) in the language that lacks the property outlined in the pumping lemma.

can be shown to be non-regular as follows: The proof that the language of balanced (i.e., properly nested) parentheses is not regular follows the same idea.

, there is a string of balanced parentheses that begins with more than

, a string can be produced that does not contain the same number of left and right parentheses, and so they cannot be balanced.

The transitions that take the machine from the first encounter of state

in the lemma, and since the machine will match a string without the

repeated any number of times, the conditions of the lemma are satisfied.

Since the substring bc takes the machine through transitions that start at state

, that portion could be repeated and the FSA would still accept, giving the string abcbcd.

Alternatively, the bc portion could be removed and the FSA would still accept giving the string ad.

In terms of the pumping lemma, the string abcd is broken into an

portion d. As a side remark, the problem of checking whether a given string can be accepted by a given nondeterministic finite automaton without visiting any state repeatedly, is NP hard.

and From this, the above standard version follows a special case, with both

While the pumping lemma states that all regular languages satisfy the conditions described above, the converse of this statement is not true: a language that satisfies these conditions may still be non-regular.

In other words, both the original and the general version of the pumping lemma give a necessary but not sufficient condition for a language to be regular.

with a substring of length 3 including a duplicate character, as well as all strings over this alphabet where precisely 1/7 of the string's characters are 3's.

The Myhill–Nerode theorem provides a test that exactly characterizes regular languages.

The typical method for proving that a language is regular is to construct either a finite-state machine or a regular expression for the language.

For every long enough string in a regular language, there must be a middle section (y) that can be repeated (or pumped) any number of times to produce a string still in the language.
Proof idea: Whenever a sufficiently long string xyz is recognized by a finite automaton , it must have reached some state ( ) twice. Hence, after repeating ("pumping") the middle part arbitrarily often ( xyyz , xyyyz , ...) the string will still be recognized.