A Proof of the CFL Pumping Lemma using Chomsky Normal Form

Theorem (The CFL Pumping Lemma)

Let $\QTR{Large}{A}$ is a context-free language. There is a number $\QTR{Large}{p}$ such that if MATH and MATH

then $\QTR{Large}{s}$ can be written as a concatination of strings MATH such that

  1. MATH forMATH

  2. MATH

  3. MATH

Proof:

Let $\QTR{Large}{G}$ be a grammar in Chomsky Normal Form generating MATH.

By induction on the length of path, Any parse tree generated from $\QTR{Large}{G} $ whose longest path is of length $\QTR{Large}{i}$ can recognize a string of length at most MATH.

In brief:

Since the two possible forms that a production can take are MATH and MATH . A path in a "Chomsky Tree" is determined by a series of "left-right" choices follow by a selection of a terminal. Hence, if the longest path is of length $\QTR{Large}{i}$ the path is uniquely determined by $\QTR{Large}{i-1}$ left-right choices hence there are at most MATH paths:

In more detail:

If $\QTR{Large}{i=1}$ then since $\QTR{Large}{G}$ is in Chomsky Normal Form, the tree has the form MATH , where $\QTR{Large}{a}$ is terminal. So the longest string that it can recognize is of length MATH

Suppone we know the result for MATH and suppose the parse tree $\QTR{Large}{T}$ has longest path has length $\QTR{Large}{i+1}$. Then, again since $\QTR{Large}{G}$ is in Chomsky Normal Form, the top of $\QTR{Large}{T}$ is of the form MATH where $\QTR{Large}{B}$ and $\QTR{Large}{C}$ are non-terminals and MATHand MATHthe parts of $\QTR{Large}{T}$ under $\QTR{Large}{B}$ and $\QTR{Large}{C}$ , have no path of length greater than i. Hence, by hypothesis each subtree, MATHand MATH , can recognize a string of length at most MATH. Thus $\QTR{Large}{T}$ can recognize a string of length at most MATH

Next, assume that $\QTR{Large}{G}$ has $\QTR{Large}{n}$ nonterminals and pick a string $\QTR{Large}{w}$ in MATH such that MATH. From the argument above, any parse tree for $\QTR{Large}{w}$ must contain a path of length greater than $\QTR{Large}{n}$. It follows that at least one nonterminal $\QTR{Large}{B}$ must occur at least twice on this path. In particular, since $\QTR{Large}{G}$ is in Chomsky Normal Form the derivation represented by the tree can be summarized as MATH , where MATH and, $\QTR{Large}{z\ }$are strings of terminals. Moreover, since the selected second occurrence of $\QTR{Large}{B}$ is under $\QTR{Large}{C}$ or $\QTR{Large}{D}$ in the tree, either MATH if it is under $\QTR{Large}{D}$ and MATH if it is under $\QTR{Large}{C}$.

Note that since MATH or MATH are possible production rules, it may be the case that MATH or MATH.

To complete this part of the argument, by substituting MATH for the second occurrence of $\QTR{Large}{B}$ in the original tree.

MATH. is also a parse tree, and, by repeated substitution, we obtain a parse tree for MATH and MATH.

To prove 3. , first note that MATH where $\QTR{Large}{B}$ is the first occurence in the repeating pair. The result now follows from a judicious choice of the pair of repeating non-terminals. In particular, amongst the repeating pairs of non-terminals, choose one whose first occurrence has the shortest path from it to the furthest leaf under it. Of course there may be more than one. The length of path from $\QTR{Large}{B}$ to that leaf must be less than or equal to the number of non-terminals, that is MATH for the second occurrence of $\QTR{Large}{B}$ or else we could find a repeating pair with a shorter defining path. 3. now follows from what is essentially a restatement of the initial arguments of this proof, regarding the length of paths in a tree and the length of strings that can be recognized.

------------------------------------------------------------------------------------------

Example : MATH is not a CFL.

The proof is by contradiction. Suppose MATH . Neither $\QTR{Large}{v}$ nor $\QTR{Large}{y}$ can be a mix of $\QTR{Large}{a}$ s , $\QTR{Large}{b}$ s , and/or $\QTR{Large}{c}$ s . To see this, suppose, for example, MATH, then MATH contains the sequence MATH and hence MATH for any $\QTR{Large}{m}$. Next, suppose for example, MATH and MATH then MATHcannot have the same number of $\QTR{Large}{a}$ s , $\QTR{Large}{b}$ s , and $\QTR{Large}{c}$ s.