Let
is a context-free language. There is a number
such that if
and
then
can be written as a concatination of strings
such that
for
Proof:
Let
be a grammar in Chomsky Normal Form generating
.
By induction on the length of path, Any parse tree generated from
whose longest path is of length
can recognize a string of length at most
.
In brief:
Since the two possible forms that a production can take are
and
. A path in a "Chomsky Tree" is determined by a series of "left-right" choices
follow by a selection of a terminal. Hence, if the longest path is of length
the path is uniquely determined by
left-right choices hence there are at
most
paths:
In more detail:
If
then since
is in Chomsky Normal Form, the tree has the form
, where
is terminal. So the longest string that it can recognize is of length
Suppone we know the result for
and suppose the parse tree
has longest path has length
.
Then, again since
is in Chomsky Normal Form, the top of
is of the form
where
and
are non-terminals and
and
the
parts of
under
and
, have no path of length greater than i. Hence, by hypothesis each subtree,
and
,
can recognize a string of length at most
.
Thus
can recognize a string of length at most
Next, assume that
has
nonterminals and pick a string
in
such that
.
From the argument above, any parse tree for
must contain a path of length greater than
.
It follows that at least one nonterminal
must occur at least twice on this path. In particular, since
is in Chomsky Normal Form the derivation represented by the tree can be
summarized as
, where
and,
are
strings of terminals. Moreover, since the selected second occurrence of
is under
or
in the tree, either
if
it is under
and
if it is under
.
Note that since
or
are possible production rules, it may be the case that
or
.
To complete this part of the argument, by substituting
for
the second occurrence of
in the original tree.
.
is also a parse tree, and, by repeated substitution, we obtain a parse tree
for
and
.
To prove 3. , first note that
where
is the first occurence in the repeating pair. The result now follows from a
judicious choice of the pair of repeating non-terminals. In particular,
amongst the repeating pairs of non-terminals, choose one whose first
occurrence has the shortest path from it to the furthest leaf under it. Of
course there may be more than one. The length of path from
to that leaf must be less than or equal to the number of non-terminals, that
is
for the second occurrence of
or else we could find a repeating pair with a shorter defining path.
3. now follows from what is essentially a restatement
of the initial arguments of this proof, regarding the length of paths in a
tree and the length of strings that can be recognized.
Example :
is not a CFL.
The proof is by contradiction. Suppose
. Neither
nor
can be a mix of
s
,
s , and/or
s
. To see this, suppose, for example,
,
then
contains the sequence
and hence
for any
.
Next, suppose for example,
and
then
cannot
have the same number of
s
,
s , and
s.