Context-Free Grammars and Chomsky Normal Form

The Definition of a Context-Free Grammar:

A Context-Free Grammar(CFG) consists of four finite sets MATH where:

  1. $\QTR{Large}{V\ }$is the set of variables.

  2. MATHis the set of terminals.

  3. $\QTR{Large}{R\ }$is the set of production rules of the form MATH where MATHand $\QTR{Large}{s\ }$is a string, possibly empty in MATH

    MATH is usually written MATH ,with MATH denoting the empty string.

  4. MATHand is the start variable.

______________________________________________________________________

Terminology:

Given $\QTR{Large}{a,b\ }$,two strings of variables and terminals and a production rule MATHwe say that the string $\QTR{Large}{avb}$ yields the string $\QTR{Large}{asb}$.

This is denoted as MATH

A sequence of strings of the form MATH written MATH is called a derivation and we say that MATH is derived from MATH MATH

The Language of a CFG MATH is the set of strings of terminals MATH

Since every step in a derivation amounts to replacing a single non-terminal by a string, we can chose a derivation MATH

such that at each step the left-most non-terminal is replaced. This is called a cannonical derivation.MATH

A Context-Free Language(CFL) is the Language of some CFG..MATH

A CFL containing a string $\QTR{Large}{w}$ with more than one cannonical derivation is called ambigious.

_______________________________________________________________________

Examples:

1.

MATH

$\hspace{0.3in}$($\ \QTR{Large}{R\ }$alternate notation )

MATH

MATH

$\hspace{0.3in}$The Language of this CFG is MATH, for example:

MATH

2 (Just the Rules and some notation)

MATH

MATH

MATH

MATH

MATH

MATH

3.

MATH

MATH

MATH

MATH

MATH

MATH

$\hspace{0.3in}$The Language is, algebraic expressions in $\QTR{Large}{x}$ and $\QTR{Large}{y}$, for example:

MATH

The cannonical(left) derivation:

MATH

4. Ambigious

MATH

MATH

MATH

MATH

MATH

Figure

5. The Java Programming Language:

_____________________________________________________________________________

Exercise (Due April 25)

Show

MATH is a CFL....( eg. MATH)

Answer:

MATH

MATH

MATH

MATH

____________________________________________________________________________

What does all of this have to do with Information Theory? - A Brief Introduction.

Rule Probability
MATH $\QTR{Large}{1.0}$
MATH $\QTR{Large}{1.0}$
MATH $\QTR{Large}{0.8}$
MATH $\QTR{Large}{0.2}$
MATH $\QTR{Large}{0.4}$
MATH $\QTR{Large}{0.2}$
MATH $\QTR{Large}{0.4}$
MATH $\QTR{Large}{0.5}$
MATH $\QTR{Large}{0.5}$
MATH $\QTR{Large}{0.4}$
MATH $\QTR{Large}{0.3}$
MATH $\QTR{Large}{0.3}$

Read:

MATHand MATH

but a "standard" notation is:

MATHand MATH

So

MATH



The Problem:

Kids make tasty treats.

____________________________________________________________________________

Modifying a CFG - Removing Unit Rules:

A unit rule is a production rule of the form MATH where MATH

One can generate the same CFL :

The issue is circular derivations, a sequence of rules of the form

MATH.MATH.MATH

For example, if you had the rules MATHand MATHapplying the above bullet would produce a grammar with the rules MATHand MATH. The solution is:

  1. Remove all unit rules of the form MATH

  2. Removing all other unit rules, MATH , and add rules of the form MATHwhere MATHis a rule

  3. Remove all unreachable rules.

  4. Repeat 1. and 2. and 3. until all unit rules are gone.

Example:

MATH

$\hspace{0.3in}$($\ \QTR{Large}{R\ }$alternate notation )

MATH

MATH

MATH

MATH

MATH

MATH

MATH

MATH

MATH

MATH

MATH

MATH

MATH

MATH This rule is unreachable from $\QTR{Large}{\ S\ }$ so

MATH

MATH

MATH

MATH

MATH

MATH This rule is unreachable from $\QTR{Large}{\ S\ }$ so

MATHRemoving the rule MATH

MATH

MATHWhich is just Example 1. above.

_____________________________________________________________________________

Chomsky Normal Form:

Definition:

A CFG is said to be in Chomsky Normal Form if every Rule is of one of three forms:

  1. MATH where MATHbut neither $\QTR{Large}{u\ }$nor $\QTR{Large}{w\ }$ can be $\QTR{Large}{S}$

  2. .MATH where MATHand $\QTR{Large}{u\ }$is inMATH

  3. MATH

_____________________________________________________________________________

Theorem:

Context Free Languages are decidable. That is given any MATHthere is an algorithm to determine whether or not MATH

Proof:

_____________________________________________________________________________

Theorem:

Any Context-Free Language can be generated by a CFG in Chomsky Normal Form:MATH

Proof (A sketch):

An Example (1. Above):

Step 1:

MATH

MATH

Step 2:

MATH

MATH

Step 3:

MATH

MATH

Step 4:

MATH

MATH

MATH

MATH

Step 5:

MATH

MATH

MATH

MATH

MATH

MATH