A Context-Free Grammar(CFG) consists of four finite sets where:
is the set of variables.
is the set of terminals.
is the set of production rules of the form where and is a string, possibly empty in
is usually written ,with denoting the empty string.
and
is the start variable.
Terminology:
Given ,two strings of variables and terminals and a production rule we say that the string yields the string .
This is denoted as
A sequence of strings of the form written is called a derivation and we say that is derived from
The Language of a CFG is the set of strings of terminals
Since every step in a derivation amounts to replacing a single non-terminal by a string, we can chose a derivation
such that at each step the left-most non-terminal is replaced. This is called a cannonical derivation.
A Context-Free Language(CFL) is the Language of some CFG..
A CFL containing a string with more than one cannonical derivation is called ambigious.
1.
(alternate notation )
The Language of this CFG is , for example:
2 (Just the Rules and some notation)
3.
The Language is, algebraic expressions in and , for example:
The cannonical(left) derivation:
4. Ambigious
5. The Java Programming Language:
Exercise (Due April 25)
Show
is a CFL....( eg. )
Answer:
What does all of this have to do with Information Theory? - A Brief Introduction.
Rule | Probability |
Read:
and
but a "standard" notation is:
and
So
The Problem:
Kids make tasty treats.
Modifying a CFG - Removing Unit Rules:
A unit rule is a production rule of the form where
One can generate the same CFL :
By removing all unit rules, , and adding rules of the form where is a rule
The issue is circular derivations, a sequence of rules of the form
..
For example, if you had the rules and applying the above bullet would produce a grammar with the rules and . The solution is:
Remove all unit rules of the form
Removing all other unit rules, , and add rules of the form where is a rule
Remove all unreachable rules.
Repeat 1. and 2. and 3. until all unit rules are gone.
Example:
(alternate notation )
This rule is unreachable from so
This rule is unreachable from so
Removing the rule
Which is just Example 1. above.
Definition:
A CFG is said to be in Chomsky Normal Form if every Rule is of one of three forms:
where but neither nor can be
. where and is in
_____________________________________________________________________________
Theorem:
Context Free Languages are decidable. That is given any there is an algorithm to determine whether or not
Proof:
Choose a CFG for the Language in Chomsky Normal Form.
Show that for any there is only a finite number of derivations with
Moreover
the number of steps in each of these derivations is less than or equal to
.
Devise an algorithm to enumerate derivations in order of increasing number
of.steps.
Check whether any derivations of step count is a derivation of .
Theorem:
Any Context-Free Language can be generated by a CFG in Chomsky Normal Form:
Proof (A sketch):
Starting with a CFG that generates the language, add a new start symbol and a new Rule this does not change the CFL and takes care of the last part of 1. We now refer to as the start variable.
Remove any Rule of the form , and for any Rule of the form add the rule . You might have to iterate this process. You could have
If you have a Rule of the form add the rule , again iterate if necessary.
Remove all unit rules of the form with and add rules of the form where is a rule. Again, you might have to iterate this process.
Replace rules of the form where and by sequences of rules ,,...., , where the are new variables.
Finally any rule of the form with and, say, gets replaced by two rules and where is new variable. similarly.
An Example (1. Above):
Step 1:
Step 2:
Step 3:
Step 4:
Step 5: