Reviewing, we considered a set of plaintext-probability pairs (s_1, p_1), ..., (s_n, p_n); that is, the probability that the symbol s_i occurs in a message is p_i.
We also considered an instantaneous binary code c_1, ..., c_n for s_1, ..., s_n. Finally we set l_i = length(c_i).
We were then able to show that:

    sum_i p_i * l_i  >=  sum_i p_i * log2(1/p_i)  =  H.

That is, the expected length of a coded symbol is bounded below by the entropy H.
If I had a message n symbols long and I encoded it using c_1, ..., c_n, then n * H is a lower bound for the expected length of the coded message.
The key point in the proof was the observation that instantaneous codes are those whose code strings occur at the leaves of the binary tree representation of the code. Hence we have Kraft's inequality: sum_i 2^(-l_i) <= 1.
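As a quick numerical check of the bound (the probabilities and code lengths below are illustrative values, not from the notes), this sketch computes the entropy H, the expected codeword length, and the Kraft sum, which is at most 1 for any instantaneous code:

```python
import math

def entropy(probs):
    # H = sum of p * log2(1/p) over the symbol probabilities
    return sum(p * math.log2(1 / p) for p in probs)

def expected_length(probs, lengths):
    # expected number of bits per coded symbol
    return sum(p * l for p, l in zip(probs, lengths))

def kraft_sum(lengths):
    # sum of 2^(-l); at most 1 for the lengths of an instantaneous code
    return sum(2.0 ** -l for l in lengths)

probs = [0.4, 0.3, 0.1, 0.1, 0.1]   # illustrative distribution
lengths = [1, 2, 3, 4, 4]           # lengths of one instantaneous code

print(kraft_sum(lengths))            # 1.0
print(entropy(probs))                # about 2.046 bits
print(expected_length(probs, lengths))
```

Here the expected length, 2.1 bits, sits just above the entropy bound of about 2.046 bits.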
One wants to know how to achieve the lower bound. In fact this is not a "hard" problem like factoring large integers. Brute force is even a viable method (the number of interior nodes, relative to n, of the trees we need to search can't get too big; why?). However, there is a rather simple algorithm to generate a "best possible" code.
At the root of Huffman's algorithm is a simple idea: choose the longest binary string for the least probable symbol, because this reduces the expected length of a transmission.
That is: if p_i > p_j and l_i > l_j, then

    p_i * l_j + p_j * l_i  <  p_i * l_i + p_j * l_j,

since the right side minus the left is (p_i - p_j)(l_i - l_j) > 0. Hence

    sum_k p_k * l'_k  <  sum_k p_k * l_k,

where the first summation is the result of interchanging l_i and l_j in the second.
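The exchange argument is easy to verify numerically (the probabilities and lengths here are made-up illustrative values):

```python
# If p_i > p_j but the MORE probable symbol got the LONGER codeword
# (l_i > l_j), swapping the two codewords strictly lowers the expected
# length; the improvement is exactly (p_i - p_j) * (l_i - l_j) > 0.
p_i, p_j = 0.4, 0.1   # p_i > p_j (illustrative)
l_i, l_j = 4, 1       # l_i > l_j (illustrative)

before = p_i * l_i + p_j * l_j   # contribution before the swap
after = p_i * l_j + p_j * l_i    # contribution after the swap

assert after < before
assert abs((before - after) - (p_i - p_j) * (l_i - l_j)) < 1e-9
print(before, after)
```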
An Example: The idea of the algorithm is to build a tree from the leaves back towards the root. At each stage we "combine" the two least probable symbols, in effect adding a new node to the tree.
a | .4 |
b | .3 |
c | .1 |
d | .1 |
e | .1 |

Combine d and e:

a  | .4 |
b  | .3 |
de | .2 |
c  | .1 |

Combine c and de:

a   | .4 |
b   | .3 |
cde | .3 |

Combine b and cde:

bcde | .6 |
a    | .4 |

Reading the codewords off the finished tree:

a | 0    |
b | 10   |
c | 110  |
d | 1110 |
e | 1111 |
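A minimal Python sketch of the algorithm (my own illustration, not code from the notes): a heap holds (probability, partial-code) nodes, and each step merges the two least probable nodes, prefixing 0 and 1 to the codewords of the two halves. Because the three probability-.1 symbols tie, the code produced may differ from the table above, but any such code is optimal, with expected length 2.1 bits for this distribution.

```python
import heapq
from itertools import count

def huffman(freqs):
    """Build a binary Huffman code for a dict {symbol: probability}."""
    tie = count()  # tie-breaker so the heap never compares the dicts
    heap = [(p, next(tie), {s: ""}) for s, p in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, code1 = heapq.heappop(heap)  # least probable node
        p2, _, code2 = heapq.heappop(heap)  # next least probable node
        # Merge: one subtree goes under the 0 branch, the other under 1.
        merged = {s: "0" + w for s, w in code1.items()}
        merged.update({s: "1" + w for s, w in code2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

freqs = {"a": 0.4, "b": 0.3, "c": 0.1, "d": 0.1, "e": 0.1}
code = huffman(freqs)
print(code)
print(round(sum(p * len(code[s]) for s, p in freqs.items()), 6))  # 2.1
```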
______________________________________________________________________________________________________
Suppose we have a cryptosystem and an encrypted message c; the big question is: can we find the original plaintext message m? Information Theory provides a general framework to study this and related questions. Unfortunately the practical applications of this framework are somewhat limited, at least for our purposes. On the other hand, it is worthwhile to briefly review this area.
The first thing that must be done is to define "Conditional Entropy." That is, given events X and Y, we want to consider the amount of uncertainty in X after Y is revealed. The formula defining H(X|Y) is a somewhat obvious extension of H(X):

    H(X|Y) = - sum_y p(y) sum_x p(x|y) * log2 p(x|y).

From the Information Theoretic point of view the formulation involves the three sets P, C, and K (the plaintexts, ciphertexts, and keys).
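A small sketch of the definition in Python (my own illustration): given a table of joint probabilities p(x, y), we recover p(y) by summing, then accumulate -p(x, y) * log2 p(x|y). Two sanity checks: if X and Y are independent fair bits, H(X|Y) = H(X) = 1; if Y determines X, H(X|Y) = 0.

```python
import math

def conditional_entropy(joint):
    """H(X|Y) = -sum_y p(y) sum_x p(x|y) log2 p(x|y),
    with joint given as a dict {(x, y): p(x, y)}."""
    p_y = {}
    for (_, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    h = 0.0
    for (_, y), p in joint.items():
        if p > 0:
            h -= p * math.log2(p / p_y[y])  # p / p_y[y] is p(x|y)
    return h

# Independent fair bits: knowing Y tells us nothing about X.
indep = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(conditional_entropy(indep))   # 1.0

# Y = X: knowing Y removes all uncertainty about X.
copy = {(0, 0): 0.5, (1, 1): 0.5}
print(conditional_entropy(copy))    # 0.0
```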
For example, the following definition:

Definition: We say that a cryptosystem has perfect secrecy if

    H(P|C) = H(P).

In simple terms, the amount of uncertainty about a message is the same whether or not we know the encrypted message. In order for this definition to make sense we need to remember that for a message m to be the plaintext of an encrypted message c there needs to be some key k with e_k(m) = c.
Examples:
Questions:
What about SHIFT and SUBSTITUTION Ciphers?
What about RSA?
Consider H(K|C); remember that if I have an encrypted message and the key that was used, then I can decode the message. Is there a relationship between H(K|C) and H(P|C)?
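As a concrete check on the perfect-secrecy definition (my own illustration; the skewed letter distribution is made up, not real English frequencies), the following computes H(P) and H(P|C) for a SHIFT cipher acting on a single letter with a uniformly random key. For one-letter messages the two entropies coincide, so the one-letter SHIFT cipher has perfect secrecy; this breaks down once the same key encrypts several letters.

```python
import math

def H(probs):
    # Shannon entropy of a probability list
    return -sum(p * math.log2(p) for p in probs if p > 0)

n = 26
# Skewed single-letter plaintext distribution (illustrative values).
p_plain = [0.5] + [0.5 / (n - 1)] * (n - 1)

# Joint distribution over (plaintext, ciphertext) for a one-letter
# SHIFT cipher with a uniformly random key: c = (m + k) mod 26.
joint = {}
for m in range(n):
    for k in range(n):
        c = (m + k) % n
        joint[(m, c)] = joint.get((m, c), 0.0) + p_plain[m] / n

# Marginal distribution of the ciphertext.
p_c = {}
for (m, c), p in joint.items():
    p_c[c] = p_c.get(c, 0.0) + p

H_P = H(p_plain)
H_P_given_C = -sum(p * math.log2(p / p_c[c])
                   for (m, c), p in joint.items() if p > 0)
print(H_P, H_P_given_C)  # equal: knowing c reveals nothing about m
```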
In all the above we are more or less assuming that the distribution of plaintext messages is equiprobable. What role does the fact that we are dealing with natural languages play in this?