As the title implies, Shannon's definition of
Information, below, is focused on communication
(of data), in particular topics such as data compression and error detection
and correction. We will want to understand the "mathematical meaning" of this
statement; however, some informal understanding of what he was looking at may be useful.
I want to communicate class readings to you. We agree that 0 represents Raza's text and 1 represents MacKay's. I send you the message
1-87. You know that you need to be looking at Page
87 of MacKay. How did you know that I just did not flip
87 Heads? Answer: both ends of the communication channel agree on the "meaning" or the "content" of messages.
A Bit is the smallest element of computer storage. For our purposes it is a single binary digit (0 or 1) and, assuming the Bits in a given computer's memory are randomly distributed, when I learn the value of a single Bit I have learned 1 Bit of Information.
Definition:
We are given a set $\Omega$, $\mathcal{F}$, a Sigma Algebra on $\Omega$, and a Probability Measure $P$. Let $A \in \mathcal{F}$ be an event such that $P(A) > 0$. We define
$$I(A) = \log_2\!\left(\frac{1}{P(A)}\right) = -\log_2\big(P(A)\big).$$
We say that we have learned $I(A)$ bits of Information when we learn that $A$ has occurred.
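To make the definition concrete, here is a minimal Python sketch (not part of the notes) that evaluates $I(A) = \log_2(1/P(A))$; the function name `information_bits` is just an illustrative choice.

```python
import math

def information_bits(p_A: float) -> float:
    """Information I(A) = log2(1/P(A)) learned when an event A
    with probability p_A > 0 is observed."""
    if not 0 < p_A <= 1:
        raise ValueError("P(A) must lie in (0, 1]")
    return math.log2(1.0 / p_A)

# A fair coin landing Heads: P(A) = 1/2 gives exactly 1 bit.
print(information_bits(0.5))      # 1.0
# A certain event: P(A) = 1 gives 0 bits; we learn nothing.
print(information_bits(1.0))      # 0.0
# A page number between 1 and 256 chosen uniformly: 8 bits.
print(information_bits(1 / 256))  # 8.0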
1. If an event $A$ occurs with probability $P(A) = \frac{1}{2}$ then $I(A) = \log_2(2) = 1$.
2. If an event $A$ occurs with probability $P(A) = 1$ then $I(A) = \log_2(1) = 0$: we have learned nothing.
3. If an event $A$ occurs with probability $P(A) = 0$ then $I(A) = \log_2\!\left(\frac{1}{0}\right) = ?$.
Mathematically, if we have independent events $A$ and $B$ then
$$I(A \cap B) = \log_2\!\left(\frac{1}{P(A)P(B)}\right) = \log_2\!\left(\frac{1}{P(A)}\right) + \log_2\!\left(\frac{1}{P(B)}\right) = I(A) + I(B).$$
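A quick numerical check of the additivity claim, as a sketch under the assumption that the two events are independent (so $P(A \cap B) = P(A)P(B)$); `information_bits` repeats the illustrative helper above.

```python
import math

def information_bits(p: float) -> float:
    return math.log2(1.0 / p)

p_A, p_B = 0.25, 0.125       # probabilities of two independent events
p_AB = p_A * p_B             # independence: P(A and B) = P(A) * P(B)

# I(A and B) should equal I(A) + I(B): 2 + 3 = 5 bits.
print(information_bits(p_AB))                          # 5.0
print(information_bits(p_A) + information_bits(p_B))   # 5.0
```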
Definition (Provisional):
Given a set $\Omega$, $\mathcal{F}$, a Sigma Algebra on $\Omega$, a Partition $A_1, A_2, \dots, A_n$ of $\Omega$ with each $A_i \in \mathcal{F}$, and a Probability Measure $P$ with $P(A_i) = p_i$.
Define:
$$H = \sum_{i=1}^{n} p_i \log_2\!\left(\frac{1}{p_i}\right).$$
Given:
A set $\Omega$ and $\mathcal{F}$, a Sigma Algebra on $\Omega$,
A Finite Random Variable $X$ on $\Omega$, taking values $x_1, x_2, \dots, x_n$ and such that the events $A_i = \{X = x_i\}$ form a Partition of $\Omega$,
A Probability Measure $P$ with $P(X = x_i) = p_i$.
Define The Entropy of $X$:
$$H(X) = \sum_{i=1}^{n} p_i \log_2\!\left(\frac{1}{p_i}\right).$$
We say that we have learned $H(X)$ bits of Information when we learn the outcome of experiment $X$.
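As an illustrative sketch (not from the notes), the Entropy of a finite distribution can be computed directly from the definition; the helper name `entropy_bits` is an assumption, and terms with $p_i = 0$ are skipped, matching the convention justified by the limit computed below.

```python
import math

def entropy_bits(probs):
    """H = sum of p_i * log2(1/p_i) over the distribution; terms with
    p_i = 0 contribute nothing, matching the limit p*log2(1/p) -> 0."""
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be nonnegative and sum to 1")
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

print(entropy_bits([1.0]))         # constant X: 0 bits
print(entropy_bits([0.5, 0.5]))    # fair coin: 1 bit
print(entropy_bits([0.25] * 4))    # four equally likely outcomes: 2 bits
```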
The reason for the Provisional Definition is that the Entropy has nothing to do with the actual values $x_1, \dots, x_n$, only with the Partition $A_1, \dots, A_n$. (See the comments in The Paper above.) A more dramatic example, though maybe less relevant, would be two different experiments.
1. A coin is tossed; if it is a Head you win a penny, if it is a Tail I win a penny.
2. A coin is tossed; if it is a Head you win $1000, if it is a Tail I win a penny.
How many bits of Information do I get when I learn the outcome? For 1.? For 2.?
$X$ is a constant function, $X = c$, so $P(X = c) = 1$ and
$$H(X) = 1 \cdot \log_2(1) = 0.$$
There is no uncertainty as to the outcome of the experiment; we learn nothing when we learn the result.
$X$ takes two values $x_1$ and $x_2$. The two possible outcomes each have probability $\frac{1}{2}$:
$$H(X) = \frac{1}{2}\log_2(2) + \frac{1}{2}\log_2(2) = 1,$$
so we expect to learn $1$ bit of Information.
The two possible outcomes now have unequal probabilities: one has probability $p$ and the other has probability $1 - p$. We expect to learn less than $1$ bit of Information because there is less uncertainty. Writing the two probabilities as $p$ and $1 - p$,
$$H(X) = p\log_2\!\left(\frac{1}{p}\right) + (1 - p)\log_2\!\left(\frac{1}{1 - p}\right).$$
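A small sketch (names and the grid of sample values are illustrative, not from the notes) of the two-outcome entropy written above: it equals $1$ bit at $p = \frac{1}{2}$ and shrinks as the outcome becomes more certain.

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = p*log2(1/p) + (1-p)*log2(1/(1-p)), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1.0 / p) + (1 - p) * math.log2(1.0 / (1 - p))

for p in (0.5, 0.25, 0.1, 0.01):
    print(f"p = {p}: H(p) = {binary_entropy(p):.4f} bits")
# p = 0.5 gives exactly 1 bit; every other p gives strictly less than 1 bit.
```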
Theorem:
$$\lim_{p \to 0^+} p\log_2\!\left(\frac{1}{p}\right) = 0.$$
(As $p \to 0^+$ the outcome of the experiment becomes more and more certain so, intuitively, we expect $H(X) \to 0$.)
Proof:
Again $\log_2(x) = \frac{\ln(x)}{\ln(2)}$, so, for the moment writing $\ln$ in place of $\log_2$, we need only look at
$$\lim_{p \to 0^+} p\ln\!\left(\frac{1}{p}\right) = -\lim_{p \to 0^+} p\ln(p),$$
which is an indeterminate form of type $0 \cdot \infty$. Rewriting the limit as
$$\lim_{p \to 0^+} \frac{\ln(p)}{1/p}$$
we have an indeterminate form of type $\frac{\infty}{\infty}$. Applying L'Hospital's Rule, we have
$$\lim_{p \to 0^+} \frac{\ln(p)}{1/p} = \lim_{p \to 0^+} \frac{1/p}{-1/p^2} = \lim_{p \to 0^+} (-p) = 0.$$
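A numerical illustration (not part of the proof) that $p\log_2(1/p) \to 0$ as $p \to 0^+$, consistent with the L'Hospital computation above; the chosen values of $p$ are arbitrary.

```python
import math

# p * log2(1/p) for p shrinking toward 0; the values tend to 0.
for k in range(1, 7):
    p = 10.0 ** (-k)
    print(f"p = 1e-{k}: p*log2(1/p) = {p * math.log2(1.0 / p):.6f}")
```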
One has a bag containing a red, a blue, and a green ball. A ball is picked at random; how much Information is obtained? Since any ball selection is assumed equiprobable:
$$H = 3 \cdot \frac{1}{3}\log_2(3) = \log_2(3) \approx 1.585 \text{ bits}.$$
The Problem:
You are given 12 numbered balls that otherwise look identical in all respects except one of them has a different weight. It could be lighter or heavier.
You are given a balance and you can use that balance, and only that balance, to weigh any number of balls against any other number.
Find out which is the ball of different weight and whether it is lighter or heavier.
Shannon's Clue:
There are 24 possible, we assume equiprobable, outcomes (which of the 12 balls is of different weight and whether it is heavier or lighter, $12 \times 2 = 24$); in particular
$$H = \log_2(24) \approx 4.585 \text{ bits}.$$
If we could come up with a way of getting slightly more than $1.5$ bits of information with each weighing experiment we should be able to do it with 3 weighings.
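The bit counting behind Shannon's Clue, as a sketch: the target is $\log_2(24)$ bits spread over 3 weighings. The numbers below only restate the clue; no weighing strategy is computed, so the exercise is not spoiled.

```python
import math

outcomes = 12 * 2             # which ball is odd, and lighter or heavier
needed = math.log2(outcomes)  # total Information needed, about 4.585 bits
per_weighing = needed / 3     # about 1.528 bits needed from each of 3 weighings

print(f"bits needed in total:     {needed:.3f}")
print(f"bits needed per weighing: {per_weighing:.3f}")
```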
Exercise: Come up with a strategy. (Do not watch MacKay's second lecture for a bit; it will spoil the fun.)
Assignment (Due February 14): Suppose you had 6 numbered balls as above. What does Shannon say? Find a weighing strategy.
In the setting of the Definition of Entropy: $X$ takes values $x_1, \dots, x_n$ with probabilities $p_1, \dots, p_n$, and we split the values into two groups, $x_1, \dots, x_k$ and $x_{k+1}, \dots, x_n$, with total probabilities $q_1 = p_1 + \dots + p_k$ and $q_2 = p_{k+1} + \dots + p_n$.
The two pictures: In the first picture we learn the outcome of $X$ all at once. In the second picture we learn the outcome in two stages: first which group the outcome lies in, a random variable taking the values "first group" and "second group" with probabilities $q_1$ and $q_2$, and then which value within that group, with conditional probabilities $\frac{p_i}{q_1}$ in the first group and $\frac{p_i}{q_2}$ in the second.
The Formula:
$$H(p_1, \dots, p_n) = H(q_1, q_2) + q_1 H\!\left(\frac{p_1}{q_1}, \dots, \frac{p_k}{q_1}\right) + q_2 H\!\left(\frac{p_{k+1}}{q_2}, \dots, \frac{p_n}{q_2}\right).$$
The Proof by Addition: expand the right-hand side using $\log_2\!\left(\frac{q}{p_i}\right) = \log_2\!\left(\frac{1}{p_i}\right) - \log_2\!\left(\frac{1}{q}\right)$; the $\log_2\!\left(\frac{1}{q_1}\right)$ and $\log_2\!\left(\frac{1}{q_2}\right)$ terms cancel and what remains is $\sum_{i=1}^{n} p_i \log_2\!\left(\frac{1}{p_i}\right) = H(p_1, \dots, p_n)$.
Example:
I flip a coin twice. I tell you the outcome in two stages. First I tell you the outcome of the first flip and then the outcome of the second.
$X$ takes the four values $HH, HT, TH, TT$, each with probability $\frac{1}{4}$, so
$$H(X) = 4 \cdot \frac{1}{4}\log_2(4) = 2 \text{ bits}.$$
Grouping by the first flip: the two groups $\{HH, HT\}$ and $\{TH, TT\}$ have probabilities $q_1 = \frac{1}{2}$ and $q_2 = \frac{1}{2}$, and within each group the two outcomes have conditional probability $\frac{1}{2}$. The Formula gives
$$H(X) = H\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) + \tfrac{1}{2} H\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) + \tfrac{1}{2} H\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = 1 + \tfrac{1}{2} + \tfrac{1}{2} = 2 \text{ bits},$$
as expected.
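A sketch verifying the two-stage computation above: the entropy of the four equally likely outcomes equals the entropy of the first flip plus the expected entropy of the second (the `entropy_bits` helper repeats the earlier illustrative definition).

```python
import math

def entropy_bits(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Direct computation: four outcomes HH, HT, TH, TT, each with probability 1/4.
direct = entropy_bits([0.25, 0.25, 0.25, 0.25])

# Two-stage computation: 1 bit for the first flip, then with probability 1/2
# (each group) another 1 bit for the second flip.
two_stage = (entropy_bits([0.5, 0.5])
             + 0.5 * entropy_bits([0.5, 0.5])
             + 0.5 * entropy_bits([0.5, 0.5]))

print(direct, two_stage)   # both equal 2.0 bits
```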
Exercise (Due February 14):
I choose a ball at random from a bag containing 3 red balls and 1 blue ball and I choose a card at random from an ordinary deck of playing cards. If I choose a red ball I tell you the card's rank (A, 2, 3, ..., K).
If I choose the blue ball I tell you the card's suit (S, H, D, C).
How many bits of Information do you expect to get by learning the result of my experiment?
Verify that the Partition Formula produces the correct result for this experiment.
In the previous material on Bernoulli Trials we computed:
$$P(S_n = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \qquad k = 0, 1, \dots, n.$$
Suppose $np$ is an integer. Then $P(S_n = np)$ is a maximum since:
$$\frac{P(S_n = np)}{P(S_n = np - 1)} = \frac{n(1 - p) + 1}{n(1 - p)} > 1 \quad\text{and}\quad \frac{P(S_n = np + 1)}{P(S_n = np)} = \frac{np}{np + 1} < 1.$$
We have the following: Since $P(S_n = np)$ is the largest of the $n + 1$ probabilities $P(S_n = k)$, which sum to $1$, and since $P(S_n = np) \le 1$,
$$\frac{1}{n + 1} \le \binom{n}{np} p^{np} (1 - p)^{n(1 - p)} \le 1.$$
Theorem (Gibbs' Inequality): Suppose $(p_1, \dots, p_n)$ and $(q_1, \dots, q_n)$ are two probability distributions, then
$$\sum_{i=1}^{n} p_i \log_2\!\left(\frac{1}{p_i}\right) \le \sum_{i=1}^{n} p_i \log_2\!\left(\frac{1}{q_i}\right),$$
with equality if and only if $p_i = q_i$ for $i = 1, \dots, n$.
Proof:
Since $\log_2(x) = \frac{\ln(x)}{\ln(2)}$ and $\ln(2) > 0$, proving the Theorem is equivalent to proving
$$\sum_{i=1}^{n} p_i \ln\!\left(\frac{q_i}{p_i}\right) \le 0.$$
Assume for the moment that all $p_i > 0$ and $q_i > 0$. Since $\ln(x) \le x - 1$ for $x > 0$, we have
$$\sum_{i=1}^{n} p_i \ln\!\left(\frac{q_i}{p_i}\right) \le \sum_{i=1}^{n} p_i \left(\frac{q_i}{p_i} - 1\right) = \sum_{i=1}^{n} q_i - \sum_{i=1}^{n} p_i = 1 - 1 = 0,$$
but $\ln(x) = x - 1$ only if $x = 1$, so equality holds only if $q_i = p_i$ for all $i$.
The case where some $p_i = 0$ or $q_i = 0$ is fairly straightforward and similar to previous calculations involving "$0 \cdot \log_2(0)$".
Corollary: Given $(p_1, \dots, p_n)$ as above,
$$H = \sum_{i=1}^{n} p_i \log_2\!\left(\frac{1}{p_i}\right) \le \log_2(n).$$
In particular, if $p_i = \frac{1}{n}$ for all $i$ then $H = \log_2(n)$.
Proof:
Letting $q_i = \frac{1}{n}$ for all $i$ in the Theorem,
$$H \le \sum_{i=1}^{n} p_i \log_2(n) = \log_2(n),$$
with equality if and only if $p_i = \frac{1}{n}$ for all $i$.
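A quick check of the Corollary (a sketch; the distributions chosen are arbitrary): every distribution on $n$ outcomes has entropy at most $\log_2(n)$, and the uniform distribution attains the bound.

```python
import math

def entropy_bits(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

n = 4
print(entropy_bits([1.0 / n] * n))         # uniform: exactly log2(4) = 2 bits
print(entropy_bits([0.7, 0.1, 0.1, 0.1]))  # about 1.357 bits, strictly less
print(math.log2(n))                        # the upper bound log2(n) = 2
```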