As the title implies, Shannon's definition of Information, below, is focused on the communication of data, in particular on topics such as data compression and error detection and correction. We will want to understand the "mathematical meaning" of this statement, but some informal understanding of what he was looking at may be useful.
I want to communicate class readings to you. We agree that 0 represents Raza's text and 1 represents MacKay's. I send you the message 1-87. You know that you need to be looking at Page 87 of MacKay. How did you know that I did not just flip 87 Heads? Answer: both ends of the communication channel agree on the "meaning" or the "content" of messages.
A Bit is the smallest element of computer storage: for our purposes, a single binary digit (0 or 1). Assuming the Bits in a given computer's memory are randomly distributed, when I learn the value of a single Bit I have learned 1 Bit of Information.
Definition:
We are given a set $\Omega$, a Sigma Algebra $\mathcal{F}$ on $\Omega$, and a Probability Measure $P$.
Let $A \in \mathcal{F}$ be an event such that $P(A) > 0$.
We define $I(A) = \log_2\!\left(\frac{1}{P(A)}\right) = -\log_2(P(A))$.
We say that we have learned $I(A)$ bits of Information when we learn that $A$ has occurred.
1. If an event $A$ occurs with probability $P(A) = \frac{1}{2}$ then $I(A) = -\log_2\!\left(\frac{1}{2}\right) = 1$.
2. If an event $A$ occurs with probability $P(A) = 1$ then $I(A) = -\log_2(1) = 0$; we have learned nothing.
3. If an event $A$ occurs with probability $P(A) = 0$ then $I(A) = ?$
Mathematically, if we have independent events $A$ and $B$ then $I(A \cap B) = I(A) + I(B)$.
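A quick numeric sketch of the definition (the helper name info_bits and the probabilities $\frac{1}{4}$ and $\frac{1}{8}$ are illustrative choices only):

```python
from math import log2

def info_bits(p):
    """Information learned from an event of probability p > 0: I = log2(1/p)."""
    return log2(1 / p)

print(info_bits(1 / 2))                # 1.0 bit  (example 1 above)
print(info_bits(1))                    # 0.0 bits (example 2 above)

# For independent events, P(A and B) = P(A) * P(B), so the bits add.
pA, pB = 1 / 4, 1 / 8
print(info_bits(pA * pB))              # 5.0
print(info_bits(pA) + info_bits(pB))   # 5.0
```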
Definition (Provisional):
Given a set $\Omega$, a Sigma Algebra $\mathcal{F}$ on $\Omega$, a Partition $A_1, A_2, \dots, A_n$ of $\Omega$ (each $A_i \in \mathcal{F}$), and a Probability Measure $P$ with $P(A_i) = p_i$.
Define: $H = -\sum_{i=1}^{n} p_i\log_2(p_i)$.
Given:
A set $\Omega$ and a Sigma Algebra $\mathcal{F}$ on $\Omega$,
A Finite Random Variable $X$ on $\Omega$, taking values $x_1, x_2, \dots, x_n$ and such that $P(X = x_i) = p_i$,
A Probability Measure $P$.
Define The Entropy of $X$: $H(X) = -\sum_{i=1}^{n} p_i\log_2(p_i)$.
We say that we expect to learn $H(X)$ bits of Information when we learn the outcome of the experiment $X$.
The reason for the Provisional Definition is that the Entropy $H(X)$ has nothing to do with the actual values $x_i$, only with the Partition $\{X = x_1\}, \dots, \{X = x_n\}$.
(See the comments in The Paper above.) A more dramatic example, though maybe less relevant, would be two different experiments:
1. A coin is tossed; if it is a Head you win a penny, if it is a Tail I win a penny.
2. A coin is tossed; if it is a Head you win $1000, if it is a Tail I win a penny.
How many bits of Information do I get when I learn the outcome? For 1.? For 2.?
$X$ is a constant function, $X = c$, so $P(X = c) = 1$ and $H(X) = -1\cdot\log_2(1) = 0$. There is no uncertainty as to the outcome of the experiment; we learn nothing when we learn the result.
$X$ takes two values $x_1$ and $x_2$.
The two possible outcomes each have probability $\frac{1}{2}$:
$H(X) = -\frac{1}{2}\log_2\!\left(\frac{1}{2}\right) - \frac{1}{2}\log_2\!\left(\frac{1}{2}\right) = 1$; we expect to learn $1$ bit of Information.
The two possible outcomes have unequal probabilities: one with probability $p$ and the other with probability $1 - p$, with $p \neq \frac{1}{2}$.
We expect to learn less than $1$ bit of Information because there is less uncertainty.
Writing the two probabilities as $p$ and $1 - p$ in general, $H(p) = -p\log_2(p) - (1 - p)\log_2(1 - p)$.
Theorem: $\lim_{p \to 0^+} H(p) = 0$.
(As $p \to 0^+$ the outcome of the experiment becomes more and more certain so, intuitively, we expect $H(p) \to 0$.)
Proof:
Again $\lim_{p \to 0^+} (1 - p)\log_2(1 - p) = 1\cdot\log_2(1) = 0$, so, for the moment writing $x$ for $p$, we need only look at $\lim_{x \to 0^+} x\log_2(x)$, which is an indeterminate form of type $0\cdot\infty$.
Rewriting the limit as $\lim_{x \to 0^+} \dfrac{\log_2(x)}{1/x}$ we have an indeterminate form of type $\dfrac{\infty}{\infty}$.
Applying L'Hospital's Rule, we have
$\lim_{x \to 0^+} \dfrac{\log_2(x)}{1/x} = \lim_{x \to 0^+} \dfrac{1/(x\ln 2)}{-1/x^2} = \lim_{x \to 0^+} \dfrac{-x}{\ln 2} = 0.$
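A quick numeric look at $H(p)$ near the endpoint, consistent with the Theorem (the helper name H and the sample values of $p$ are illustrative choices only):

```python
from math import log2

def H(p):
    """Binary entropy H(p) = -p*log2(p) - (1-p)*log2(1-p), for 0 < p < 1."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.5, 0.25, 0.1, 0.01, 0.001):
    print(p, round(H(p), 4))   # 1.0, 0.8113, 0.469, 0.0808, 0.0114: shrinking toward 0
```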
One has a bag containing a red, a blue, and a green ball. A ball is picked at random; how much Information is obtained? Since any ball selection is assumed equiprobable:
$H = 3\cdot\left(-\frac{1}{3}\log_2\!\left(\frac{1}{3}\right)\right) = \log_2(3) \approx 1.585$ bits.
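A minimal computational sketch of the Entropy formula (the helper name entropy is an illustrative choice), reproducing the constant-experiment, fair-coin, and three-ball values above:

```python
from math import log2

def entropy(probs):
    """H = sum of p_i * log2(1/p_i) over outcomes with p_i > 0."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(entropy([1.0]))                  # 0.0 bits: a constant experiment
print(entropy([1 / 2, 1 / 2]))         # 1.0 bit: a fair coin
print(entropy([1 / 3, 1 / 3, 1 / 3]))  # log2(3) ~ 1.585 bits: the three balls
```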
The Problem:
You are given 12 numbered balls that otherwise look identical in all respects, except one of them has a different weight. It could be lighter or heavier.
You are given a balance and you can use that balance, and only that balance, to weigh any number of balls against any other number.
Find out which ball has the different weight and whether it is lighter or heavier.
Shannon's Clue:
There are 24 possible, we assume equiprobable, outcomes (which ball is of different weight, and is it heavier or lighter); in particular $H = \log_2(24) \approx 4.585$ bits.
If we could come up with a way of getting slightly more than $1.5$ bits of Information with each weighing experiment, we should be able to do it with 3 weighings.
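The arithmetic behind the clue, reading each weighing as an experiment with three possible outcomes (left pan heavier, right pan heavier, or balanced):
\[
H = \log_2(24) \approx 4.585 \text{ bits}, \qquad
\log_2(3) \approx 1.585 \text{ bits per weighing}, \qquad
3\log_2(3) = \log_2(27) \approx 4.755 > 4.585 .
\]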
Exercise: Come up with a strategy. (Do not watch MacKay's second lecture for a bit; it will spoil the fun.)
Assignment (Due February 14): Suppose you had 6 numbered balls as above. What does Shannon say? Find a weighing strategy.
In the setting of the Definition of Entropy:
$X$ takes values $x_1, x_2, \dots, x_n$ and $P(X = x_i) = p_i$.
The two pictures:
$X$ takes values $x_1, \dots, x_n$ and we learn the outcome in a single step.
$Y$ takes values $A = \{x_1, \dots, x_m\}$ and $B = \{x_{m+1}, \dots, x_n\}$, with $P(Y = A) = q_A = p_1 + \dots + p_m$ and $P(Y = B) = q_B = p_{m+1} + \dots + p_n$; we first learn which group the outcome lies in.
$X_A$ takes values $x_1, \dots, x_m$ with probabilities $\frac{p_i}{q_A}$, and $X_B$ takes values $x_{m+1}, \dots, x_n$ with probabilities $\frac{p_i}{q_B}$; we then learn the outcome within that group.
The Formula:
$H(X) = H(Y) + q_A H(X_A) + q_B H(X_B)$.
The Proof by Addition:
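A sketch of the addition computation, in the two-group notation above:
\[
q_A H(X_A) = -q_A\sum_{i=1}^{m}\frac{p_i}{q_A}\log_2\!\left(\frac{p_i}{q_A}\right)
= -\sum_{i=1}^{m} p_i\log_2(p_i) + q_A\log_2(q_A),
\]
and similarly for $q_B H(X_B)$. Adding $H(Y) = -q_A\log_2(q_A) - q_B\log_2(q_B)$, the $q_A\log_2(q_A)$ and $q_B\log_2(q_B)$ terms cancel (using $\sum_{i=1}^{m} p_i = q_A$ and $\sum_{i=m+1}^{n} p_i = q_B$), leaving
\[
H(Y) + q_A H(X_A) + q_B H(X_B) = -\sum_{i=1}^{n} p_i\log_2(p_i) = H(X).
\]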
Example:
I flip a fair coin twice. I tell you the outcome in two stages. First I tell you the outcome of the first flip and then the outcome of the second.
$X$ takes the values $HH, HT, TH, TT$, each with probability $\frac{1}{4}$, and $H(X) = 2$.
$Y$, the first flip, takes the values $H$ and $T$, each with probability $\frac{1}{2}$, and $H(Y) = 1$.
$X_A$, the second flip when the first flip is a Head, takes the values $H$ and $T$, each with probability $\frac{1}{2}$, and $H(X_A) = 1$.
$X_B$, the second flip when the first flip is a Tail, is the same, with $H(X_B) = 1$, and $H(Y) + \frac{1}{2}H(X_A) + \frac{1}{2}H(X_B) = 1 + \frac{1}{2} + \frac{1}{2} = 2 = H(X)$.
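A numeric check of the Example (the helper name entropy is an illustrative choice; the grouping is by the first flip):

```python
from math import log2

def entropy(probs):
    return sum(p * log2(1 / p) for p in probs if p > 0)

H_X  = entropy([1 / 4] * 4)      # HH, HT, TH, TT learned in one step
H_Y  = entropy([1 / 2, 1 / 2])   # the first flip
H_XA = entropy([1 / 2, 1 / 2])   # the second flip, given the first was a Head
H_XB = entropy([1 / 2, 1 / 2])   # the second flip, given the first was a Tail
print(H_X, H_Y + 0.5 * H_XA + 0.5 * H_XB)   # 2.0 2.0
```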
Exercise (Due February 14):
I choose a ball at random from a bag containing 3 red balls and 1 blue ball, and I choose a card at random from an ordinary deck of playing cards. If I choose a red ball I tell you the card's rank (A, 2, 3, ..., K).
If I choose the blue ball I tell you the card's suit (S, H, D, C).
How many bits of Information do you expect to get by learning the result of my experiment?
Verify that the Partition Formula produces the correct result for this experiment.
In the previous material on Bernoulli Trials we computed:
Suppose is an integer. is a maximum since:
and
We have the following:
Since and
Suppose $p = (p_1, p_2, \dots, p_n)$ and $q = (q_1, q_2, \dots, q_n)$ are two probability distributions, then
$-\sum_{i=1}^{n} p_i\log_2(p_i) \le -\sum_{i=1}^{n} p_i\log_2(q_i),$
with equality if and only if $p_i = q_i$ for $i = 1, 2, \dots, n$.
Proof:
Since $\log_2(x) = \frac{\ln(x)}{\ln(2)}$ and $\ln(2) > 0$, proving the Theorem is equivalent to proving
$\sum_{i=1}^{n} p_i\ln\!\left(\frac{q_i}{p_i}\right) \le 0.$
Assume for the moment that all $p_i > 0$ and $q_i > 0$; since $\ln(x) \le x - 1$ for $x > 0$, we have
$\sum_{i=1}^{n} p_i\ln\!\left(\frac{q_i}{p_i}\right) \le \sum_{i=1}^{n} p_i\left(\frac{q_i}{p_i} - 1\right) = \sum_{i=1}^{n} q_i - \sum_{i=1}^{n} p_i = 1 - 1 = 0,$
but with equality only if $\frac{q_i}{p_i} = 1$, that is, only if $p_i = q_i$, for every $i$.
The case where some $p_i = 0$ or $q_i = 0$ is fairly straightforward and similar to previous calculations involving "$0\log_2(0)$".
Corollary: Given $p = (p_1, \dots, p_n)$ as above,
$-\sum_{i=1}^{n} p_i\log_2(p_i) \le \log_2(n).$
In particular, if $p_i = \frac{1}{n}$ for all $i$ then $-\sum_{i=1}^{n} p_i\log_2(p_i) = \log_2(n)$.
Proof:
Letting $q_i = \frac{1}{n}$ for all $i$ in the Theorem,
$-\sum_{i=1}^{n} p_i\log_2(p_i) \le -\sum_{i=1}^{n} p_i\log_2\!\left(\frac{1}{n}\right) = \log_2(n)\sum_{i=1}^{n} p_i = \log_2(n).$
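A numeric spot-check of the Theorem and the Corollary (the distributions p and q below are arbitrary illustrative choices):

```python
from math import log2

p = [1 / 2, 1 / 4, 1 / 8, 1 / 8]
q = [1 / 4, 1 / 4, 1 / 4, 1 / 4]   # the uniform distribution, n = 4

lhs = sum(pi * log2(1 / pi) for pi in p)              # -sum p_i log2 p_i = 1.75
rhs = sum(pi * log2(1 / qi) for pi, qi in zip(p, q))  # -sum p_i log2 q_i = 2.0
print(lhs, rhs, lhs <= rhs)                           # Theorem: 1.75 <= 2.0

print(lhs <= log2(len(p)))                            # Corollary: H(p) <= log2(n)
print(sum(qi * log2(1 / qi) for qi in q), log2(4))    # equality for the uniform case
```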