Shannon's Definition of Information

The Paper: A Mathematical Theory of Communication:

As the title implies, Shannon's definition of Information, below, is focused on communication (of data), in particular topics such as data compression and error detection and correction. We will want to understand the "mathematical meaning" of this definition; however, some informal understanding of what he was looking at may be useful first.

I want to communicate class readings to you. We agree that 0 represents Raza's text and 1 represents MacKay's. I send you the message

"1-87". You know that you need to be looking at Page 87 of MacKay. How did you know that I did not just flip 87 Heads? Answer: both ends of the communication channel agree on the "meaning" or "content" of messages.


The Bit (BInary digiT):

The smallest element of computer storage. For our purposes, a single binary digit (0 or 1). Assuming the Bits in a given computer's memory are randomly distributed, when I learn the value of a single Bit I have learned 1 Bit of Information.

______________________________________________________________________________________________________

Information:

Definition:

We are given a set $S$; $\mathcal{A}$, a Sigma Algebra on $S$; and a Probability Measure $P:\mathcal{A}\rightarrow [0,1]$.


Let $E \in \mathcal{A}$ be an event such that $P(E) > 0$. We define

$$I(E) = \log_2\!\left(\frac{1}{P(E)}\right) = -\log_2(P(E))$$

We say that we have learned $I(E)$ bits of Information when we learn that $E$ has occurred.



Some Observations:

1. If an event $E$ occurs with probability $\frac{1}{2}$ then $I(E) = \log_2(2) = 1$.

2. If an event $E$ occurs with probability $1$ then $I(E) = \log_2(1) = 0$; we have learned nothing.

3. If an event $E$ occurs with probability $0$ then $I(E) = $ ?

Mathematically, if we have independent events $E_1$ and $E_2$ then $I(E_1 \cap E_2) = I(E_1) + I(E_2)$.
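A minimal numerical sketch of this definition and of the observations above (the helper name `information_bits` and the sample probabilities are my own choices, not part of the notes):

```python
from math import log2

def information_bits(p):
    """I(E) = -log2(P(E)): the bits of Information learned when E occurs."""
    if p <= 0:
        raise ValueError("I(E) is only defined for P(E) > 0")
    return -log2(p)

print(information_bits(0.5))   # 1.0  (Observation 1)
print(information_bits(1.0))   # -0.0, i.e. 0 (Observation 2: we learn nothing)

# For independent events, P(E1 and E2) = P(E1) * P(E2), so the Information adds.
p1, p2 = 0.5, 0.25
print(information_bits(p1 * p2))                    # 3.0
print(information_bits(p1) + information_bits(p2))  # 3.0
```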

_____________________________________________________________________________________________

Entropy (Uncertainty, Surprise):

Definition (Provisional):

Given a set $S$; $\mathcal{A}$, a Sigma Algebra on $S$; $\{E_1, E_2, \ldots, E_n\}$, a Partition of $S$; and a Probability Measure $P:\mathcal{A}\rightarrow [0,1]$ with $P(E_i) = p_i$, $i = 1, \ldots, n$.

Define:

$$H(E_1, \ldots, E_n) = \sum_{i=1}^{n} p_i\, I(E_i) = -\sum_{i=1}^{n} p_i\log_2(p_i)$$

Definition:

Given a random variable (an "experiment") $X$ on $S$ taking the values $x_1, x_2, \ldots, x_n$ with $P(X = x_i) = p_i$, that is, with the Partition $E_i = \{X = x_i\}$,

Define The Entropy of $X$:

$$H(X) = -\sum_{i=1}^{n} p_i\log_2(p_i)$$

We say that we have learned $H(X)$ bits of Information when we learn the outcome of the experiment $X$.
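A minimal computational sketch of this definition (the helper name `entropy` and the convention that zero-probability outcomes contribute $0$ are my own choices, not part of the notes):

```python
from math import log2

def entropy(probs):
    """H(X) = -sum(p_i * log2(p_i)) over the outcome probabilities p_i."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    # Convention: terms with p_i = 0 contribute 0 (consistent with p*log2(p) -> 0 as p -> 0+).
    return -sum(p * log2(p) for p in probs if p > 0)
```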

_________________________________________________________________

An Observation:

The reason for the Provisional Definition is that the Entropy has nothing to do with the actual values $x_1, \ldots, x_n$ of $X$, only with the Partition $\{E_i\}$, where $E_i = \{X = x_i\}$, and its probabilities $p_i$.

(See the comments in The Paper above.) A more dramatic example, though maybe less relevant, would be two different experiments:

  1. A coin is tossed; if it is a Head you win a penny, if it is a Tail I win a penny.

  2. A coin is tossed; if it is a Head you win $1000, if it is a Tail I win a penny.

How many bits of Information do I get when I learn the outcome? For 1.? For 2.?

________________________________________________________________

The Zeroth Example:

$X$ is a constant function, $X(s) = c$ for all $s \in S$, so $P(X = c) = 1$ and $H(X) = -1\cdot\log_2(1) = 0$. There is no uncertainty as to the outcome of the experiment; we learn nothing when we learn the result.

The First Interesting Example:

$X$ takes two values, $x_1$ and $x_2$.

First, suppose the two possible outcomes each have probability $\frac{1}{2}$:

$$H(X) = -\tfrac{1}{2}\log_2\!\left(\tfrac{1}{2}\right) - \tfrac{1}{2}\log_2\!\left(\tfrac{1}{2}\right) = 1$$

We expect to learn $1$ bit of Information.

Next, suppose one outcome has probability $\frac{1}{4}$ and the other has probability $\frac{3}{4}$:

$$H(X) = -\tfrac{1}{4}\log_2\!\left(\tfrac{1}{4}\right) - \tfrac{3}{4}\log_2\!\left(\tfrac{3}{4}\right) \approx 0.811$$

We expect to learn less than $1$ bit of Information because there is less uncertainty.
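Both values can be checked numerically; the sketch below re-declares the small `entropy` helper from above so that it runs on its own:

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # 1.0
print(entropy([0.25, 0.75]))  # ~0.811
```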

Writing the two probabilities as $p$ and $1-p$, define $H(p) = -p\log_2(p) - (1-p)\log_2(1-p)$.

Theorem:

$$\lim_{p\rightarrow 0^{+}} H(p) = 0$$

(As $p\rightarrow 0^{+}$ the outcome of the experiment becomes more and more certain, so, intuitively, we expect $H(p)\rightarrow 0$.)



Proof:

Again $H(p) = -p\log_2(p) - (1-p)\log_2(1-p)$. Writing, for the moment, $q$ for $1-p$: as $p\rightarrow 0^{+}$ we have $q\rightarrow 1^{-}$ and $q\log_2(q)\rightarrow 0$, so we need only look at $\lim_{p\rightarrow 0^{+}} p\log_2(p)$, which is an indeterminate form of type $0\cdot\infty$.

Rewriting the limit as $\lim_{p\rightarrow 0^{+}}\dfrac{\log_2(p)}{1/p}$, we have an indeterminate form of type $\dfrac{\infty}{\infty}$.

Applying L'Hospital's Rule, we have

$$\lim_{p\rightarrow 0^{+}}\frac{\log_2(p)}{1/p} = \lim_{p\rightarrow 0^{+}}\frac{1/(p\ln(2))}{-1/p^{2}} = \lim_{p\rightarrow 0^{+}}\frac{-p}{\ln(2)} = 0$$
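A quick numerical look at the same limit (a sketch; the sample values of $p$ are arbitrary):

```python
from math import log2

for p in [0.1, 0.01, 0.001, 1e-6]:
    H = -p * log2(p) - (1 - p) * log2(1 - p)
    print(f"p = {p:g}:  H(p) = {H:.6f}")
# The printed values shrink toward 0 as p -> 0+.
```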

The Second Interesting Example:

One has a bag containing a red, a blue, and a green ball. A ball is picked at random; how much Information is obtained? Since each ball is equally likely to be selected:

$$H = 3\cdot\left(-\tfrac{1}{3}\log_2\!\left(\tfrac{1}{3}\right)\right) = \log_2(3) \approx 1.585$$

______________________________________________________________________________________________

The Ball Weighing Problem:

The Problem:

You are given 12 numbered balls that otherwise look identical in all respects, except that one of them has a different weight. It could be lighter or heavier.

You are given a balance, and you can use that balance, and only that balance, to weigh any number of balls against any other number.


[Figure: balance.jpg, a two-pan balance]

Find out which is the ball of different weight and whether it is lighter or heavier.

Shannon's Clue:

There are 24 possible, we assume equiprobable, outcomes (which ball is of different weight, and whether it is heavier or lighter); in particular, $H = \log_2(24) \approx 4.585$.

If we could come up with a way of getting slightly more than $1.5$ bits of Information with each weighing experiment, we should be able to do it with 3 weighings.
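A back-of-the-envelope version of Shannon's Clue (a sketch; the observation that each weighing has 3 possible results, left pan heavier, right pan heavier, or balanced, is my addition and is why at most $\log_2(3)$ bits are available per weighing):

```python
from math import log2

outcomes = 12 * 2                      # which of the 12 balls, and lighter or heavier
bits_needed = log2(outcomes)           # ~4.585 bits to identify the outcome
bits_per_weighing = log2(3)            # ~1.585 bits: left heavier, right heavier, or balanced
print(bits_needed, bits_per_weighing)
print(3 * bits_per_weighing >= bits_needed)   # True, so 3 weighings are not ruled out
```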

Exercise: Come up with a strategy. (Do not watch MacKay's second lecture for a bit; it will spoil the fun.)

Assignment (Due February 14): Suppose you had 6 numbered balls as above. What does Shannon say? Find a weighing strategy.

______________________________________________________________________________________________

The Partition Property (A Special Case):

In the setting of the Definition of Entropy:

The two pictures:

[Figure: the Partition $\{E_1, \ldots, E_n\}$, and the coarser Partition obtained by grouping it into two blocks]

The Formula:

Suppose the Partition is grouped into two blocks, $F_1 = E_1\cup\cdots\cup E_m$ and $F_2 = E_{m+1}\cup\cdots\cup E_n$, and write $P = p_1+\cdots+p_m = P(F_1)$ and $Q = p_{m+1}+\cdots+p_n = P(F_2)$. Then

$$H(p_1,\ldots,p_n) = H(P,Q) + P\,H\!\left(\frac{p_1}{P},\ldots,\frac{p_m}{P}\right) + Q\,H\!\left(\frac{p_{m+1}}{Q},\ldots,\frac{p_n}{Q}\right)$$

The Proof by Addition:

$$H(P,Q) = -P\log_2(P) - Q\log_2(Q)$$

$$P\,H\!\left(\frac{p_1}{P},\ldots,\frac{p_m}{P}\right) = -\sum_{i=1}^{m} p_i\log_2\!\left(\frac{p_i}{P}\right) = -\sum_{i=1}^{m} p_i\log_2(p_i) + P\log_2(P)$$

$$Q\,H\!\left(\frac{p_{m+1}}{Q},\ldots,\frac{p_n}{Q}\right) = -\sum_{i=m+1}^{n} p_i\log_2\!\left(\frac{p_i}{Q}\right) = -\sum_{i=m+1}^{n} p_i\log_2(p_i) + Q\log_2(Q)$$

Adding the three right-hand sides, the $P\log_2(P)$ and $Q\log_2(Q)$ terms cancel and we are left with

$$-\sum_{i=1}^{n} p_i\log_2(p_i) = H(p_1,\ldots,p_n)$$

Example:

I flip a coin twice. I tell you the outcome in two stages. First I tell you the outcome of the first flip and then the outcome of the second.

The four outcomes (assuming a fair coin) are $HH$, $HT$, $TH$, and $TT$, each with probability $\frac{1}{4}$, so $H\!\left(\tfrac{1}{4},\tfrac{1}{4},\tfrac{1}{4},\tfrac{1}{4}\right) = 2$.

Grouping by the first flip, $F_1 = \{HH, HT\}$ and $F_2 = \{TH, TT\}$, with $P(F_1) = P(F_2) = \frac{1}{2}$, so $H(P,Q) = H\!\left(\tfrac{1}{2},\tfrac{1}{2}\right) = 1$.

Within $F_1$ the conditional probabilities are $\frac{1/4}{1/2} = \frac{1}{2}$ and $\frac{1}{2}$, so $H\!\left(\tfrac{1}{2},\tfrac{1}{2}\right) = 1$.

Within $F_2$ the conditional probabilities are likewise $\frac{1}{2}$ and $\frac{1}{2}$, so $H\!\left(\tfrac{1}{2},\tfrac{1}{2}\right) = 1$.

$$H\!\left(\tfrac{1}{4},\tfrac{1}{4},\tfrac{1}{4},\tfrac{1}{4}\right) = 1 + \tfrac{1}{2}\cdot 1 + \tfrac{1}{2}\cdot 1 = 2$$
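The same two-stage bookkeeping, checked numerically (a sketch; the `entropy` helper is re-declared so the snippet stands alone):

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

full = entropy([0.25, 0.25, 0.25, 0.25])   # all four outcomes at once: 2.0 bits
staged = (entropy([0.5, 0.5])              # first flip
          + 0.5 * entropy([0.5, 0.5])      # second flip, given the first was Heads
          + 0.5 * entropy([0.5, 0.5]))     # second flip, given the first was Tails
print(full, staged)                        # 2.0 2.0
```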

Exercise (Due February 14):

I choose a ball at random from a bag containing 3 red balls and 1 blue ball, and I choose a card at random from an ordinary deck of playing cards. If I choose a red ball I tell you the card's rank (A, 2, 3, ..., K).

If I choose the blue ball I tell you the card's suit (S, H, D, C). How many bits of Information do you expect to learn?

__________________________________________________________________________________

Bernoulli Trials:

In the previous material on Bernoulli Trials we computed:

$$P(k \text{ successes in } n \text{ trials}) = \binom{n}{k}p^{k}(1-p)^{n-k}$$

Suppose $np$ is an integer. $P(np \text{ successes})$ is a maximum since:

$$\frac{P(k+1 \text{ successes})}{P(k \text{ successes})} = \frac{n-k}{k+1}\cdot\frac{p}{1-p}\,, \quad\text{which is } \geq 1 \text{ for } k+1 \leq np \text{ and } \leq 1 \text{ for } k \geq np$$

We have the following:

$$1 = \sum_{k=0}^{n}\binom{n}{k}p^{k}(1-p)^{n-k} \;\geq\; \binom{n}{np}p^{np}(1-p)^{n(1-p)}$$

Since $p^{np} = 2^{np\log_2(p)}$ and $(1-p)^{n(1-p)} = 2^{n(1-p)\log_2(1-p)}$

$$\binom{n}{np} \;\leq\; 2^{-np\log_2(p) - n(1-p)\log_2(1-p)} \;=\; 2^{nH(p)}$$
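A numerical sanity check of this bound (a sketch; `math.comb` is Python's binomial coefficient, and the choice $n = 100$, $p = 0.3$ is arbitrary, picked so that $np$ is an integer):

```python
from math import comb, log2

def H(p):
    """Binary entropy H(p) = -p*log2(p) - (1-p)*log2(1-p)."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

n, p = 100, 0.3
k = round(n * p)
print(comb(n, k) <= 2 ** (n * H(p)))   # True
print(log2(comb(n, k)), n * H(p))      # log2 C(n, np) sits a bit below n*H(p)
```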

__________________________________________________________________________________


Theorem (Gibbs inequality):

Suppose $P = (p_1, \ldots, p_n)$ and $Q = (q_1, \ldots, q_n)$ are two probability distributions; then

$$-\sum_{i=1}^{n} p_i\log_2(p_i) \;\leq\; -\sum_{i=1}^{n} p_i\log_2(q_i)$$

with equality if and only if $p_i = q_i$ for $i = 1, \ldots, n$.

Proof:

Since $\log_2(x) = \dfrac{\ln(x)}{\ln(2)}$ and $\ln(2) > 0$, proving the Theorem is equivalent to proving

$$\sum_{i=1}^{n} p_i\ln\!\left(\frac{q_i}{p_i}\right) \leq 0$$

Assume for the moment that all $p_i > 0$ and all $q_i > 0$; since $\ln(x) \leq x - 1$ for $0 < x$, we have

$$\sum_{i=1}^{n} p_i\ln\!\left(\frac{q_i}{p_i}\right) \;\leq\; \sum_{i=1}^{n} p_i\left(\frac{q_i}{p_i} - 1\right) \;=\; \sum_{i=1}^{n} q_i - \sum_{i=1}^{n} p_i \;=\; 1 - 1 \;=\; 0,$$

but $\ln(x) = x - 1$ only if $x = 1$, so equality holds only if $q_i = p_i$ for all $i$.

The case where some $p_i = 0$ or $q_i = 0$ is fairly straightforward and similar to previous calculations involving "$0\cdot\log_2(0)$".
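A quick numerical check of Gibbs' inequality on a randomly generated pair of distributions (a sketch; the distributions and the seed are arbitrary):

```python
import random
from math import log2

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

random.seed(0)
p = normalize([random.random() for _ in range(5)])
q = normalize([random.random() for _ in range(5)])

lhs = -sum(pi * log2(pi) for pi in p)                 # -sum p_i log2(p_i)
rhs = -sum(pi * log2(qi) for pi, qi in zip(p, q))     # -sum p_i log2(q_i)
print(lhs <= rhs)                                     # True; equality only when p == q
```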

________________________________________________________________


Corollary: Given $P = (p_1, \ldots, p_n)$ as above,

$$H(p_1, \ldots, p_n) = -\sum_{i=1}^{n} p_i\log_2(p_i) \;\leq\; \log_2(n)$$

In particular, if $n = 2$ then $H(p, 1-p) \leq 1$.

Proof:

Letting $Q = (q_1, \ldots, q_n)$ with $q_i = \dfrac{1}{n}$ for all $i$, Gibbs' inequality gives

$$-\sum_{i=1}^{n} p_i\log_2(p_i) \;\leq\; -\sum_{i=1}^{n} p_i\log_2\!\left(\frac{1}{n}\right) = \log_2(n)\sum_{i=1}^{n} p_i = \log_2(n)$$
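Finally, a numerical illustration of the Corollary (a sketch; the sample distributions are arbitrary): the uniform distribution attains the bound $\log_2(n)$, and other distributions fall below it.

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

n = 4
print(entropy([1 / n] * n), log2(n))     # both 2.0: the uniform distribution attains log2(n)
print(entropy([0.7, 0.1, 0.1, 0.1]))     # ~1.357, strictly below log2(4) = 2
```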