Definition: Two Random Variables X and Y for the same Sample Space Ω and Sigma Algebra F on Ω are called jointly distributed.
Notes: We assume we are given Ω, a Sample Space; F, a Sigma Algebra on Ω; and P, a Probability Measure on F.
We will simplify the notation a bit by writing P(x) for P(X = x), P(y) for P(Y = y), and P(x, y) for P(X = x, Y = y). However, we need to be careful with this notation. In particular, P(x | y) would be the simplified notation for P(X = x | Y = y). See the definitions immediately below.
Definitions:

The Joint Entropy H(X, Y) of a pair of Finite Random Variables X and Y on Ω is defined as:

H(X, Y) = -\sum_{x, y} P(x, y) \log_2 P(x, y)
Note that H(X, Y) = H(Y, X).
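As an illustration of the definition, here is a minimal Python sketch that computes H(X, Y) for an assumed toy joint distribution stored as a dictionary; the numbers in joint are made up for the example and are not taken from these notes.

    import math

    def joint_entropy(joint):
        """Joint Entropy H(X, Y) = - sum over (x, y) of P(x, y) * log2 P(x, y)."""
        return -sum(p * math.log2(p) for p in joint.values() if p > 0)

    # Assumed toy joint distribution P(x, y); the four probabilities sum to 1.
    joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
    print(joint_entropy(joint))                                       # H(X, Y)
    print(joint_entropy({(y, x): p for (x, y), p in joint.items()}))  # H(Y, X), the same value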
For each value y of Y, we have the Conditional Entropy H(X | Y = y) of X given Y = y:

H(X | Y = y) = -\sum_{x} P(x | y) \log_2 P(x | y)
Finally, the Conditional Entropy H(X | Y) of a pair of Finite Random Variables X and Y is defined as follows:

H(X | Y) = \sum_{y} P(y) \, H(X | Y = y) = -\sum_{x, y} P(x, y) \log_2 P(x | y)

Reversing the roles of X and Y, for each value x of X we have:

H(Y | X = x) = -\sum_{y} P(y | x) \log_2 P(y | x)

and

H(Y | X) = \sum_{x} P(x) \, H(Y | X = x) = -\sum_{x, y} P(x, y) \log_2 P(y | x)
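Continuing the sketch above with the same assumed toy distribution, the two expressions for H(X | Y) (the P(y)-weighted average of the H(X | Y = y), and the direct sum over the pairs) can be compared numerically:

    import math

    def marginal_y(joint):
        """P(y) = sum over x of P(x, y)."""
        py = {}
        for (x, y), p in joint.items():
            py[y] = py.get(y, 0.0) + p
        return py

    def cond_entropy_given(joint, y0):
        """H(X | Y = y0) = - sum over x of P(x | y0) * log2 P(x | y0)."""
        py0 = sum(p for (x, y), p in joint.items() if y == y0)
        return -sum((p / py0) * math.log2(p / py0)
                    for (x, y), p in joint.items() if y == y0 and p > 0)

    def cond_entropy(joint):
        """H(X | Y) = sum over y of P(y) * H(X | Y = y)."""
        return sum(py * cond_entropy_given(joint, y)
                   for y, py in marginal_y(joint).items())

    joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}  # assumed toy distribution
    py = marginal_y(joint)
    print(cond_entropy(joint))  # weighted-average form of H(X | Y)
    print(-sum(p * math.log2(p / py[y])
               for (x, y), p in joint.items() if p > 0))  # direct form, same value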
For a value x of X and a value y of Y, read -\log_2 P(x | y) as follows: having learned the value Y has taken is y, -\log_2 P(x | y) is the Information you get when you learn the value X has taken is x. H(X | Y) is the Expected Value of -\log_2 P(x | y).
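This Expected Value reading can be checked directly: averaging -\log_2 P(x | y) over the pairs with weights P(x, y) reproduces H(X | Y). A short sketch, again with the assumed toy distribution from above:

    import math

    joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}  # assumed toy distribution
    py = {}
    for (x, y), p in joint.items():
        py[y] = py.get(y, 0.0) + p

    # E[ -log2 P(X | Y) ]: weight -log2 P(x | y) by P(x, y) for each pair (x, y).
    print(sum(p * -math.log2(p / py[y]) for (x, y), p in joint.items() if p > 0))
    # This agrees with H(X | Y) as computed from the definition above.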
Theorem: H(X, Y) ≤ H(X) + H(Y), with equality if and only if X and Y are independent, that is, P(x, y) = P(x) P(y) for all x and y.
Proof: By Gibbs' inequality, with the products P(x) P(y) playing the role of the q_i's for the pairs (x, y),

H(X, Y) = -\sum_{x, y} P(x, y) \log_2 P(x, y) \le -\sum_{x, y} P(x, y) \log_2 \big( P(x) P(y) \big)

with equality if and only if P(x, y) = P(x) P(y) for all x and y. Thus

H(X, Y) \le -\sum_{x, y} P(x, y) \log_2 P(x) - \sum_{x, y} P(x, y) \log_2 P(y)

and, since \sum_{y} P(x, y) = P(x),

-\sum_{x, y} P(x, y) \log_2 P(x) = -\sum_{x} P(x) \log_2 P(x) = H(X)

and similarly for the second sum, which equals H(Y). Hence H(X, Y) ≤ H(X) + H(Y), with equality exactly when X and Y are independent.
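A quick numerical check of the Theorem, continuing the sketches above: for the assumed toy distribution the inequality is strict, while for a product distribution P(x) P(y) it becomes an equality.

    import math

    def H(dist):
        """Entropy of a finite distribution given as a dict of probabilities."""
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    def marginals(joint):
        """Return the marginal distributions P(x) and P(y)."""
        px, py = {}, {}
        for (x, y), p in joint.items():
            px[x] = px.get(x, 0.0) + p
            py[y] = py.get(y, 0.0) + p
        return px, py

    joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}  # assumed toy distribution
    px, py = marginals(joint)
    print(H(joint), H(px) + H(py))     # H(X, Y) < H(X) + H(Y): X and Y are dependent here

    product = {(x, y): px[x] * py[y] for x in px for y in py}
    print(H(product), H(px) + H(py))   # equality when P(x, y) = P(x) P(y)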
Theorem (The Chain Rule): H(X) + H(Y | X) = H(X, Y), or equivalently H(X, Y) - H(X) = H(Y | X).
The Information you receive when you learn X, plus the Information you receive when you learn about Y given you know X, equals the Information you receive when you learn about both X and Y. The Information you receive when you learn about both X and Y, minus the Information you receive when you learn about X, equals the Information you receive when you learn about Y given you know X.
Proof (the second version):

H(X, Y) - H(X) = -\sum_{x, y} P(x, y) \log_2 P(x, y) + \sum_{x} P(x) \log_2 P(x)

and, since P(x) = \sum_{y} P(x, y),

H(X, Y) - H(X) = -\sum_{x, y} P(x, y) \log_2 P(x, y) + \sum_{x, y} P(x, y) \log_2 P(x)
               = -\sum_{x, y} P(x, y) \log_2 \frac{P(x, y)}{P(x)}
               = -\sum_{x, y} P(x, y) \log_2 P(y | x)
               = H(Y | X).
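The Chain Rule can be checked numerically in the same way; the joint distribution below is again the assumed toy example, and H(Y | X) is computed from the direct form -\sum_{x, y} P(x, y) \log_2 P(y | x).

    import math

    def H(dist):
        """Entropy of a finite distribution given as a dict of probabilities."""
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}  # assumed toy distribution
    px = {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p

    # H(Y | X) via - sum over (x, y) of P(x, y) * log2 P(y | x), with P(y | x) = P(x, y) / P(x).
    H_Y_given_X = -sum(p * math.log2(p / px[x]) for (x, y), p in joint.items() if p > 0)
    print(H(px) + H_Y_given_X)   # H(X) + H(Y | X)
    print(H(joint))              # H(X, Y): the two printed values agree, as the Chain Rule states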
No Noise: Y = X, and thus H(Y | X) = 0. We learn nothing new when we know what character was received given that we know what was transmitted. This holds since P(y | x) = 1 when y = x and P(y | x) = 0 for all other y, so every term P(x, y) \log_2 P(y | x) is 0.

All Noise: H(Y | X) = H(Y), and H(X, Y) = H(X) + H(Y), since the Random Variables are independent.
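The two extreme cases can be seen in the same numerical style; the uniform input distribution below is an assumed example, not one fixed by the notes.

    import math

    def H(dist):
        """Entropy of a finite distribution given as a dict of probabilities."""
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    px = {0: 0.5, 1: 0.5}  # assumed input distribution for the transmitted character X

    # No Noise: Y = X, so P(x, y) = P(x) when y = x and 0 otherwise.
    no_noise = {(x, x): p for x, p in px.items()}
    print(H(no_noise) - H(px))          # H(Y | X) = H(X, Y) - H(X) = 0

    # All Noise: Y independent of X, here with Y uniform on {0, 1}.
    py = {0: 0.5, 1: 0.5}
    all_noise = {(x, y): px[x] * py[y] for x in px for y in py}
    print(H(all_noise) - H(px), H(py))  # H(Y | X) = H(Y)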
Exercise (Due March 5): Compute H(X | Y) and H(Y | X) for a Binary Symmetric Channel and a given input vector.
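As a starting point for the Exercise (a sketch only, not a solution), the snippet below builds the joint distribution of a Binary Symmetric Channel in the style of the earlier examples; the crossover probability p and the input vector (q, 1 - q) are placeholders, not the values specified in the exercise.

    def bsc_joint(q, p):
        """Joint distribution P(x, y) for a Binary Symmetric Channel with
        input X distributed as (q, 1 - q) and crossover probability p."""
        px = {0: q, 1: 1 - q}
        # P(y | x) = 1 - p if y == x, else p
        return {(x, y): px[x] * ((1 - p) if y == x else p)
                for x in (0, 1) for y in (0, 1)}

    joint = bsc_joint(q=0.5, p=0.1)  # placeholder values only
    # Feed this dictionary to the entropy helpers sketched earlier to obtain
    # H(X, Y), H(X | Y), and H(Y | X).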