Definition:
In the setting of the previous lecture, given two jointly distributed Finite
Random Variables
and
their Mutual Information is defined as follows:
There is no minus sign!
If
and
are
Independent
since
There
is no Mutual Information, Example 2 from the previous section.
For a noiseless Channel
Since
and essentially the same calculation for . All Information is Mutual.
Theorem:
Proofs:
These are all simple variants of the definition, the calculation in the second
bullet above and the material in the previous lecture. For example,
One might read
, the Mutual Information as:
the
average information about the character
received after
the transmission noise has been removed.
the
average information about the character
sent after
the Bayesian noise has been removed.
The information in after a copy of the joint information has been removed, the Mutual Information getting counted twice
We have Random Variables and and {A,B,..,Z}
The two special cases to be considered are :
, error free transmission.
and
, All of the information is in what is transmitted.
for all
and
,
total noise
,
,
Since and are independent.there is no mutual information.
Definition:
for
a given
channel ,
the Channel Capacity,
is
defined by the formula
For the example of a Binary Symmetric Channel, since and is constant. The maximum is achieved when is a maximum (see below)
Exercise (Due March 7) : Compute the Channel Capacity for a Binary Symmetric Channel in terms of ?
Theorem:
If the values of each row of a Channel Matrix , , are a permutation of the values in any other row then for any In particular
if a Channel is Symmetric then is independent of the input probability vector
If the values of each column of a Channel Matrix , , are a permutation of the values in any other column then
If a Channel is Symmetric:
The
Channel Capacity
for
any
.
Since for any
Proof:
and but the rows of the channel matrix all have the same values, again the order may be different
so is independent of . In particular,
for any
For a given
,
in
particular since the columns all have the same values
since any and ,and is a probability distribution.