Question:
The expression (3.3.13) for the entropy \(S\) is equivalent to Shannon's (1949) definition of the information contained in a message, \(I=-\sum_{r} P_{r} \ln \left(P_{r}\right)\), where \(P_{r}\) represents the probability of message \(r\).
(a) Show that the information is maximized if the probabilities of all messages are the same; any other distribution of probabilities reduces the information. In English, "\(\mathrm{e}\)" is more common than "\(\mathrm{z}\)", so \(P_{\mathrm{e}}>P_{\mathrm{z}}\), and the information per character in an English message is therefore less than the optimum possible for the number of different characters used in English text.
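One standard way to set up part (a), sketched here with a Lagrange multiplier \(\lambda\) and with \(M\) denoting the number of distinct messages (symbols not introduced in the problem itself), is to maximize \(I\) subject to the normalization \(\sum_{r} P_{r}=1\):
\[
\frac{\partial}{\partial P_{r}}\left[-\sum_{r^{\prime}} P_{r^{\prime}} \ln P_{r^{\prime}}-\lambda\left(\sum_{r^{\prime}} P_{r^{\prime}}-1\right)\right]=-\ln P_{r}-1-\lambda=0 ,
\]
so every \(P_{r}=e^{-1-\lambda}\) takes the same value; normalization then gives \(P_{r}=1/M\) and \(I_{\max }=\ln M\), and any unequal distribution lowers \(I\).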
(b) The information in a text is also affected by correlations between characters in the text. For example, in English, "\(q\)" is always followed by "\(u\)", so this pair of characters contains the same information as "\(q\)" alone. The probability that the character indexed by \(r\) is followed immediately by the character indexed by \(r^{\prime}\) is \(P_{r, r^{\prime}}=P_{r} P_{r^{\prime}} G_{r, r^{\prime}}\), where \(G_{r, r^{\prime}}\) is the character-pair correlation function. If pairs of characters are uncorrelated, then \(G_{r, r^{\prime}}=1\). Show that if characters are uncorrelated, the information in a two-character message is twice the information of a single-character message, and that correlations \(\left(G_{r, r^{\prime}} \neq 1\right)\) reduce the information content. [Hint: Use the inequality \(\ln x \leq x-1\).]
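A sketch of the inequality step in part (b), assuming the pair probabilities have the single-character marginals \(\sum_{r^{\prime}} P_{r, r^{\prime}}=P_{r}\) (consistent with, but not stated in, the problem): the two-character information is \(I_{2}=-\sum_{r, r^{\prime}} P_{r, r^{\prime}} \ln P_{r, r^{\prime}}\), and the marginals give \(2 I_{1}=-\sum_{r, r^{\prime}} P_{r, r^{\prime}} \ln \left(P_{r} P_{r^{\prime}}\right)\), so
\[
I_{2}-2 I_{1}=\sum_{r, r^{\prime}} P_{r, r^{\prime}} \ln \frac{P_{r} P_{r^{\prime}}}{P_{r, r^{\prime}}}=\sum_{r, r^{\prime}} P_{r, r^{\prime}} \ln \frac{1}{G_{r, r^{\prime}}} \leq \sum_{r, r^{\prime}}\left(P_{r} P_{r^{\prime}}-P_{r, r^{\prime}}\right)=0 ,
\]
using \(\ln x \leq x-1\) with \(x=1 / G_{r, r^{\prime}}\). Equality, and hence \(I_{2}=2 I_{1}\), holds when every \(G_{r, r^{\prime}}=1\); any correlation reduces \(I_{2}\) below \(2 I_{1}\).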
(c) Write a computer program to determine the information per character in a text file by determining the single-character probabilities \(P_{r}\) and character-pair correlations \(G_{r, r^{\prime}}\). Computers usually use one full byte per character to store information. Since one byte can store 256 different messages, the potential information per byte is \(\ln 256=8 \ln 2 \equiv 8\) bits. Show that the information per character in your text file is considerably less than 8 bits and explain why it is possible for file-compression algorithms to reduce the size of a computer file without sacrificing any of the information contained in the file.
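A minimal Python sketch of the kind of program part (c) describes; the file name sample.txt, the use of base-2 logarithms (so results come out directly in bits), and the nearest-neighbour pair estimate are illustrative assumptions rather than part of the problem.

import math
from collections import Counter

def information_per_character(path):
    """Estimate the information per character (in bits) of a text file."""
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    n = len(text)
    if n < 2:
        raise ValueError("need at least two characters of text")

    # Single-character probabilities P_r.
    P = {c: k / n for c, k in Counter(text).items()}
    # I_1 = -sum_r P_r log2 P_r  (bits per character, ignoring correlations).
    I1 = -sum(p * math.log2(p) for p in P.values())

    # Pair probabilities P_{r,r'} from adjacent characters, and correlations G_{r,r'}.
    P2 = {pair: k / (n - 1) for pair, k in Counter(zip(text, text[1:])).items()}
    G = {(a, b): p / (P[a] * P[b]) for (a, b), p in P2.items()}

    # I_2 = -sum_{r,r'} P_{r,r'} log2 P_{r,r'}; I_2 / 2 is the pair-corrected
    # information per character, which correlations (G != 1) push below I_1.
    I2 = -sum(p * math.log2(p) for p in P2.values())
    return I1, I2 / 2.0, G

if __name__ == "__main__":
    I1, I2_half, _ = information_per_character("sample.txt")  # hypothetical file name
    print(f"single-character estimate: {I1:.3f} bits/char (vs. 8 bits/byte stored)")
    print(f"pair-corrected estimate:   {I2_half:.3f} bits/char")

Run on ordinary English prose, both estimates come out well below the 8 bits each stored byte could in principle hold, which is the headroom that lossless compression algorithms exploit.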
Step by Step Answer: