[Solved] In C++ Huffman Codes Description Suppose

Question

Posted on Sep 25, 2024

In C++

Huffman Codes

Description

Suppose that we have to store a sequence of symbols (a file) efficiently, namely we want to minimize the amount of memory needed. For the sake of simplicity we assume that the symbols are restricted to the first 6 letters of the alphabet. For example, let us assume that the frequencies of the different symbols that you have to store are the following.

symbol  frequency
A       1000
B       150
C       200
D       800
E       300
F       50
Total   2500

As we have to store 6 different symbols, the obvious way is to encode each of them in 3 bits, since with 3 bits it is possible to encode 2^3 = 8 different symbols. With this encoding, we need 2500 x 3 = 7500 bits to store the above symbols.

A different way to address the problem is the following. Instead of assigning to each symbol a code with the same length (i.e. number of bits), we assign shorter codes to symbols that are more frequent, and longer codes to symbols that are less frequent. One possible encoding according to this idea is the following.

symbol  encoding
A       0
B       10101
C       1011
D       11
E       100
F       10100

According to this encoding the number of required bits is:

1000 x 1 + 150 x 5 + 200 x 4 + 800 x 2 + 300 x 3 + 50 x 5 = 5300

This idea is at the basis of the programs used to compress files. First they analyze the input, then they choose the codes, and then they recode the input according to the determined codes.

While this idea brings benefits in terms of space requirements, using variable length codes presents some problems. Once we have coded a file according to a variable length code, we must also be able to decode it into the original format (i.e. once we have compressed the file, we want to be able to decompress it). The encoding works only if the codes assigned to different characters are such that no code is a prefix of any other code. If this property does not hold, there is a problem of ambiguity when trying to decompress the sequence. You can verify that in the depicted example no code is a prefix of any other code. For example, no code starts with 0 except the code of A.
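The two bit counts and the prefix property can be checked mechanically. What follows is a minimal C++ sketch, not part of the original assignment. The names kFreq, kCode, fixedLengthBits, variableLengthBits and isPrefixFree are illustrative assumptions. The code for D (11) is taken from the decoding discussion, and the code for F is assumed to be 10100, the only 5-bit choice consistent with the stated total of 5300 bits and with the prefix property.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Frequencies from the example (A..F, 2500 symbols in total).
const std::map<char, long> kFreq = {
    {'A', 1000}, {'B', 150}, {'C', 200}, {'D', 800}, {'E', 300}, {'F', 50}};

// Codes from the example. D = "11" and F = "10100" are assumptions
// reconstructed from the stated bit count and the prefix property.
const std::map<char, std::string> kCode = {
    {'A', "0"},  {'B', "10101"}, {'C', "1011"},
    {'D', "11"}, {'E', "100"},   {'F', "10100"}};

// Total bits when every symbol is stored with the same fixed width.
long fixedLengthBits(const std::map<char, long>& freq, int width) {
    long total = 0;
    for (const auto& entry : freq) total += entry.second * width;
    return total;
}

// Total bits when each symbol is stored with its own code.
long variableLengthBits(const std::map<char, long>& freq,
                        const std::map<char, std::string>& code) {
    long total = 0;
    for (const auto& entry : freq)
        total += entry.second * static_cast<long>(code.at(entry.first).size());
    return total;
}

// True if no code is a prefix of the code of a different symbol.
bool isPrefixFree(const std::map<char, std::string>& code) {
    for (const auto& a : code)
        for (const auto& b : code)
            if (a.first != b.first &&
                b.second.compare(0, a.second.size(), a.second) == 0)
                return false;  // a's code is a prefix of b's code
    return true;
}
```

Under these assumptions, fixedLengthBits(kFreq, 3) gives 7500, variableLengthBits(kFreq, kCode) gives 5300, and isPrefixFree(kCode) is true.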
While decompressing the file, if we find a symbol whose code starts with 0, we know it is A. If we find a symbol whose code starts with 11, we know it is D. It cannot be any other symbol, as no other code starts with 11. And so on.

How do we assign codes? This is done through a greedy algorithm. We assign the shortest code to the most frequent character, the second shortest one to the second most frequent character, and so on. The figure below illustrates the first few stages of the algorithm.
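Both the decoding rule and the greedy assignment can be sketched in code. The C++ below is a hedged illustration under stated assumptions. It uses the standard Huffman construction, which repeatedly merges the two least frequent subtrees using a priority queue. The figure is not available here, so this may or may not match the stages it shows. The names Node, buildHuffmanCodes, encode and decode are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Tree node stored in a vector; children are indices, -1 marks a leaf.
struct Node {
    long weight;
    char symbol;  // only meaningful for leaves
    int left;
    int right;
};

// Standard Huffman construction: merge the two lightest subtrees until one is left.
std::map<char, std::string> buildHuffmanCodes(const std::map<char, long>& freq) {
    std::vector<Node> nodes;
    using Item = std::pair<long, int>;  // (weight, node index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    for (const auto& entry : freq) {
        nodes.push_back({entry.second, entry.first, -1, -1});
        heap.push({entry.second, static_cast<int>(nodes.size()) - 1});
    }
    while (heap.size() > 1) {
        Item a = heap.top(); heap.pop();
        Item b = heap.top(); heap.pop();
        nodes.push_back({a.first + b.first, '\0', a.second, b.second});
        heap.push({a.first + b.first, static_cast<int>(nodes.size()) - 1});
    }
    std::map<char, std::string> codes;
    if (heap.empty()) return codes;
    // Walk the tree: a left branch appends '0', a right branch appends '1'.
    std::vector<std::pair<int, std::string>> stack;
    stack.push_back({heap.top().second, ""});
    while (!stack.empty()) {
        std::pair<int, std::string> cur = stack.back();
        stack.pop_back();
        const Node& n = nodes[cur.first];
        if (n.left < 0) {
            // A single-symbol alphabet still needs one bit.
            codes[n.symbol] = cur.second.empty() ? "0" : cur.second;
        } else {
            stack.push_back({n.left, cur.second + "0"});
            stack.push_back({n.right, cur.second + "1"});
        }
    }
    return codes;
}

// Concatenate the code of every symbol in the text.
std::string encode(const std::string& text, const std::map<char, std::string>& codes) {
    std::string bits;
    for (char c : text) bits += codes.at(c);
    return bits;
}

// Read bits until they match a code; this is unambiguous only for prefix-free codes.
std::string decode(const std::string& bits, const std::map<char, std::string>& codes) {
    std::map<std::string, char> byCode;
    for (const auto& entry : codes) byCode[entry.second] = entry.first;
    std::string out, current;
    for (char bit : bits) {
        current += bit;
        auto it = byCode.find(current);
        if (it != byCode.end()) {
            out += it->second;
            current.clear();
        }
    }
    return out;
}
```

For the example frequencies, buildHuffmanCodes yields code lengths 1, 5, 4, 2, 3 and 5 for A to F, which is 5300 bits in total. The exact bit patterns can differ from the table in the text, depending on how the 0 and 1 labels are assigned to the branches.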