Question

1 Approved Answer

Posted on Oct 16, 2024

Can you help me with the static void compress and expand method on the last page. I only need help with those methods and the

Can you help me with the static void compress and expand method on the last page. I only need help with those methods and the rest of document is provided for context. Click on this question to get reference images.

The Huffman Coding Assignment Read this entire document before starting! Begin by defining a class HuffmanTree.iava that represents a Huffman encoding tree. Define the following methods (more methods will be added later]: Page I of 5' Method De sc riptio n This method will construct the Huffman tree using the supplied array of character frequencies, where COUMS [i] is the number of occurrences of the character with integer (decimal) value 1. For example, counts[32] is the number of spaces. HuffmanTree( int counts ) This method should write your encoding tree void Wtitelring fileName} to the given file in a standard format, using the naming conventions provided later. This method will write a compressed binary void encoae{BiLoutput5tream out String le \"Sing the HUman COdeS generated by \"imam\" ' the Huffman tree. It will write the Huffman code for each character. bit by bit. using BitOu gutStream. The array passed to HuffmanTree's constructor should have exactly 256 elements. but your program should not depend on this. Instead. use the length field of the array to know how many there are. You should use a priority queue to build up the tree as described in the how-to article. Create the 'forest' - add a leaf node for each character that has a frequency greater than 0 (we don't include characters not in the input source in ourtree). These should be added in increasing character (decimal) order {character 0, character 1, and so on). You will need to define a binds class. You should decide what data fields are appropriate to include in the node class. Put this class in a separate file, rather than nested inside the HuffmanTree class, for testing. When building the Huffman tree using a java. util.i1"riorityQueuecE>r you'll be adding values of type Node. This means that your node class shoudl implement the Comparablecb interface, the contract for types that are "order-able". It should use the frequency (weight) of the sub-tree to determine its ordering relative to other sub-trees, with lower frequencies considered "less" than higher frequencies. The provided TreePrinter class will print your Huffman tree in a nice way. when debugging. When overriding the toString method of your node class. recall that only the frequencies matter for interior nodes, while the leaf nodes will store the actual characters. As with the Twenty Questions program. we will use a standard format for outputting the Huffman tree. The output should contain a sequence of line pairs. one for each leaf of the tree. The rst line of each pair should have the integer value of the character stored in that leaf (rather than the corresponding character, which may be something like a newline character and therefore not visible). The second line should have the code {a string of 0's and 1's} for the character with this integer value. Page 2 of 7 For the example above (ve letters a, b, c, x, y with frequencies 3, 3, 1, 1, 2, respectively). the le output would be as follows (note: the letter 'a' has integer value 9?, character 'y' is 121): 121 00 99 01E! 120 011 9? 10 98 11 As mentioned in the intro document. Huffman coding works best if one character is designated as "end of le," meaning that every le is guaranteed to end with such a character and it will be used for no other purpose. Some operating systems have such a character, but if we want to write a general-purpose program, we should do something that is not specific to any one operating system. In addition to encoding the actual characters that appear in the file, we will create a code for a ctitious end-of-file character that will be used only by the Huffman encoding and decoding programs. The pseudo-EOF character should be one higher than the value of the highest character in the frequency array passed to the constructor (256 in this case, as ASCII values of 0- 255 are assumed]. It will always have a frequency of one because it appears exactly once at the end of each file to be encoded. You will have to manually add this character to your priority queue because it will not be included as part of the frequency array. The output listed above does not include the pseudo-EOF character. When you include the pseudo-EOF character with a frequency of one, the output becomes: 121 00 255 010 99 @110 120 @111 9? 10 98 Page 3 of F 11 Recall that a Huffman solution is not unique. You can obtain any one of several different equivalent trees depending upon how certain decisions are made. However, if you use Java's Priorityueue class, thus letting it handle the decision of which node to choose when two nodes have the same frequency, the tree should at least be consistent. Recall that a Huffman solution is not unique. You can obtain any one of several different equivalent trees depending upon how certain decisions are made. However, if you use Java's PriorityQueue class, thus letting it handle the decision of which node to choose when two nodes have the same frequency, the tree should at least be consistent. Use the utility class BitOutputStream to write the binary file. Call the writeBit method, supplying one bit at a time. The writeBit method takes in an int (32-bit integer), but only writes the last bit. The writeBit method builds a byte-sized buffer (a 'register' of 8 bits) and writes one byte at a time to the binary file when the buffer is full or when the stream is closed. If the stream is closed before the buffer is filled, it is padded with extra bits and the last byte is written to the file. The pseudo-EOF character allows us to know the end of our data, so we do not attempt to read the extra padded bits when decoding. Encoding and Decoding a File. There are two main parts to this assignment. The first is the HuffmanTree class, used to build a variable bit length Huffman encoding tree. This class is also responsible for producing the "code file", a file that contains the codes used to compress some input source. The second is a class that will handle encoding and decoding input text given a particular Huffman coding tree. To encode/decode from a file, you will first have to first add some new methods to your HuffmanTree class: Method Description This constructor will reconstruct the tree from a file. You can assume that the file contains a tree HuffmanTree (String codeFile) stored in standard format. Note that when reconstructing the tree from the file, the character frequencies no longer matter. This method will decode the stream of bits (0's and void decode (BitInputStream in, String outFile) 1's) supplied and output the corresponding characters to the supplied file, given the naming conventions later in this document. Previously you wrote a constructor that took an array of frequencies and constructed an appropriate tree given those frequencies. This new constructor is passed a file representing the product of your write method from before. For this second part, the frequencies are irrelevant because the tree has already been constructed once; however, you are likely using the same node class as before. You can set all the frequencies to some standard value like 0 or -1 for this part. Follow your heart. Remember that the agreed-upon format for the code file was a series of pairs of lines where the first line has an integer representing the character's decimal value and the second line has the code to use for that character. You might be tempted to call nextInt to read the integer and nextLine to read the code but remember that mixing token-based reading and line-based reading can be problematic. Here is an alternative that uses the Integer. parseInt method, that allows you to use two calls on nextLine: int n = Integer . parseInt (input . nextLine () ) ; String code = input . nextLine () ;For the decoding part, you read a BitInputStream. This utility class that works in conjunction with another class called BitOutputStream. They each have a very simple interface and allow you to write and read compact sequences of individual bits. The only non-constructor method you'll need to use from BitInputStream is the following: public int readBit () / /returns a bit from the file, as an integer The decode method is doing the reverse of the encoding process. It is reading (variable-length) sequences of bits that represent encoded characters and it is figuring out what the original characters must have been. Your method should start at the top of your tree and should read bits from the input stream, going left or right depending upon whether you get a 0 or 1 from the stream. When you hit a leaf node, you know you've found the end of an encoded sequence. At that point, you should write the character to the output file. Once you've written this character, you go back to the top of your tree and start over, reading more bits and descending the tree until you hit a leaf again. At that point you write again, go back to the top of the tree, read more bits and descend until you hit a leaf, then write the leaf, go back to the top of the tree, and so on. Remember that we introduced a pseudo-EOF character with a special value (256, given the standard ASCII alphabet with decimal values 0-255). The encode program will write exactly one occurrence of this character at the end of the file. At some point your decoding method will come across this EOF character. At that point, it should stop decoding. It should not write this integer to the file because it isn't part of the original file. If you fail to recognize the pseudo-EOF character, you might end up accidentally reading past the end of the bit stream. When that happens, the readBit method returns a value of -1. If you see a value of -1 appearing, it's because you've read too far in the bit stream. Be careful of using recursion in the decode method; Java has a limit on stack depth. For a long file, there are hundreds of thousands of characters to decode. That means it would not be appropriate to write code that requires a stack that is thousands of levels deep. Use loops to make sure that you don't get a StackOverflowException, which could cause sadness or frustration or both. You are given some data files for this assignment, including: "happy hip hop.txt" (a text file containing the example string from the how to), "short.txt", and "Hamlet.txt". The files "happy hip hop.txt" and "short.txt" are small files suitable for preliminary testing. The file "Hamlet.txt" contains the full text of Shakespeare's play Hamlet and should be used to test your compression when you believe your program is working (compression works better on larger data sets). Feel free to make more as you see fit. The table below describes the naming conventions to use for the files involved in this assignment: Extension Example Description txt Hamlet . txt Original text file code Hamlet . code List of character codes to use . short Hamlet . short Compressed file (binary, not human readable) Should take up less disk space than the .txt file . new Hamlet . new Decompressed file (should match the original) Page 5 of 7It can be challenging to debug the decoding part of the assignment because the encoded files are not readable in a normal text editor and because the character boundaries are not obvious. If you absolutely MUST view your encoded files, you will have to use a hex editor. However, it is suggested that you log things to the console for debugging, rather than browsing the binaries of your encoded files. Next, define a class HuffmanCompressor.java that will function as the client of your HuffmanTree class. Add the following methods: Method Description This method should read fileName and generate a character frequency table (array), used to construct a Huffman tree. static void compress (String filename) It should then write the encoding tree to file, and generate the compressed file, using the naming conventions seen previously. This method should re-build the tree from the static void expand (String codeFile, String fileName) supplied code file. It should then write the decoded (expanded) output to the supplied file. Remember that when generating the frequency table, you can assume that all characters in the file will be part of the standard 256-character ASCII alphabet. The compress method should use the BitOutputStream helper class to write individual bits to file; it has just one public bit-writing method, void write (int b), which will write the integer provided as a bit to disk. The expand method should use HuffmanTree's decode method to parse the stream of bits in the compressed file, and write the decoded text to the provided file. The expanded file should exactly match the original