Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Java Your final major program for CSII will be to create a pair of command-line programs to compress and decompress arbitrary les. This will require

Java

Your final major program for CSII will be to create a pair of command-line programs to compress and decompress arbitrary les. This will require some bit-level operations, creating a Comparable binary tree class that will be used to create bit encodings, and using various Collections classes for counting, sorting and mapping. Hu man codes have been around since the early fties, and are still used for data compression. Modern compression methods use Hu man codes as part of the algorithm, and then layer other compression techniques on top. Hu man codes are general-purpose { they can be used to compress any arbitrary byte stream { though they are generally less ecient then special-purpose approaches that apply only to speci c kinds of data. The basic idea behind Hu man codes is to represent bytes as variable-length bit strings, rather than having every byte be represented by eight bits. If commonly-occurring bytes require only two or three bits instead of eight, while very uncommon bytes are represented with ten or fteen, it adds up to a win in le size. Coding and decoding Hu man codes is based on building a binary tree structure known as a Hu man tree. The leaves of the tree represent the bytes of the original le. The path through a tree from leaf to root (for encoding) or from root to leaf (for decoding) yields a sequence of bits that code for that particular byte. The tree is built in such a way that common bytes are located near the root, while rare ones are located far down the tree. The easiest way to illustrate the process is with an example. 3 Suppose the byte sequence we wish to encode is [-35, 41, -35, 41, 116, -35, 22, -35, 116] (bytes in Java are always signed, so they range in value from -128 to +127). The rst step is to count the number of occurrences of each byte value: [-35:4, 22:1, 41:2, 116:2]. Now create Hu man leaves for each byte, putting them into a priority queue sorted according to their frequency count. Then repeatedly pull pairs of nodes out of the priority queue and connect them via a new parent node. This node's value should be the total of the values of its children. Put the node back into the priority queue, and continue the process (illustrated in Figure 1) until you have a single node. You now have a Hu man tree built according to byte frequency. To encode a byte, start at the leaf of the tree represented by that byte and traverse up the tree until you reach the root. Every branching where you came from the left is represented by a '0', while every branching where you came from the right is a '1' (make sure you build the codes up in the correct order, from right to left). Thus the bit strings for our example are: [-35:0, 116:10, 22:110, 41:111]. Notice how common values are represented with short strings and uncommon ones with long, just as we intended. The bit sequence of our original byte string (9 bytes of 8 bits each) was 72 bits long; our encoded string is 17 bits: 01110111100110010. Decoding a string of bits is simply reversing the process. Starting from the root of the tree and taking each bit in turn from the bit string, follow the corresponding branch through the tree until you end at a leaf. The byte represented by that leaf is the one you want. In our example, the rst bit of the code is 0. We take the left branch from the root and end up at the -35 leaf. There's our rst byte. The next bit is a 1; we take the right branch at the root and we are not yet at a leaf. The next bit is 1; we take the right branch again. Still no leaf. The next bit is 1 again, and now we've arrived at the 41 leaf, our second byte. Starting from the root again, the next bit is 0, which leads us to -35, our third byte. And so on. The whole idea for the encoder, then, is to read in a le, construct a Hu man tree based on the frequency of bytes in a le, run through each byte in the le and gure out its Hu man bit sequence, concatenate all the bits together, and save the tree and the bit sequence to disk in a new, smaller le. Decoding is basically the reverse: read in the Hu man tree and bit sequence, then use the tree to decode the sequence into an array of bytes. Save the bytes to disk and you have reconstructed the original le. One more thing to worry about: how to go from a sequence of bits to a byte, and vice versa? We will have a lecture about bit twiddling and bitwise logical operations. Individual bits can be manipulated and extracted by ANDing and ORing with a single bit in a particular position. I suggest that you take a look at this code and gure out how it works, but in any case, I'm providing it for your use (Figure 2). Notice that we're using integers to represent bits { 32 times more space than we need! But it's the easiest alternative. The rst step is to read in all of the bytes of a le (speci ed as a command line argument), using a FileInputStream. You can call the stream's available() method to nd out how big your byte array must be, and then simply call the read method with your byte array as an argument. Now construct your Hu man leaf nodes. This class should contain a pointer to a parent and two children (labeled \zero" and \one"). It should also have a variable to represent the node's frequency count, and one to represent a particular byte. Leaf nodes will have null children and a lled-in byte variable, internal nodes will have null byte variables, and the root of the tree will have a null parent. Remember that this class must implement Comparable, so that it can be used inside a PriorityQueue. I suggest creating 256 Hu man leaves, one set to each possible byte, and with initial counts of zero. These can be stored in a HashMap, keyed by bytes. 4 Figure 1: The process of building a Hu man tree. 5 import java.util.*; public class Twiddle { public static byte[] bitsToBytes(List l) { byte[] toReturn = new byte[l.size() / 8]; int index = 0; for (int i = 0; i < l.size(); i += 8) { for (int j = 0; j < 8; j++) { toReturn[index] = (byte)(toReturn[index] | (l.get(i+j) << (7-j))); } index ++; } return toReturn; } public static List bytesToBits(byte[] b) { ArrayList toReturn = new ArrayList(); for (int i = 0; i < b.length; i++) { for (int j = 7; j >= 0; j--) { toReturn.add((b[i] & (1 << j)) >> j); } } return toReturn; } } Figure 2: The Twiddle class. 6 Proceed byte by byte through your le data and increment the count of the corresponding Hu man leaf each time you encounter a particular byte. After you have done so, each Hu man leaf knows its frequency within the le. Add all of the leaves to a PriorityQueue (a slight optimization is to add only those leaves with a count greater than zero). Construct the tree by removing two nodes from the PriorityQueue, combining them with a new parent node, and adding the parent node back into the queue, as illustrated in Figure 1. Once only one node is left, you have the fully-built Hu man tree. For each byte in your le in turn, retrieve the corresponding Hu man leaf from the HashMap, and obtain the bit string by traversing up the tree. I suggest writing a recursive method within your Hu man node class to do so. These bit sequences, when attached one after the other in a list, can then be turned back into an array of bytes. Warning: Think about what happens if your bit string is not evenly divisible by 8. You will need to pad your bit string so that it is. You will also have to devise a mechanism so that this bit padding does not confuse the decoder. Now you can write your compressed le to disk using an ObjectOutputStream. Write your Hu man tree, any other information you need (like perhaps the length of the le), and then the byte array. Call it the same name as the original source le, but with the sux \.hu " attached. Decompression works much the same, in reverse. Create an ObjectInputStream, read in the Hu man tree and any additional information you saved, then the array of bytes. You can use the available() method (after you've read in the objects) to see how large the array needs to be, and the readFully() method to get a hold of the bytes. Turn the bytes into a bit stream, and then use the Hu man tree, starting from the root, to gure out which byte is represented by the bit sequence. Again, a recursive decoding method is recommended. Create a byte array to hold each new byte as you decode it, until you have decoded the entire bit stream (and handled the padding appropriately). Write this le to disk.

to the expert I dont know what you want me to do...sorry!

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Automating Access Databases With Macros

Authors: Fish Davis

1st Edition

1797816349, 978-1797816340

More Books

Students also viewed these Databases questions

Question

1.The difference between climate and weather?

Answered: 1 week ago

Question

1. What is Fog ?

Answered: 1 week ago

Question

How water vapour forms ?

Answered: 1 week ago

Question

What is Entrepreneur?

Answered: 1 week ago

Question

Which period is known as the chalolithic age ?

Answered: 1 week ago

Question

Why do HCMSs exist? Do they change over time?

Answered: 1 week ago