Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

A bitstring bs just a sequence of bits, e.g. 10010110. Normally, pirintable characters in a file are encoded as bytes, or bitstrings of length 8.

image text in transcribed
image text in transcribed
image text in transcribed
image text in transcribed
image text in transcribed
A bitstring bs just a sequence of bits, e.g. 10010110. Normally, pirintable characters in a file are encoded as bytes, or bitstrings of length 8. This is called ASCII oode. For example, the ASCII code for the character ' a ' is the bitstring 01100001 , which is 97 in binary. erent lengths, then we can asign shorter strings to the more frequest charseters, thereby saving space. To illustrate, let 's do this for bitstrings of length 2. Suppose we have a file that is exactly 2000 bits long. so it can be viewed as a sequence of 1000 2-bit bitstringe. Suppose we sean the file and find that the four 2-bit bitstrings 00,01,10,11 occur in the file with the frequencis as shown in Fig. 1. Figure 1 Frequencies of 2 -bit Figure 2 A self-delimiting code bitstrings in a file A good encoding for this set of freqquencies would be the variable-lragth code shown in Fig. 2. This particular escoding has the nice property that it is self-delimiting which means that no codeword is a prefix of any other codeword. This ptoperty alloms tis to place the codewords end-to-end in a tile without having to mark their endpoints, and still be able to reconer the original text (we Section 3.3 below). For example, if the file begins with 0000001010110100000001 then it would be encoded as 0001101101110000010 (We have iuserted spaces just to show the original 2-bit bitstrings and the codeworib, but the spaces are not really there in the file.) Given the correspondence of Fig. 2, the two sequenees determine each other iniquely. How much space have we saved? The original file was 3000 bits lang. The length of the encoded file is 900+1+90+2+93+3=1110 bits. We have compressed the file by almost half. Tree Representation of Self-Delimiting Codes A coavenient representation of a self-delimiting code is a complnte binary tree with the original tmeoumpresoed bitstrings at the leaves. The path from the root down to the kaf containing x, represented as a bitstring (left =0, right =1 ) is the codeword for x. For example, the code of Fig. 2 would be represented by the tree in Fig. 3. The tree in Fig. 4 shows the identity code, in which each word is represented as itwelf. This coele doesn't achieve any compression. Huffman Trees A H uffman tree is a tree built from known frequencies for bitstrings of some fixed cotstant length k. It tries to code frequently occurring bitstrings of length k as short bitstrings and infrequently occurring bitstrings of length k as long bitstrings. The tree of Fig. 3 is a Hulfman tree for k=2 cotstructed from the frequencies of Fig. 1. In this assignment, you will implement. Hulfman trees for k=8 (bytes). To build a Huffinan tree, suppose we know the number of occurrences of each k-bit bitstring in the file. We make a separate tree node for each k-bit bitstring containing that bitstring as the kry along with its frequency; these will be the keaves of our Huffman tree when we are done. So we start with a forest of 24 single-node trees. Now we build the Huffman tree bottom up using a greody algorithm. We first pat all these tree nodes into a priority quene. A priority queue is an ADT that supports the following operations: 1. Insert an element. 2. Extract the minimum element. In our case, minimum will mean minimum frequency. Now we repeatedly do the folloning, as long as the priority queue contauns at least two elements: 1. Extract the two elements with the lomest frequency x,y from the priotity quene. 2. Make a new tree node with children x and y and frequency the sum of the frequencies of x and y. Call this new node z. 3. Insert z back into the priority quese. We beep doing this tutil there is exactly one element remaining, which will be the root of the Huffman tree. This has to happen eventually, since the size of the priority quese decreases by one in each stage- For example, suppose we wish to form a Huffman tree from the frequencies as shown in Fig. 1. We start with the four tree nodes (00,900), (01,90),(10,9), (11,1). In the first step, the priority quene will extract the two lowest-frequency elements (10,9) and (11,1), form a new tree node (u,10) with chilhren (10,9) and (11, 1), then insert this back into the priority queue. Here u is a new hey that we assign. After this step, the priority queue will contain (00,900),(01,90),(u,10). Doing this again, after the next step the priority queue will contain (00,900) and (v,100), where v is a new node with children (u,10) and (01,90). Finally, the last two nodes are combined, yiekling a node ( w,1000) which is the root of the Huffman tree The tealting Huffman tree looks like Fig. 3. 3.1 What To Do The term codec is short for a comprescion/decotaptession (or coling/decoling) sheme. void conpreas(File input, File output) throus IOException; void decompress(File input. File output) throus 10Eiception; In the following two sections, we will go over carefully what you have to do for cotnpression and decoen-pression. 3.2 Compression 3.2.1 Constructing the Huffman Tree To construct the Huffinan tree from the uncompressed inputFile, you first n eed t o alculate the byte frequencies in the file. Open the file for reading by calling FileIaputStrean inStream = nev FileInputstrean(inputFile) : You can then real bytes sequentially from the file by calling i streas , read( ) s uccesively. This method returns an int containing the next byte in the file in the low-order 8b its. For example, if he input byte wis 01100001 (binary) representing the ASCII character ' n ', the int returned by instream. readO) wonld be 97 . You can use this int as an index into an array of length 256=2, the total number of possible bytes. Go through the file once and record the byte frequencies in an int array of length 256 . Now construct the Huffman tree as described above. You should nse a java. atil. Priorityquewe of TreeNodes for this. First construct a new pricrity queue by calling PriorityqueuectreeNode> pq = neu Priorityqueuectreeliode> (); then invert a new Treekodo for each byte and its freqquency. These will be the leaves of the soon-to-be-bailt Huffnan tree. The minimum element of pq is extracted by calling pq-po110. But how does it know what minimum means? You have to tell it. You do this by making the Treeliode class implement the java. ut11. Comparable interface. ie TreeNode implements ComparablecTreeNodes To implement this interface, you must define a metbod int coeparefo (TreeNode obj). The return value of this method should be negativeifthisisleosthanobj,0ifthiaisequaltoobj,andpositiveifthisisgreaterthanobj. Your conpareTo method can do this by comparing the frequencies, and retaming the appropriate int. Your Treelode class should contuin int fields key for the key (which for leaves will be the uncompressed byte) and frequency for the frequency, TreeNode fields 1 eft and right for the children, and a conpareTo method. You may also include a tostring method for debugging purpoos if you wish, as well as any other needed fields or helper methods. When constructing the tree, you shonld gove each newly constructed internal node a key consisting of a number assigned sequentially, starting at 256 . All the leaves will already bave keys less than 256 . So, the root of the tree should have key value 22562=510. You might want to include this test as a sanity check. Constructing the Byte-to-Codeword Map To do the coding, we will need an eflicient way to map each incoming byte to its cotresponding codeword. We could search the tree every time fot the byte, but this would be very inefficient. We will instead do a little preprocessing to coustruct an array codes of length 256 whose kth element is the codeword for k. Then while we are coding, we will be able to map incoaning bytes to codewords by just pulling the appropriate codeword out of the array. Use lava class Bieset to represent codewords. Each codeword will be atmost 9 bits long You will construct the codewords by walking the tree recursively, filling in the codes array when you encounter a leaf. This should be done with a recursive method createCodes (TreeNode n, Bitset bv). The method should just asign bv to the key of n if n is a leaf. Otherwise, if m is an internal node, it should clone bv, add true to one copy and call recursively on the right subtree, and add talse to the other copy and call recursively on the left subtree. If you do this right, it is only a few lines of code. Compressing the Input File Now we are ready to coanpress the input file. Close the input stream instream and reopen it so we can read the bytes ugain from the beginning. The outpat stream should still be open. Read the bytes sequentially from the input stream. For each byte, get its codeword, then write the codeword to the output bitstream. When you are done, make sure you clooe the imput and output streums. This is especially important for the output stream, because bits are batfered, and not all of them may have been written to the output file yet. Moreover, since the number of bits is not necessarily a multiple of 8 , the actual bit count amist be saved in the output file. The BitStream . close () method takes care of all this for you, but you have to remember to call it when you are done. 3.3 Decompression Reconstructing the Huffman Tree Before we can decompress, we must reconstruct the Huffmin tree that was saved in the compressed file as a preorder sequence of keys. In the decompression phase, the frequencies are not tised. Open the input file as a Bitstrean for rending. Read in the sequence of 9 -bit bitsequences that were the beys of the Huffman tree in preorder into an Iterablecinteger> such as Arraytist or Vector . Now we will reconstruct the tree recursively. The recursive procedure shoubd take an Iterator and return in Treelode, which will be the root of the tree constructed from the keys in the Iterator. The rectursive procedure will be called with the initial Iterator obtained from the Iterable object containing the keys that you read in. The recursive procedure does the following- 1. Get the next key from the Iterator. 2. Make a new Treelode i with that key. This is the root of the subtree we are currently visiting. 3. If we are not at a leaf, do two recursive calls to construct the left and right subtrees and store them as the children of n. We can tell whether we are at a leaf just from the ley. How? 4. Return n. Decompressing the Input File After reading in the Huffman tree, the remainuler of the input flie contains the codewords corresponding to the bytes of the uncompressed file. Now we want to decompress these codewocds and map them back to the original bytes. Open the output Hile as a FileDatputStrean. We can write bytes to it taing the urite (int) method of FileDutputstreat. Set a pointer to the root of the Huffman trev. Read the bits of the input file one at a time and follow the path in the tree as determined by those bits ( 0= left. 1= right). When we arrive at a leaf, that leaf contains the uncotnpresved byte corresponding to the codeword we have just seen. Write that byte to the output streain and reset the pointer to the root of the tree fot the next codeword. Note that this works since the code is self-deliniting; we kncw when we are at the end of a codeword, since that occurs exactly when we are at a leaf of the tree. Beastiful! When done, remetmber to close the iuput and outpet streams. OTHER FILES You are given code for Bitgtrean clasves. With your Huffinan codec, you should see about 25 to 402 compression on text files. We have supplied a text file hwrgtart. Compress this with your Huffman codee and subait the compreand file conpresaed, txt. Jast give as the raw output of your coepress method. You will have to write a little driver to call your compress method on these files, but don't submit that. 3.5 What to Submit Submit files Treellode. java, Haff manaTrese,juva, and Futfase java, each containing a clinss of the same name. The elass Treellode should contain your definition of a tree node. Class Huffmantree should encapoulate Huffman tree; it should have two constructors, one for building a tree from an array of frequency cotmts and one for building a tree from an Iterablecintegers of keys in preorder. The cokle for constructing the byte-to-codeword mapping should also 80 in this class the array representing the mapping should be private and accesed from the outside through a getter method. The class Huffman should be your codec. Finally, submit the file conprested-trt that you hatve compressed froth the stpplied text file his . txt

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Pro SQL Server Wait Statistics

Authors: Enrico Van De Laar

1st Edition

1484211391, 9781484211397

More Books

Students also viewed these Databases questions

Question

What is Change Control and how does it operate?

Answered: 1 week ago

Question

How do Data Requirements relate to Functional Requirements?

Answered: 1 week ago