Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Development plan 1. As usual, create a directory to hold the files for this assignment. 2. Create files trace.cpp and trace.h . You can use

Development plan

1. As usual, create a directory to hold the files for this assignment.

2. Create files trace.cpp and trace.h. You can use a global variable to indicate the trace level, which can be 0 or 1. Create the variable in trace.cpp. If your variable is called tracelevel, then trace.cpp will say

 int tracelevel = 0; 
and trace.h will say
 extern int tracelevel; 
Any file that includes trace.h will be able to use the tracelevel variable that is created in trace.cpp.

Add a function that takes parameters argc and argv (from main) and sets the trace level to 1 if the command line contains -t.

File trace.cpp will not contain a main function. Be sure not to add one.

3. A Makefile is provided. Look at it to see what the targets are and what they do.

4. Create file huffman.cpp, the source code for the compression program huffman, starting from the template. Add a main function to it that just sets the trace-level. Use the following heading for main (in huffman.cpp).

 int main(int argc, char* argv[]) 
Parameter argv is an array of null-terminated strings having argc members. The names of the files to work on are A = argv[argc???2] and B = argv[argc???1].

Test it. Does it set tracelevel correctly? Does it get the correct file names?

5. Write a function in huffman.cpp to get the character frequencies. Assume the characters in the file are 8-bits (one byte), so there are 256 possible characters. See reading files using the stdio library.

The parameters of this function should be the name of the file to read and an array of 256 integers to fill with character frequencies. If the array is called freq, then freq[i] will hold the frequency of character i.

Create an array of 256 integers in main to store the frequency counts, one for each possible character that can be stored in one byte.

This function should should start by setting all of the character frequencies to 0. Then it should open the appropriate file, read the characters, and increment the count for a character each time the character is seen. When it is done, your function should close the file. Do not put both loops in this function.

Make the frequency counting function return true if it was successful in opening the file and false if not. Be sure to document that.

Now make main call the frequency counter. If it returns false, then there should be a meaningful message to the user, and the program should stop. The program must not keep going when the file to read does not exist.

Important. You will be using characters as array indices. Do not use a value of type char for an array index. Values of type char are assumed to be from ???128 to 127. That won't work for array indices.

Read each character as an integer using getc. Getc will normally return an integer from 0 to 255. It returns EOF (?1) when it has reached the end of the file. If you need to use a character variable as an array index, use type unsigned char.

If you receive a warning that an array index has type char, heed the warning!

If c has type char, statement

 int x = c; 
will not force x to be positive. Say
 unsigned x = (unsigned) c; 

6. Add a function to trace.cpp that takes an unsigned character c as a parameter and prints c's description. The description of 'a' is just a. But the description of ' ' (the newline character) should be , or some other desriptive name, such as newline. Use descriptive names for the tab ('\t') and space characters as well.

For an unprintable character whose integer code is m, use description \m. For example, the character with code 127 will have description "\127".

To check whether a character is printable, #include . Then isprint(c) is true if character c is printable and is not a white-space character. Pass isprint an integer code or an unsigned character, since it uses the character as an index in an array.

7. Add a function to trace.cpp that takes appropriate parameters and, if tracing is turned on, shows the frequencies of all characters that occur at least once in a clear and readable format.

Make main trace the frequency counts if tracing is turned on. Test your program. Does it seem to work?

8. Create a file, tree.h, holding the following type definitions.

#ifndef TREE_H #define TREE_H // Each node is either a leaf or a nonleaf. Type // NodeKind has just two values: leaf and nonleaf. enum NodeKind {leaf, nonleaf}; // An object of type Node is node in a Huffman tree. // The variables are as follows. // // kind tells whether the node is a leaf. // // ch is the character stored in a leaf. // It should only be used for leaves. // // left // right These are the left and right subtrees. // They should only be used for nonleaves. struct Node { NodeKind kind; char ch; Node* left; Node* right; // Create a leaf holding character c. Node(char c) { kind = leaf; ch = c; } // Create a nonleaf with left subtree L // and right subtree R. Node(Node* L, Node *R) { kind = nonleaf; left = L; right = R; } }; typedef Node* Tree; #endif 

You will lose points if you use the ch field of a nonleaf or the left or right fields of a leaf.

You should include tree.h in any file that needs to use the NodeKind, Node or Tree types.

9. Add a function to trace.cpp that prints a tree to the standard output, using the notation described above. To show a leaf in the tree, show the character's description, not the raw character.

10. Write a function in huffman.cpp that takes the array of character frequencies and returns a huffman tree based on the character frequencies. The huffman tree should have type Tree.

For Assignment 6 you implemented the priority queue abstract data type. Copy pqueue.h and pqueue.cpp into the directory for assignment 8 and modify pqueue.h by defining

 #include "tree.h" typedef Tree ItemType; typedef int PriorityType; 
That is, an item is a tree and a priority is an integer.

To build the huffman tree, create a priority queue. Insert a leaf into the priority queue for each character whose frequency is not 0, with its frequency as its priority. Notice that

 Tree t = new Node('a'); 
creates a leaf holding character 'a'.

Now repeatedly remove two trees r and s from the priority queue. Combine them into a single tree by making them the left and right subtrees of a new node. Statement

 Tree t = new Node(r,s); 
creates a new tree with left subtree r and right subtree s. Insert the new tree back into the priority queue.

This function stops when there is only one tree in the priority queue. After taking the first tree r out of the priority queue, check whether the priority queue is empty. If so, then tree r the Huffman tree. Return it. Should you destroy the priority queue?

Important Note. Ensure that your function only uses features of the priority queue module that are part of its interface. Do not use features that should remain hidden inside the priority queue module! Do not use types ItemType or PriorityType. Do not add any new features to pqueue.cpp.

Important Note. The function that builds the Huffman tree is an expert in building Huffman trees. It can have helper functions that do part of the job; for the purposes of this note, they are considered part of the Huffman-tree-building function.

Do not force other functions (except those that are helpers for the tree builder) to know anything about how a Huffman tree is built from an array of frequencies. In particular, no other function should know that a priority queue is involved.

The tree builder should not know more about the rest of the program than it needs to know in order to do its job. Choose its interface in a sensible way.

11. Modify main in huffman.cpp so that it builds the huffman tree and, if tracing is turned on, shows the resulting tree.

Test your program. Does it seem to create a sensible Huffman tree?

12. Now, in huffman.cpp, write a function buildCode that takes the Huffman tree t and an array Code that can hold 256 null-terminated strings. This function should fill in the Code array so that, if character c occurs at a leaf in t then Code[c] holds the code of c, as a null-terminated string. If character c does not occur in tree t, then Code[c] should be NULL.

Your function will need to have the following parameters.

A tree t (a subtree of the full huffman tree)

The array Code to fill with the code strings.

A null-terminated string prefix of 0's and 1's that is the sequence of edge labels from the root of the huffman tree to the root of t.

For each character c that occurs at a leaf of t with code s (according to t), buildCode(t, Code, prefix) adds code (prefix+s) for character c to array Code, where + is string concatenation. For example, if t is (e, g) then buildCode(t, Code, "01") stores code "010" for e and code "011" for g. Now you see why this parameter is called prefix.

If t is a leaf holding character c, then set Code[c] to prefix (since the code for a bare leaf is an empty string). Pay attention to where you are storing strings. Do not store a pointer into a function's run-time-stack frame in the code array! Remember that the frame is destroyed when the function returns. You would do well to store a copy of prefix.

If t is not a leaf, then buildCode should call itself on the left and right subtrees of t. The prefix parameter for the call on the left subtree should be prefix with '0' added to the end. The prefix parameter for the call on the right subtree should be prefix with '1' added to the end. (You can create local arrays in buildCode and copy prefix into them using strcpy. Then concatenate "0" to the end of one and "1" to the end of the other using strcat.)

Be careful with strcat. It is probably not what you imagine it to be. Strcat does not allocate any memory. Statement

 strcat(A,B); 
modifies array A by adding string B to the end of the string in array A. It is up to you to ensure that A is big enough.

Be sure that your arrays are large enough. If array A is supposed to hold prefix followed by another character (0 or 1), then A needs to have size at least strlen(prefix) + 2 (extra bytes for the added character and the null terminator). This has been a frequent source of errors in the past. A common mistake is to write strlen(prefix + 2) instead of strlen(prefix) + 2. What is the difference between the two? What is prefix + 2?

How should you set code[t->ch] equal to prefix? Hint: it does not involve a loop. Do simple things in simple ways.

13. Add a function to trace.cpp that shows the code from a code array. When showing a character, use your function that shows the character's description.

Make main build the code and, if tracing is turned on, show the code.

Test what you have so far.

14. The goal is to pack 8 bits into each byte. But before doing that, take a simpler approach where you write each bit as a character so that you can read it. After everything seems to work you can change it so that it writes a binary file instead.

Get files binary.h, binary1.cpp and binary2.cpp. They provide the following tools for writing files.

Type BFILE

BFILE* openBinaryFileWrite(const char* filename)

This function opens binary file filename for writing and returns a pointer to a structure that describes the open file. It returns NULL if the file cannot be opened.

void writeBit(BFILE* f, int b)

This function writes bit b (either 0 or 1) into open file f. Make sure that b is either 0 or 1. It should not be '0' or '1'.

void writeByte(BFILE* f, int b)

This function writes byte or character b into open file f.

void closeBinaryFileWrite(BFILE* f)

This function closes file f. You must close the file when you are done writing.

Both files binary1.cpp and binary2.cpp have the same interface, described by binary.h. File binary1.cpp writes a text file, writing character '0' to represent bit 0 and character '1' to represent bit 1, allowing you to read the output with a text editor, but not actually compressing. File binary2.cpp writes a binary file, packing 8 bits into each byte.

15. Add a function to huffman.cpp that writes a description of a given tree into a given open binary file (type BFILE*). Note that this is not the same as your trace function that writes a tree for debugging.

To write a leaf, write a 1 bit followed by the character stored at the leaf (8 bits). So, if t is a leaf, you write t as follows.

 writeBit(f, 1); writeByte(f, t->ch); 

To write a nonleaf, write a 0 bit followed by the left subtree followed by the right subtree. So, if t is not a leaf, write t as follows, assuming your function is called writeTreeBinary.

 writeBit(f, 0); writeTreeBinary(f, t->left); writeTreeBinary(f, t->right); 

Modify main so that it opens the second file name in the command line as a binary file for writing. Then it writes the Huffman tree to the binary file and closes the file.

16. Create file unhuffman.cpp, which will be the program that uncompresses a file. Use the template.

Write a function in unhuffman.cpp that reads a tree whose description is in a given open BFILE* file, and returns the tree. Keep it simple. To read a tree, start by reading a bit. If the bit is a 1, you are looking at a leaf. Read the character (a byte) and build a leaf holding that character. But if the bit is a 0, then you are looking at a nonleaf. Read the left subtree then the right subtree.

Important Note. When you encounter a nonleaf, it is important that you read the left subtree then the right subtree, in that order. According to the definition of C++, when a function call f(A, B) is evaluated, expressions A and B can be evaluated in any order. To avoid reading the subtrees in the wrong order, do the recursive calls in separate statements.

Use the following for reading the binary file, also described by binary.h.

BFILE* openBinaryFileRead(const char* filename)

This function opens binary file filename for reading and returns a pointer to a structure that describes the open file. It returns NULL if the file cannot be opened.

int readBit(BFILE* f)

This function reads one bit from file f and returns the bit (either 0 or 1). At the end of the file, readBit returns EOF.

int readByte(BFILE* f)

This function reads one byte (8 bits) from file f and returns the byte. Use it to get a byte that you wrote using writeByte. At the end of the file, readByte returns EOF.

void closeBinaryFileRead(BFILE* f)

This function closes file f. Once it is closed, you cannot read from it again.

Write a main function in unhuffman.cpp. It should start by turning tracing on or off, depending on the command line. (You already have a function that does that.) Then main opens the binary file whose name is given in the command line (argv[argc???1]), reads a tree from it, and, if tracing is turned on, writes the tree to the standard output for debugging. Make the trace readable.

When you link unhuffman.cpp, it is critical that you use the same implementation of binary.h as for huffman.cpp. Either link both with binary1.cpp or both with binary2.cpp.

17. Test what you have so far. You will need a compressed file written by huffman, holding a description of a tree. Your huffman program should write a description of the tree into the binary file and your unhuffman program should read it back correctly.

18. Write a function in huffman.cpp that takes the array of character codes, a binary file that is opened for writing, and the name of the file to read. It should reread the input file and write the code for each character that it reads into the open binary file.

Modify main in huffman.cpp so that, after writing the tree to the binary file, it calls the new function to write the encoded characters.

Note. You will need to write a string of digits (all either '0' or '1') to a binary file. No function is provided to you that does that job. What should you do? Be sure to write 0 or 1 to the binary file, not '0' or '1'.

Note. Do not close the binary file between writing the tree and writing the character codes. Only close it after doing both steps.

19. Return to unhuffman.cpp. Write a function that reads the code for a single character and returns the character, as an integer. If it encounters EOF in the binary file, this function should return EOF. This function will take two parameters, the Huffman tree and an open binary file for reading.

As you read each bit, move down the tree in the appropriate direction. When you hit a leaf, you have read a complete character code, and the character in the leaf is the character that the code stands for.

Important note. The code for a character only has a bit in it for each nonleaf. There is no bit for a leaf. So when reading the code, only read a bit when you are at a nonleaf in the tree, to decide whether to move to the left or right. Do not read a bit when you are at a leaf.

Make this function handle any tree, leaf or nonleaf. Do not make it presume that the tree is a nonleaf. Provide two cases. That makes the function logic clear and simple.

20. Now write a function in unhuffman.cpp to do the uncompression. It should read character codes from the binary file and write them out to the uncompressed file. Keep reading character codes and writing the corresponding character into the uncompressed file until the end-of-file is reached.

This function needs to be told: the open binary file to read; the name of the file to write; and the Huffman tree, needed for decoding. So pass it exactly that information, not something else.

21. Test both programs. If you compress a file and then uncompress it, do you get back exactly what was in the original file? Try a file that contains tabs, spaces and has more than one line. Try some non-letter characters too. You can be sure that I will do that when I test your programs.

You will find the Linux diff utility useful. Command

 diff file1 file2 
gives a summary of the lines where files file1 and file2 differ. If the two files are identical, diff will show nothing. That is what you want.

If the two files are different, diff writes out primitive editing commands that, if performed, make file1 look like file2. It uses commands d (delete), i (insert) and c (change), always working on entire lines. Look at the line numbers given with the command letters. The line numbers before the command letter are in file1 and those after the command letter are in file2.

22. Once you are convinced that everything is working, submit your work.

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Beginning Databases With PostgreSQL From Novice To Professional

Authors: Richard Stones, Neil Matthew

2nd Edition

1590594789, 978-1590594780

More Books

Students also viewed these Databases questions