Question


Posted on Sep 24, 2024


In C

The following information shows how your program should be used:

To compress (encode) a file, issue the command:

hcompress -e filename

This should create a new file called "filename.huf" (same name with a .huf extension) which is the compressed version of the file.

To uncompress (decode) a compressed file, issue the command:

hcompress -d filename.huf

This should create a new file called "filename.huf.dec" which is the uncompressed/decoded version of the file.
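One way to build the two output names described above is to append a suffix to the input name. This is a minimal sketch; the helper name make_output_name is hypothetical and not part of the assignment.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated "input + suffix" string,
   or NULL if allocation fails. The caller must free() the result. */
char *make_output_name(const char *input, const char *suffix) {
    size_t len = strlen(input) + strlen(suffix) + 1; /* +1 for the '\0' */
    char *name = malloc(len);
    if (name != NULL) {
        snprintf(name, len, "%s%s", input, suffix);
    }
    return name;
}
```

For example, make_output_name(argv[2], ".huf") could be used when encoding and make_output_name(argv[2], ".dec") when decoding.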

If the command-line parameters are not correct, the program should display a useful error message (not just die!). This includes the situation where the filename given does not refer to an actual file on the system. After running, you will probably want to compare the original file and the decoded file to make sure they are the same. This comparison can be done using the Linux command "diff" (that is, "diff filename1 filename2"), or you can compare them by viewing them in a text editor.
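The checks described above could be gathered into one function. The following is only a sketch: the name check_args and the exact wording of the messages are assumptions, not something the assignment prescribes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the arguments look usable, 0 otherwise.
   Prints a message to stderr describing the problem. */
int check_args(int argc, char *argv[]) {
    /* Wrong argument count, or a flag that is neither -e nor -d. */
    if (argc != 3 ||
        (strcmp(argv[1], "-e") != 0 && strcmp(argv[1], "-d") != 0)) {
        fprintf(stderr,
                "Usage: hcompress -e filename | hcompress -d filename.huf\n");
        return 0;
    }
    /* The named file must exist and be readable. */
    FILE *in = fopen(argv[2], "rb");
    if (in == NULL) {
        perror(argv[2]); /* e.g. "notes.txt: No such file or directory" */
        return 0;
    }
    fclose(in);
    return 1;
}
```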

Coding:

You should have two source files, hcompress.c and linkedList.c, along with their respective .h files.

Your linked list code should be based on the code from class. However, note a few items. First, this code needs to be split between a .c and a .h file; it is currently all in one file. Second, the class code creates a linked list of ints, and you will need a linked list of Huffman tree nodes (see below), so it will have to be modified for that use. Third, you will need to add a function called list_add_in_order, which works very much like the current add, except that it takes the data in question, searches down the linked list from the given point, and places the node into the list in order. The order is based on weight, so that the two lowest-weight nodes are always at the front. Yes, a heap-based priority queue would be better for this, but there is enough other work to do in this project, so you can just stick to extending the linked list we wrote in class.
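As a rough illustration of the ordered insert, here is a sketch. The list node type (struct lnode) and the choice to return the new head are assumptions, since the class code is not shown here; adapt the names and signature to match it. Allocation failure simply ends the program to keep the sketch short.

```c
#include <stdio.h>
#include <stdlib.h>

struct tnode {
    double weight;
    int c;
    struct tnode *left;
    struct tnode *right;
    struct tnode *parent;
};

/* Hypothetical list node; the class code's actual names may differ. */
struct lnode {
    struct tnode *data;
    struct lnode *next;
};

/* Insert data so the list stays sorted by ascending weight.
   Returns the (possibly new) head of the list. */
struct lnode *list_add_in_order(struct lnode *head, struct tnode *data) {
    struct lnode *node = malloc(sizeof *node);
    if (node == NULL) {
        fprintf(stderr, "Error: out of memory\n");
        exit(EXIT_FAILURE);
    }
    node->data = data;

    /* Walk a pointer-to-pointer so inserting at the head needs no
       special case. Nodes of equal weight keep their insertion order. */
    struct lnode **link = &head;
    while (*link != NULL && (*link)->data->weight <= data->weight) {
        link = &(*link)->next;
    }
    node->next = *link;
    *link = node;
    return head;
}
```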

You will also need a file called hcompress.c which contains the following main:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
    // Check to make sure the input parameters are correct
    if (argc != 3) {
        printf("Error: The correct format is \"hcompress -e filename\" or \"hcompress -d filename.huf\"\n");
        fflush(stdout);
        exit(1);
    }

    // Create the frequency table by reading the generic file
    struct tnode* leafNodes = createFreqTable("decind.txt");

    // Create the huffman tree from the frequency table
    struct tnode* treeRoot = createHuffmanTree(leafNodes);

    if (strcmp(argv[1], "-e") == 0) { // encode
        // Pass the leafNodes since it will process bottom up
        encodeFile(argv[2], leafNodes);
    } else { // decode
        // Pass the tree root since it will process top down
        decodeFile(argv[2], treeRoot);
    }

    return 0;
}

This implies the four additional functions you need to write to make the code functional: createFreqTable, createHuffmanTree, encodeFile, and decodeFile. It also implies the existence of a tnode structure (tree node). You should use the following definition of that structure. If you feel like making a typedef for this structure, you certainly can. Note that my weight is a double because I stored the percentage of the file made up by that character. If you want, you may use an int frequency count instead. Along those same lines, my char (c) is an int. Recall that ints and chars are interchangeable in C.

struct tnode {
    double weight;
    int c;
    struct tnode* left;
    struct tnode* right;
    struct tnode* parent;
};

createFreqTable should read in the generic text file and count all the characters in the file. The array-like table it returns should consist of struct tnodes. These will be the leaf nodes in the Huffman tree. Note that it always reads in the same file (I've provided decind.txt from textfiles.com, but you may use any large text file you prefer) and generates the frequency table from this generic file, so the encoder and decoder build the same tree without storing it in the .huf file. Your frequency counts should include counts for all ASCII characters from 0 to 127.
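A possible shape for the frequency table step is sketched below. It assumes the weight field holds a raw count rather than a percentage, ignores any byte outside 0-127, and returns NULL if the file cannot be opened. The constant NUM_SYMBOLS is a name introduced only for this sketch.

```c
#include <stdio.h>
#include <stdlib.h>

#define NUM_SYMBOLS 128 /* ASCII 0-127; name chosen for this sketch */

struct tnode {
    double weight;
    int c;
    struct tnode *left;
    struct tnode *right;
    struct tnode *parent;
};

/* Returns a malloc'd array of NUM_SYMBOLS leaf nodes indexed by character
   code, or NULL on error. Here weight holds a raw count (an assumption). */
struct tnode *createFreqTable(const char *filename) {
    FILE *in = fopen(filename, "r");
    if (in == NULL) {
        perror(filename);
        return NULL;
    }
    struct tnode *table = malloc(NUM_SYMBOLS * sizeof *table);
    if (table == NULL) {
        fclose(in);
        return NULL;
    }
    for (int i = 0; i < NUM_SYMBOLS; i++) {
        table[i].weight = 0.0;
        table[i].c = i;
        table[i].left = table[i].right = table[i].parent = NULL;
    }
    int ch; /* int, not char, so EOF can be told apart from real data */
    while ((ch = fgetc(in)) != EOF) {
        if (ch < NUM_SYMBOLS) {
            table[ch].weight += 1.0;
        }
    }
    fclose(in);
    return table;
}
```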

createHuffmanTree should take in this array of leaf nodes and form the Huffman tree. To do so, you will need to use your linked list. Place all of the leaf nodes in the list in order. Then repeatedly remove the two lowest-weight nodes from the front, combine them under a new parent node whose weight is the sum of theirs, and add that new node back into the list in order. When only one node remains in the list, it is the root node that you need to return.
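The combining loop can be sketched as follows. The list type (struct lnode), the helper list_pop_front, the NUM_SYMBOLS constant, and the use of -1 as the character of an internal node are all assumptions made for this sketch, and allocation failure simply ends the program.

```c
#include <stdio.h>
#include <stdlib.h>

#define NUM_SYMBOLS 128 /* name chosen for this sketch */

struct tnode {
    double weight;
    int c;
    struct tnode *left;
    struct tnode *right;
    struct tnode *parent;
};

/* Hypothetical list node; the class code's actual names may differ. */
struct lnode {
    struct tnode *data;
    struct lnode *next;
};

static void *xmalloc(size_t size) {
    void *p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "Error: out of memory\n");
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Sorted insert by ascending weight; returns the (possibly new) head. */
struct lnode *list_add_in_order(struct lnode *head, struct tnode *data) {
    struct lnode *node = xmalloc(sizeof *node);
    struct lnode **link = &head;
    node->data = data;
    while (*link != NULL && (*link)->data->weight <= data->weight) {
        link = &(*link)->next;
    }
    node->next = *link;
    *link = node;
    return head;
}

/* Remove and return the lowest-weight tree node (list must be non-empty). */
static struct tnode *list_pop_front(struct lnode **head) {
    struct lnode *first = *head;
    struct tnode *data = first->data;
    *head = first->next;
    free(first);
    return data;
}

struct tnode *createHuffmanTree(struct tnode *leafNodes) {
    struct lnode *list = NULL;
    for (int i = 0; i < NUM_SYMBOLS; i++) {
        list = list_add_in_order(list, &leafNodes[i]);
    }
    /* Combine the two lightest nodes until only the root is left. */
    while (list->next != NULL) {
        struct tnode *a = list_pop_front(&list);
        struct tnode *b = list_pop_front(&list);
        struct tnode *parent = xmalloc(sizeof *parent);
        parent->weight = a->weight + b->weight;
        parent->c = -1; /* internal node: no character (assumption) */
        parent->left = a;
        parent->right = b;
        parent->parent = NULL;
        a->parent = parent;
        b->parent = parent;
        list = list_add_in_order(list, parent);
    }
    return list_pop_front(&list);
}
```

Because each leaf keeps its parent pointer, the same leafNodes array can later be used to walk from any character up to the root.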

encodeFile takes in the name of the file you would like to encode. This could be decind.txt or it could be some other file that has a letter distribution similar to that of decind.txt. As you read in each character from the file, you should find that character in the leafNodes array. Then you should follow the parent pointers from that leaf node up to the root node to discover the Huffman code that should be used to encode that character. Note that walking upward yields the bits of the code from last to first, so they must be reversed before being written. Once you have accumulated 8 bits, you should write out that byte to the .huf file. That is, you should not read in the entire file and turn it into a large string of 0s and 1s in memory before writing out. You should write each byte out as soon as you have enough bits for it.
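The bottom-up walk and the byte-at-a-time output might look like the sketch below. The struct bitwriter type and the functions write_bit, write_code, and flush_bits are hypothetical names. The conventions that a left branch is 0, a right branch is 1, and bits fill each byte from the most significant end are assumptions, as is padding the final byte with zero bits. The assignment does not say how the decoder should tell padding from data, so that part is left open.

```c
#include <stdio.h>

struct tnode {
    double weight;
    int c;
    struct tnode *left;
    struct tnode *right;
    struct tnode *parent;
};

/* Hypothetical bit buffer: collects bits until a full byte can be written. */
struct bitwriter {
    FILE *out;
    unsigned char byte; /* bits gathered so far */
    int count;          /* how many bits are in byte (0-7) */
};

void write_bit(struct bitwriter *bw, int bit) {
    bw->byte = (unsigned char)((bw->byte << 1) | (bit & 1));
    if (++bw->count == 8) { /* a full byte is ready: write it now */
        fputc(bw->byte, bw->out);
        bw->byte = 0;
        bw->count = 0;
    }
}

/* Walk from the leaf up to the root, recording 0 for a left child and
   1 for a right child, then emit the bits in reverse (root-first) order. */
void write_code(struct bitwriter *bw, const struct tnode *leaf) {
    int bits[128]; /* a tree with 128 leaves is at most 127 levels deep */
    int n = 0;
    for (const struct tnode *node = leaf; node->parent != NULL;
         node = node->parent) {
        bits[n++] = (node->parent->right == node);
    }
    while (n > 0) {
        write_bit(bw, bits[--n]);
    }
}

/* Pad the last partial byte with zero bits so that it gets written. */
void flush_bits(struct bitwriter *bw) {
    while (bw->count != 0) {
        write_bit(bw, 0);
    }
}
```

Inside encodeFile, the read loop would then call write_code(&bw, &leafNodes[ch]) for each character ch and flush_bits(&bw) once at the end.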

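decodeFile is similar to encodeFile, except that it works top-down: starting at the root of the Huffman tree, it follows one branch per bit read from the .huf file until it reaches a leaf node, writes out that leaf's character, and then starts again from the root.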
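A sketch of the top-down walk is shown below. The function name decode_stream is hypothetical, and the sketch assumes that a 0 bit means the left child, a 1 bit means the right child, and bits are read from the most significant end of each byte; these must match whatever the encoder does. It decodes until end of file, so any padding bits in the last byte may show up as extra characters unless the file format records where the real data ends.

```c
#include <stdio.h>

struct tnode {
    double weight;
    int c;
    struct tnode *left;
    struct tnode *right;
    struct tnode *parent;
};

/* Read bytes from in, walk the tree one bit at a time, and write a
   character to out each time a leaf is reached. Assumes the root is an
   internal node, which holds whenever the tree has two or more leaves. */
void decode_stream(FILE *in, FILE *out, const struct tnode *root) {
    const struct tnode *node = root;
    int byte;
    while ((byte = fgetc(in)) != EOF) {
        for (int i = 7; i >= 0; i--) { /* most significant bit first */
            int bit = (byte >> i) & 1;
            node = bit ? node->right : node->left;
            if (node->left == NULL && node->right == NULL) {
                fputc(node->c, out); /* reached a leaf: emit its character */
                node = root;         /* restart for the next code */
            }
        }
    }
}
```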