Question
The primary purpose of this problem is to work with trees. Background There are many ways to encode information. The ASCII sequence is used to
The primary purpose of this problem is to work with trees.
Background
There are many ways to encode information. The ASCII sequence is used to encode character relationships, for example. ASCII codes are also fixed-length, hence the terms 7-bit ASCII and 8-bit ASCII. Uniform lengths make turning a sequence of bits into the corresponding ASCII characters very easy.
But what if you wanted variable length codes? Having such a code would permit values that are very popular (say, the letters e and t in English prose) to have shorter codes than less-popular values, thus saving space. Without a uniform code length, we need a way to tell when the code for one value stops and the next starts. One way to do this is to create a prefix code. In a prefix code, the encoding of any value is never the first part of the encoding of any other value. This means that if we read a pattern and match it to a value, we know that we can stop trying to find a match; there isnt a longer potential match to worry about.
For example, say that the encoding for e is 11, and the encoding for t is 110. After reading the pattern 11, we dont know if we should say that we found an e, or keep going to see if well find the code for t. In a prefix code, this situation cant happen. t would have to be encoded with a value whose prefix is something other than 11.
One way to create a prefix code is called Huffman Coding. It generates binary encodings, which map nicely to binary trees if we allow the left child references to represent 0 and the right to represent 1. You wont be writing the encoder for this assignment. Instead, well give you a non-hierarchical representation of the tree and an encoded sequence of values. Your job will be to reconstruct the tree from the given representation, and use it to decode a given sequence of values.
Expected Behavior
Write a Python program, in a file huffman.py, that behaves as specified below.
Prompt the user for the name of an input file using input('Input file: '). The format of this file is specified under Input format (below).
Build the decoding tree. This can be done recursively from the preorder and inorder traversals: see Algorithm below.
Once the tree has been constructed, the third line of the input file can be decoded. The algorithm for doing this is given under Algorithm below.
Input format
An input file is a text file containing three lines of text:
The first line is the preorder traversal of the decoding tree. This is a list of integers, each separated from the next by one or more spaces. Each integer will be unique; there will be no duplicates.
The second line is the inorder traversal of the same tree, formatted in the same way as the preorder traversal.
The third line is the encoded sequence of values. The encoding will take the form of a sequence of the characters 0 and 1. (That is, were simulating a file of bits.)
Example:
9 0 6 3 2 8 4 6 0 2 3 8 9 4 0001011011100011
(Were not showing the corresponding tree; thats for you to figure out!)
Output format
Your program is to output two sequences of values. The first is the postorder traversal of the tree you build from the first two lines of the data file. The second is the decoded sequence of values.
For example, heres output that corresponds to the sample data file shown above:
6 2 8 3 0 4 9 62448468
The first line is the post-order traversal of the tree; the second line is the decoded sequence.
Algorithm
1. Tree Construction
The decoding tree can be constructed recursively from the preorder and inorder traversals. We know that the first value in a preorder traversal is the trees root, and we can build a binary tree node to hold it. In the inorder traversal, the values ahead of the root are in the roots left subtree, and those to the right are in the right subtree. Each of those groups of values will be grouped together in the preorder traversal, too, but likely not in the same order. Because each of those two groups came from subtrees, the preorder traversal reveals the root of each subtree. Thus, we can recursively continue the process on the content of the subtrees and thereby reconstruct the original tree.
As an example, here are the preorder and inorder traversals of a binary tree of integers:
Preorder: 5 8 9 2 4 3 Inorder: 8 9 5 4 2 3
The first value in the preorder sequence is 5. Ahead of 5 in the inorder traversal are 8 and 9, and after it are 4, 2, and 3. Thus, 5s left subtree must contain 8 and 9, and the right must contain 4, 2, and 3. But in which configurations? Back to the preorder traversal. 8 is ahead of 9 in the preorder sequence, meaning that 8 is the root of that subtree. In the other group, 2 is the root. We can call the method on itself twice, once for each subgroup, to form those trees. When completed, the resulting tree is:
2. Decoding an encoded sequence of values
To decode the encoded sequence of values (the third line of the input file), start at the root of the tree, and read the first character. If 0, follow the left child reference. Otherwise, follow the right. When you reach a leaf node, output the nodes value. Return to the trees root, and continue until the input is exhausted.
No recursion required! Note that the values of the internal nodes will never be output.
NOTE: Output is generated only for leaf nodes. For example, consider the following tree:
In this case, there are two leaf nodes, and the sequences that lead to them are 01 and 10. Thus, if the sequence being decoded is 00 or 11, your program should not produce any output because these sequences do not lead to a leaf node.
Programming Requirements
Your program should meet the following requirements:
The decoding tree should be constructed recursively. This code takes as arguments two lists of integers: the preorder traversal and the inorder traversal of the tree and returns the tree.
At each level of this recursive tree construction process, you have to find the nodes in the preorder and inorder traversals for each of the left and right subtrees of the node being processed. This can be done using list comprehensions. (You are allowed to choose whether to use them or not.)
Other than this, you are free to design your tree class as you see fit. We've had ten assignments by now, and at this point you ought to know what to do and how to do it. Have fun, and impress us!
Errors
The only error for this program is the following:
Input file cannot be read.
Program behavior: Use a try statement to detect this. Give an error message and quit.
Error message: "ERROR: Could not open file " + filename
Examples and testing
At this point in the semester you've done enough programming, and worked with enough programming examples, that they should not be a mystery to you. Therefore, instead of giving you a bunch of examples, we will describe how you can construct examples for yourself:
Write down a set of node labels. Each of these is an integer.
As an example, one of the examples I constructed began with the set of labels
1 2 3 4 5 6 7 8
Construct a binary tree where each node is labeled with one of the labels in your label set from the previous step. There is no particular requirement your tree has to satisfy here, so you are free to build the tree however you want.
I used paper-and-pencil for this, but of course you are welcome to write code to do this. I simply crossed off each label from my list of labels as I put it into my tree.
The line connecting any pair of nodes in a tree is usually referred to as an edge. For each node in your tree, label the edge to its left subtree with '0' and the edge to its right subtree with '1'.
For example, one of the trees I generated from the set of labels shown above is:
Write down the preorder and inorder traversals of the tree. These are the first two lines of your input file.
In the case of the tree shown here, these are:
Preorder: 3 4 6 1 7 8 2 Inorder: 6 4 7 1 8 3 2
Write down any sequence of the leaf nodes of your tree (repetitions are allowed).
In the tree shown above, for example, the set of leaf nodes is {6, 7, 8, 2}, so we have to use a sequence over this set, e.g.: 6 7 7 8 2 6
For each leaf node in the sequence from the previous step, write down the sequence of labels on the edges as you go from the root of the tree to that leaf node:
Leaf node:
6
7
7
8
2
6
Root-to-leaf path:
00
010
010
011
1
00
The root-to-leaf path, which is a sequence of 0s and 1s, is the third line of your input file.
In the example above there is a space after the root-to-leaf path for each leaf node. This is just for clarity: in the input file, all you will see is a sequence of 0s and 1s without any such spaces. So in this example, the third line in the input file will be
00010010011100
If everything went well, feeding the resulting input file to your program should cause it to print out a post-order traversal of your tree and the sequence of leaf nodes you used in step 5 above.
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started