Question

1 Approved Answer

Posted on Sep 10, 2024

In this assignment, you are asked to develop a more powerful version of the uniq Unix tool we discussed earlier in the term. This assignment

image text in transcribed

In this assignment, you are asked to develop a more powerful version of the uniq Unix tool we discussed earlier in the term. This assignment is broken down into multiple steps to divide the problem into incremental tasks. This type of incremental development that develops a simple but functional version of a piece of software first and then incrementally adds features is common in the software industry. It also helps to break a seemingly overwhelming task into smaller subtasks that suddenly do not seem that overwhelming at all anymore. In each step, you are expected to modify the solution you developed in the previous step instead of starting from scratch. For full marks, the code you submit should satisfy all the requirements stated in the final step. Solutions that address only a subset of the requirements earn partial marks Recall that the uniq tool reads its input from stdin and sends the lines it reads back to stdout but drops every line that is identical to the line that immediately precedes it. The final version of the unique tool you are tasked to develop in this assignment has the following features It drops any line that is identical to a line read before, not only the immediately preceding line, any line that came before it It must be able to deal with lines of arbitrary lengths It takes an optimal command-line flag -m. If this flag is provided, then it works in "multi-line mode". In this mode, any line that ends with a backslash () is to be treated as a single line together with the line that comes after it. If a whole sequence of lines end with backslashes, then all of them must be joined to form one logical line. This is the same as escaping line breaks in Python source code, for example. We break this into the following incremental steps Step 1: Recreate the functionality of uniq. In other words, you should check only adjacent lines whether they are identical. Step 2: Extend your implementation of Step 1 so it checks for duplicate lines throughout the document, not duplicate adjacent lines. In the interest of an efficient implementation, you should use a sorting algorithm to sort the lines of the input, which guarantees that identical lines are stored consecutively. Now the solution in Step 1 can be used to eliminate duplicate lines. Note that there is no requirement in this step that the order of the lines in the document should be preserved in the output. In this step, you will simply use the qsort function provided as part of the C standard library to sort the lines Step 3: This step adds the last ingredient necessary to meet the first two requirements of the unique tool listed above. Specifically, you will create the illusion that each input line is considered in order and output if it is not equal to any line that you have seen before and dropped otherwise. This can be done by applying the solution from Step 2 but, before producing the final output, rearranging the lines that were not dropped by the code in Step 2 in the order they appeared in the input. Step 4: In this step, you need to check the command line arguments of the program, check whether -m is the only argument and, depending on the outcome, activate multiline mode. Then, when reading input lines, you need to ensure that you collect all lines separated by escaped newline characters into a single line record internally. An added complication is that you should consider the inputs This is a line and This is a to be the same as far as checking for duplicates goes, but in the output of the program, the line breaks of the line that is kept should be preserved. For example, for the input This is a line Another line This is a the output should be This is a line Another line To accomplish this, you need to keep a record of where the line breaks were in the input. Step 5: The final step is to replace the qsort function with your own implementation of Merge Sort. In general, it is a better idea to use functions already provided as part of a library unless these library functions are inadequate for some reason. You are asked to implement your own sorting algorithm here to make you practice writing recursive functions The remainder of this assignment states the requirement for each of these five steps in more detail and/or gives some hints that should be helpful in completing each step Step 2 Your strategy in this step should be to sort the lines in the input so identical lines in the input are stored consecutively. Then you can eliminate consecutive duplicate lines as in Step 1. There is no need to preserve the order of the input lines in the output. For example, on the input def abc def you are allowed to produce the output abc def To accomplish this step, you need two ingredients 1. A function that can sort an array of data items, the array of char pointers representing the input ines in this case 2. A comparator function that is used by the sorting function to determine the correct order of pairs of input items For the latter, you should use the strcmp function again, with a twist. For the former, you should use the qsort sorting function provided by the C standard library again. As you can see in the documentation of the qsort function, it calls the comparator function with pointers to the two data items to be compared as arguments. This creates a small wrinkle here: The items we want to sort are of type char *, since this is how we represent each input line. Thus, the arguments passed to the comparator function by qsort are of type char **. strcmp, however, expects to arguments of type char * (because it is meant to compare two strings, not two pointers to strings) Therefore, you need to write a wrapper for strcmp suitable to be used by qsort. All this wrapper needs to do is dereference its char ** arguments to obtain to char values that can be passed to strcmp, call strcmp on these two char pointers, and then return the result. To complete this step, all you need to do is to combine the discussed ingredients 1. Sort the lines of text using qsort and your strcmp wrapper 2. Eliminate duplicate lines using your solution developed in Step 1. 3. Print the lines that remain as in Step 1 Step 3 Next you need to modify your code so the lines that are included in the output are output in the same order as they appear in the input. For the input def abc abc def this means that the output should be def abc not abc def (Recall that, of all identical lines in the input, the first one should be kept.) The strategy is fairly simple: You sort the input lines and eliminate duplicates as in the previous step. Then you rearrange the remaining lines in their original input order and output the resulting list. To support this, you need to 1. Number the lines that you read in the order you read them and change your representation of each input line from a simple char to a struct that stores the line number and this char *. 2. Change your sorting step from Step 2 so it sorts the lines by their content and identical lines by their line numbers. This is a simple extension of the comparison function for pairs of lines yur developed in Step 2 and ensures that the elimination strategy from Step 1 indeed keeps the first line in each set of identical lines 3. Restore lines to their original order after eliminating duplicates. You do this by sorting the lines that were not eliminated a second time, this time only by their line numbers, ignoring their contents. To do so, you need to implement a second comparison function that compares lines by line number. In this assignment, you are asked to develop a more powerful version of the uniq Unix tool we discussed earlier in the term. This assignment is broken down into multiple steps to divide the problem into incremental tasks. This type of incremental development that develops a simple but functional version of a piece of software first and then incrementally adds features is common in the software industry. It also helps to break a seemingly overwhelming task into smaller subtasks that suddenly do not seem that overwhelming at all anymore. In each step, you are expected to modify the solution you developed in the previous step instead of starting from scratch. For full marks, the code you submit should satisfy all the requirements stated in the final step. Solutions that address only a subset of the requirements earn partial marks Recall that the uniq tool reads its input from stdin and sends the lines it reads back to stdout but drops every line that is identical to the line that immediately precedes it. The final version of the unique tool you are tasked to develop in this assignment has the following features It drops any line that is identical to a line read before, not only the immediately preceding line, any line that came before it It must be able to deal with lines of arbitrary lengths It takes an optimal command-line flag -m. If this flag is provided, then it works in "multi-line mode". In this mode, any line that ends with a backslash () is to be treated as a single line together with the line that comes after it. If a whole sequence of lines end with backslashes, then all of them must be joined to form one logical line. This is the same as escaping line breaks in Python source code, for example. We break this into the following incremental steps Step 1: Recreate the functionality of uniq. In other words, you should check only adjacent lines whether they are identical. Step 2: Extend your implementation of Step 1 so it checks for duplicate lines throughout the document, not duplicate adjacent lines. In the interest of an efficient implementation, you should use a sorting algorithm to sort the lines of the input, which guarantees that identical lines are stored consecutively. Now the solution in Step 1 can be used to eliminate duplicate lines. Note that there is no requirement in this step that the order of the lines in the document should be preserved in the output. In this step, you will simply use the qsort function provided as part of the C standard library to sort the lines Step 3: This step adds the last ingredient necessary to meet the first two requirements of the unique tool listed above. Specifically, you will create the illusion that each input line is considered in order and output if it is not equal to any line that you have seen before and dropped otherwise. This can be done by applying the solution from Step 2 but, before producing the final output, rearranging the lines that were not dropped by the code in Step 2 in the order they appeared in the input. Step 4: In this step, you need to check the command line arguments of the program, check whether -m is the only argument and, depending on the outcome, activate multiline mode. Then, when reading input lines, you need to ensure that you collect all lines separated by escaped newline characters into a single line record internally. An added complication is that you should consider the inputs This is a line and This is a to be the same as far as checking for duplicates goes, but in the output of the program, the line breaks of the line that is kept should be preserved. For example, for the input This is a line Another line This is a the output should be This is a line Another line To accomplish this, you need to keep a record of where the line breaks were in the input. Step 5: The final step is to replace the qsort function with your own implementation of Merge Sort. In general, it is a better idea to use functions already provided as part of a library unless these library functions are inadequate for some reason. You are asked to implement your own sorting algorithm here to make you practice writing recursive functions The remainder of this assignment states the requirement for each of these five steps in more detail and/or gives some hints that should be helpful in completing each step Step 2 Your strategy in this step should be to sort the lines in the input so identical lines in the input are stored consecutively. Then you can eliminate consecutive duplicate lines as in Step 1. There is no need to preserve the order of the input lines in the output. For example, on the input def abc def you are allowed to produce the output abc def To accomplish this step, you need two ingredients 1. A function that can sort an array of data items, the array of char pointers representing the input ines in this case 2. A comparator function that is used by the sorting function to determine the correct order of pairs of input items For the latter, you should use the strcmp function again, with a twist. For the former, you should use the qsort sorting function provided by the C standard library again. As you can see in the documentation of the qsort function, it calls the comparator function with pointers to the two data items to be compared as arguments. This creates a small wrinkle here: The items we want to sort are of type char *, since this is how we represent each input line. Thus, the arguments passed to the comparator function by qsort are of type char **. strcmp, however, expects to arguments of type char * (because it is meant to compare two strings, not two pointers to strings) Therefore, you need to write a wrapper for strcmp suitable to be used by qsort. All this wrapper needs to do is dereference its char ** arguments to obtain to char values that can be passed to strcmp, call strcmp on these two char pointers, and then return the result. To complete this step, all you need to do is to combine the discussed ingredients 1. Sort the lines of text using qsort and your strcmp wrapper 2. Eliminate duplicate lines using your solution developed in Step 1. 3. Print the lines that remain as in Step 1 Step 3 Next you need to modify your code so the lines that are included in the output are output in the same order as they appear in the input. For the input def abc abc def this means that the output should be def abc not abc def (Recall that, of all identical lines in the input, the first one should be kept.) The strategy is fairly simple: You sort the input lines and eliminate duplicates as in the previous step. Then you rearrange the remaining lines in their original input order and output the resulting list. To support this, you need to 1. Number the lines that you read in the order you read them and change your representation of each input line from a simple char to a struct that stores the line number and this char *. 2. Change your sorting step from Step 2 so it sorts the lines by their content and identical lines by their line numbers. This is a simple extension of the comparison function for pairs of lines yur developed in Step 2 and ensures that the elimination strategy from Step 1 indeed keeps the first line in each set of identical lines 3. Restore lines to their original order after eliminating duplicates. You do this by sorting the lines that were not eliminated a second time, this time only by their line numbers, ignoring their contents. To do so, you need to implement a second comparison function that compares lines by line number