Question


Posted on Sep 24, 2024


Your task in the first part is to write a C program called first that reads an input file, which contains a list of dictionary and data file pairs, and generates statistics for each pair. Each line in your input file contains the names of the dictionary and data files. You have to read the files and generate the following statistics:

1. For every word w in the dictionary file, count the number of words w' that occur in the data file such that w' = w.

2. For every word w in the dictionary file, count the number of words w' that occur in the data file such that w is a proper prefix of w' (we shall say w' is a superword of w).

Write all the unique words in the dictionary along with these counts to the output file in lexicographical (i.e. alphabetical) order.
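The two counts come down to one comparison between a dictionary word and a data word. Here is a minimal sketch of that comparison, assuming both words have already been lowercased; the names `classify` and `enum match` are made up for illustration and are not part of the assignment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Relationship of a data word to a dictionary word. */
enum match { NO_MATCH, EXACT, SUPERWORD };

/* Both words are assumed to be lowercased already. */
enum match classify(const char *dict_word, const char *data_word)
{
    assert(dict_word != NULL && data_word != NULL);

    size_t n = strlen(dict_word);
    if (strncmp(dict_word, data_word, n) != 0)
        return NO_MATCH;                 /* dict_word is not a prefix at all */

    /* Prefix matched: equal if data_word ends here, otherwise a superword. */
    return data_word[n] == '\0' ? EXACT : SUPERWORD;
}
```

An `EXACT` result would add to the first count and a `SUPERWORD` result to the second.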

Definition of a word: Any string of characters can be a word. For example, in the sentence a&ab&abc234 dfg, the words are: a, ab, abc, dfg

They do not need to be meaningful. Words, in both dictionary and data files, correspond to the longest continuous sequence of letters read from the file. Another way to say this is that words are any sequence of letters separated by non-letter characters (punctuation, numbers, whitespace, etc.). Each unique word is case-insensitive. That is, boOK, Book and bOOk are all occurrences of the same (unique) word book. Case-insensitivity also applies when matching prefixes: for example, both Boo and bOo are proper prefixes of bOOk, which itself is a proper prefix of booKING (See the example below).
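The word rule above (maximal runs of letters, compared case-insensitively) could be sketched as a small tokenizer over a string. The function name `next_word` and the buffer-size parameter `cap` are assumptions for illustration; a real solution would more likely read characters from a FILE with the same logic.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Copy the next word (maximal run of letters) from *cursor into buf,
 * lowercased, and advance *cursor past it. Returns 1 if a word was
 * found, 0 at end of string. Letters that do not fit in buf are
 * dropped, so cap should exceed the longest expected word.
 */
int next_word(const char **cursor, char *buf, size_t cap)
{
    assert(cap > 0);

    const unsigned char *p = (const unsigned char *)*cursor;
    size_t len = 0;

    while (*p && !isalpha(*p))          /* skip separators */
        p++;
    while (*p && isalpha(*p)) {         /* collect letters */
        if (len + 1 < cap)
            buf[len++] = (char)tolower(*p);
        p++;
    }
    buf[len] = '\0';
    *cursor = (const char *)p;
    return len > 0;
}
```

Calling it repeatedly on the sentence a&ab&abc234 dfg should yield a, ab, abc and dfg in turn, then report that no word is left.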

As an example, suppose the content of the dictionary file is: boo22$Book5555bOoKiNg#bOo#TeX123tEXT(JOHN)

and that of the data file is: John1TEXAN4isa1BOoRiSH%whohasa2bo3KING BOOKING bOoKIngs$12for a TEX-Text(BOOKS(textBOOKS)

Then, the various counts for the unique words in the dictionary file are:

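Unique words | No. of occurrences | No. of superwords
boo | 0 | 4
book | 0 | 3
booking | 1 | 1
tex | 1 | 3
text | 1 | 1
john | 1 | 0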

Input/Output specification

The program first should have the following usage interface: first followed by a single argument, the name of the mapping file. You can assume that the mapping file will exist and that it is well structured. So, unlike assignment 1, you don't need to check the structure of the mapping file. If no argument or more than one argument is provided, or the file names provided are invalid, the program should print "invalid input" and abort.
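The argument check described above might look like the following sketch. The helper name `open_mapping` is hypothetical, and printing the message to stdout is an assumption, since the assignment does not say which stream to use.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Returns the opened mapping file, or NULL after printing "invalid input".
 * Assumption: the message goes to stdout.
 */
FILE *open_mapping(int argc, char **argv)
{
    assert(argv != NULL);

    if (argc != 2) {                    /* no argument, or more than one */
        printf("invalid input\n");
        return NULL;
    }

    FILE *fp = fopen(argv[1], "r");
    if (fp == NULL)                     /* file name is invalid */
        printf("invalid input\n");
    return fp;
}
```

In main, a NULL return would be the point to stop with a non-zero exit status.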

Input specification

Here, and below, let m be the maximum number of words in either the dictionary or data files. Every word is of length at most k. Let n be the number of unique words in the dictionary file. Your input will be a mapping file, which contains lines of the form: <dictFile> <dataFile>, where dictFile and dataFile are the dictionary and data files for your program, respectively. An example of a mapping file is given below:

dict_1 data_1

dict_m data_m

The files: dict_1.txt and dict_m.txt are dictionary files and files: data_1.txt and data_m.txt are data files. They are plain text files with no special structure.
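Splitting one mapping-file line into its two file names could be done with sscanf, as in this sketch. The name `parse_mapping_line` and the `NAME_LEN` limit are assumptions. The names are taken exactly as written on the line; whether a .txt suffix has to be appended is not clear from the text above.

```c
#include <assert.h>
#include <stdio.h>

#define NAME_LEN 256   /* hypothetical cap on a file name, including '\0' */

/* Returns 1 if the line held two whitespace-separated names, else 0. */
int parse_mapping_line(const char *line,
                       char dict_name[NAME_LEN], char data_name[NAME_LEN])
{
    assert(line != NULL);

    /* %255s stops at whitespace and writes at most NAME_LEN bytes. */
    return sscanf(line, "%255s %255s", dict_name, data_name) == 2;
}
```

Each line read from the mapping file with fgets could be passed to this function, with a counter tracking the line number for the output file name.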

Output specification

Your program should generate several output files outi.txt, where i is the line number in the mapping file. That is, your program takes the mapping file as an argument, and each line in the mapping file names a dictionary file and a data file. For example, suppose line j in the mapping file is dict_j data_j. In this case you should produce outj.txt, which contains the statistics described above. Remember that you shouldn't have any spaces at the end of the lines in your output files. Also, you shouldn't have any empty lines in your output files. The program should write all the unique words (see the definition above) that occur in the dictionary file along with their counts (see above), in lexicographical order, one word per line, to the output file, i.e. the output should have exactly n lines.
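Building the output file name and one output line could be sketched as below. The helper names `out_name` and `format_record` are hypothetical, and numbering mapping-file lines from 1 is an assumption. The format string ends each record right after the second count, so no trailing space is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Build "out<i>.txt" for mapping-file line i (assumed to start at 1). */
int out_name(char *buf, size_t cap, int line_no)
{
    assert(line_no >= 1);

    int n = snprintf(buf, cap, "out%d.txt", line_no);
    return n > 0 && (size_t)n < cap;    /* 1 if the name fit in buf */
}

/* One record per line: word, occurrences, superwords, then newline. */
int format_record(char *buf, size_t cap,
                  const char *word, int occurrences, int superwords)
{
    int n = snprintf(buf, cap, "%s %d %d\n", word, occurrences, superwords);
    return n > 0 && (size_t)n < cap;    /* 1 if the record fit in buf */
}
```

The same format string could be passed directly to fprintf on the opened output file instead of going through a buffer.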

For example, running first on the input described above should produce the following output:

boo 0 4

book 0 3

booking 1 1

john 1 0

tex 1 3

text 1 1
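The expected output above can be reproduced with a brute-force sketch that works on in-memory strings instead of files. Everything named here (`struct entry`, `scan_word`, `analyze`, `MAX_WORD`) is an assumption for illustration. It compares every data word against every dictionary word, which is simple but slow; for large inputs a faster structure such as a trie would likely be needed.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD 64    /* hypothetical bound standing in for k */

struct entry {
    char word[MAX_WORD];
    int occurrences;   /* data words equal to word */
    int superwords;    /* data words that word is a proper prefix of */
};

/* Next maximal run of letters, lowercased; returns 0 when none is left. */
static int scan_word(const char **cursor, char *buf)
{
    const unsigned char *p = (const unsigned char *)*cursor;
    size_t len = 0;
    while (*p && !isalpha(*p)) p++;
    while (*p && isalpha(*p)) {
        if (len + 1 < MAX_WORD) buf[len++] = (char)tolower(*p);
        p++;
    }
    buf[len] = '\0';
    *cursor = (const char *)p;
    return len > 0;
}

static int by_word(const void *a, const void *b)
{
    return strcmp(((const struct entry *)a)->word,
                  ((const struct entry *)b)->word);
}

/* Fills out[] with the sorted unique dictionary words and their counts;
 * returns the number of entries (n in the assignment). */
size_t analyze(const char *dict_text, const char *data_text,
               struct entry *out, size_t cap)
{
    char w[MAX_WORD];
    size_t n = 0;

    assert(out != NULL);
    while (scan_word(&dict_text, w)) {          /* collect unique words */
        size_t i = 0;
        while (i < n && strcmp(out[i].word, w) != 0) i++;
        if (i == n && n < cap) {
            strcpy(out[n].word, w);
            out[n].occurrences = out[n].superwords = 0;
            n++;
        }
    }
    qsort(out, n, sizeof out[0], by_word);      /* lexicographical order */

    while (scan_word(&data_text, w)) {          /* count against each entry */
        for (size_t i = 0; i < n; i++) {
            size_t len = strlen(out[i].word);
            if (strncmp(out[i].word, w, len) != 0) continue;
            if (w[len] == '\0') out[i].occurrences++;
            else out[i].superwords++;
        }
    }
    return n;
}
```

Passing the example dictionary and data strings to `analyze` and printing each entry as word, occurrences, superwords should give the six lines listed above.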

In C, not Java, thank you
