Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

1 Project Description Write a C program in Linux that will read two input files, identify common words that appear in both files, and write

1 Project Description Write a C program in Linux that will read two input files, identify common words that appear in both files, and write them into an output file in sorted order along with their total number of occurrences in both files. For each input file, your program will build a separate linked list where each node of the list will contain a unique word that exists in that file and the number of occurrences of that word (hereafter referred to as count) within that file. Hence, each node of the linked list will be a struct having fields to store the word itself (char *) and its count (int). You will build the list as a doubly linked list, where each node in the list has a pointer to the next node and the previous node. After building these two input lists, your program will find the common words in both lists, find their total count, and create a third output list that will be used for printing the result into the output file in ascending sorted order. Sorting will be done based on the word field. For comparison of the words, you will use the strcmp() function. You must implement and use the insertion sort algorithm to sort the two input lists. Each line of the output file will contain information about a word occurring in both input files in the following format: Word,TotalCount Note the comma (instead of space or TAB characters) between the fields. The output should not contain any spaces, TABs, or empty lines. So, after you write the last line of information, you should close the file, and this should be the end of your program. It is very important that you produce the output in this rigid format since we will use this format in our automated black-box tests. Words in input files are separated by whitespace characters, which can either be a , a , or a character.

2 Development It is mandatory for this assignment that you use three linked list data structures (two for the input files, and one for the output file), dynamic memory allocation, and the insertion sort algorithm. You are not allowed to build your input linked lists in sorted order on the fly while inserting the elements. You will build them by simply inserting new words to the end (tail) of the list. Then, you will implement an insertion sort algorithm and sort these two linked lists using this function. You are not allowed to use the insertion sort function for the third linked list; the third list will be constructed by simply merging the first two sorted linked lists in a sorted order. Also, you cannot assume a maximum number of characters for the words, so the memory for them should be created dynamically as well (using strdup()), which will need to be freed later. This project will allow you to practice with malloc, pointers, and C structs. The name of your executable file has to be common and a sample invocation of your program is as follows: ./common in1.txt in2.txt out.txt Here, in1.txt is name of the the first input text file, in2.txt is the name of the second input text file, and out.txt is the name of the output text file which you will store your final output. Note that input and output file names do not have to be in1.txt, in2.txt, and out.txt, different names can be passed to the program as command line arguments. It is very important that you follow the specifications so that you do not lose any points. Example 1 An example first input file (for example, in1.txt) can be like the following:

THE GODFATHER _____________ Screenplay by MARIO PUZO and FRANCIS FORD COPPOLA THE THIRD DRAFT THE PARAMOUNT PICTURES 1 Gulf and Western Plaza March 29, 1971 New York, New York 10019

and an example second input file (for example, in2.txt) can be like the following:

THE GODFATHER Part Two Screenplay by Mario Puzo and Francis Ford Coppola THE SECOND DRAFT PARAMOUNT PICTURES 1 Gulf and Western Plaza September 24, 1973 New York, New York 10019 2

Assuming the following invocation: ./common in1.txt in2.txt out.txt The output file out.txt should be as follows:

1,2

10019,2

DRAFT,2

GODFATHER,2

Gulf,2

New,4

PARAMOUNT,2

PICTURES,2

Plaza,2

Screenplay,2

THE,5

Western,2

York,2

York,,2

and,4

by,2

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Mastering Big Data Interview 751 Comprehensive Questions And Expert Answers

Authors: Mr Bhanu Pratap Mahato

1st Edition

B0CLNT3NVD, 979-8865047216

More Books

Students also viewed these Databases questions