Question

Posted on Sep 26, 2024

Write a C++ program that reads lines of text from a file using the ifstream getline() method, tokenizes the lines into words, and keeps statistics on the data in the file. Do NOT use C++ string objects for any purpose in this program. If you use any C++ string objects for this assignment you will receive a score of zero. The program is to be broken into functions as usual and be well-commented.

Each line of text consists of words that end at a space, a newline, or one of the following common punctuation marks: period (.), comma (,), semicolon (;), exclamation point (!), or question mark (?). To keep things relatively simple, the input text will all be lower case.

Here are the tasks the program must do and suggested functions:

1. The main function opens and closes the files, uses a read-till-end-of-file loop that calls getline() to read the input file line by line, and calls the other functions, so main() is relatively short.
2. As each line is read, echo print it and call a function that tokenizes the C-string into words and stores each unique word in a 2-D array of char or an array of word structures (your choice). The tokenizer function adds each new word to the array of words or word structs, updates the number of times the word has been seen, and keeps track of the total word count and the total unique word count.
3. To do its job, the tokenizer function calls a linear search function that checks whether the newly found word is already in the words array. Suggestion: declare a parallel array of integers that stores the number of times each word has been seen. Alternate suggestion: store each unique word and the count of how many times it appears in a struct variable, and declare an array of these word structures.
4. Write a function that sorts the words array in alphabetical order.
5. Write a function (or functions) that:
   a. finds the location of the longest word. If there is a tie for longest, choose one.
   b. finds the total number of words of length 1-3 characters, such as 'a', 'or', 'the', in the file.
   c. finds the location of the word that occurred most frequently.
6. Write a function that outputs all unique words found, in sorted order with their occurrence counts, one word/count pair per line, into an output file, and also writes the longest word found, the word that occurred most frequently, and the number of short words found.

The program must work correctly with an EMPTY input file by supplying an error message. When you construct your own input file, omit the '\n' on the last line or your test for end-of-file might not work correctly. (This may cause the program to read a zero-length line or seem to read the last line twice before seeing end-of-file.)

For this program, you may assume that a line will be no longer than 120 characters, an individual word will be no longer than 15 letters, and there will be no more than 200 unique words in the file. However, there may be more than two hundred total words in the file.

I am allowing some latitude on how to do this program.

1. You may use strtok(mystring, "char-list") as I show in my "ragged"-array examples posted in the Canvas module for Week 3, or you may write your own tokenizer by looping and examining characters in the input string. In that case, use the character testing functions. Using strtok() is easier in my opinion, but either way works.
2. You may keep each unique word and its count in a struct variable and declare an array of structs instead of two parallel arrays, or you may use two parallel arrays: a 2-D char array of the words and an int array of the counts.
3. If you are really brave and want practice with pointers, you may choose to dynamically allocate space for each new word as it is tokenized, store the word in its new space, and save its pointer. This is what the word struct looks like for this approach:

    struct wordcount {
        char *word;  // unique word found in the file. Can be any length.
        int count;   // number of times the word was found in the file
    };

Submit the source file and your test input files as well as your output files as usual.

Extra Credit: Allow the input text to contain both upper and lower case letters. Do NOT consider words to be different just because of case. For example, "The" and "the" are the same word. Also allow the input file to contain digits, but do not count these as words.
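Task 1's read-till-end-of-file loop might look something like the sketch below. It assumes a hypothetical helper named echoLines() and a 121-character buffer (120 characters plus '\0'). The helper only echoes and counts lines. A full program would call its tokenizer where the comment indicates. The loop tests the result of getline() itself instead of calling eof() before reading, so the last line is processed exactly once whether or not it ends in '\n'. A line count of zero is one simple way to detect an empty file.

```cpp
#include <fstream>
#include <iostream>

const int LINE_SIZE = 121;  // 120 characters plus the terminating '\0'

// Hypothetical helper for task 1: opens fileName, echoes each line to
// std::cout, and returns the number of lines read (0 = empty file),
// or -1 if the file could not be opened.
int echoLines(const char *fileName) {
    std::ifstream in(fileName);
    if (!in) {
        std::cerr << "Error: cannot open " << fileName << '\n';
        return -1;
    }
    char line[LINE_SIZE];
    int lineCount = 0;
    // Test the getline() call itself: the loop ends as soon as a read extracts
    // nothing, so a last line without '\n' is still processed exactly once.
    // (A line longer than 120 characters sets failbit and also ends the loop.)
    while (in.getline(line, LINE_SIZE)) {
        std::cout << line << '\n';  // echo print
        ++lineCount;
        // A full program would pass `line` to its tokenizer here.
    }
    in.close();
    if (lineCount == 0) {
        std::cerr << "Error: " << fileName << " is empty\n";
    }
    return lineCount;
}
```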
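Below is a minimal sketch of tasks 2 and 3 using strtok() and the array-of-structs option. The names WordEntry, findWord(), and tokenizeLine() are hypothetical. The delimiter string is taken from the separators the assignment lists, assuming the semicolon is ';'. strtok() writes '\0' over each delimiter, so the line buffer is modified in place. With the parallel-array option, the same logic works on a char words[200][16] array and an int counts[200] array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

const int MAX_WORD_LEN = 15;  // longest word allowed by the assignment
const int MAX_UNIQUE = 200;   // most unique words allowed by the assignment

// One unique word and how many times it has been seen.
struct WordEntry {
    char word[MAX_WORD_LEN + 1];  // +1 for the terminating '\0'
    int count;
};

// Linear search: index of word in words[0..numUnique-1], or -1 if absent.
int findWord(const WordEntry words[], int numUnique, const char *word) {
    for (int i = 0; i < numUnique; ++i) {
        if (std::strcmp(words[i].word, word) == 0) {
            return i;
        }
    }
    return -1;
}

// Splits line into words and updates the word table and the running totals.
// Note: strtok() modifies line by overwriting each delimiter with '\0'.
void tokenizeLine(char line[], WordEntry words[], int &numUnique, int &totalWords) {
    const char *delims = " \n.,;!?";  // separators listed in the assignment
    for (char *tok = std::strtok(line, delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        ++totalWords;
        int pos = findWord(words, numUnique, tok);
        if (pos >= 0) {
            ++words[pos].count;  // seen before: bump its count
        } else {
            // The assignment guarantees these limits; assert documents them.
            assert(numUnique < MAX_UNIQUE);
            assert(std::strlen(tok) <= static_cast<std::size_t>(MAX_WORD_LEN));
            std::strcpy(words[numUnique].word, tok);
            words[numUnique].count = 1;
            ++numUnique;
        }
    }
}
```

strtok() keeps internal state between calls, so one line must be fully tokenized before the next strtok(line, ...) call starts a new line.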
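For task 4, a selection sort keyed on strcmp() is one straightforward choice. The sketch assumes a hypothetical WordEntry struct and a function named sortWords(). Swapping whole structs keeps each word paired with its count. With parallel arrays, every swap would have to move both the word and its count.

```cpp
#include <cstring>

const int MAX_WORD_LEN = 15;

struct WordEntry {
    char word[MAX_WORD_LEN + 1];
    int count;
};

// Selection sort into alphabetical order using strcmp().
void sortWords(WordEntry words[], int numUnique) {
    for (int i = 0; i < numUnique - 1; ++i) {
        int minPos = i;  // position of the alphabetically smallest word so far
        for (int j = i + 1; j < numUnique; ++j) {
            if (std::strcmp(words[j].word, words[minPos].word) < 0) {
                minPos = j;
            }
        }
        if (minPos != i) {
            WordEntry temp = words[i];  // struct assignment copies the char array too
            words[i] = words[minPos];
            words[minPos] = temp;
        }
    }
}
```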
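The three statistics in task 5 might be sketched as follows. The function names are hypothetical, and ties go to the first word found. For 5b, this sketch assumes "the total number of words ... in the file" means every occurrence counts, so each short unique word adds its count. If the intent is unique short words, add 1 instead.

```cpp
#include <cstddef>
#include <cstring>

const int MAX_WORD_LEN = 15;

struct WordEntry {
    char word[MAX_WORD_LEN + 1];
    int count;
};

// 5a. Index of the longest word (first one wins a tie), or -1 if none.
int findLongest(const WordEntry words[], int numUnique) {
    int best = -1;
    std::size_t bestLen = 0;
    for (int i = 0; i < numUnique; ++i) {
        std::size_t len = std::strlen(words[i].word);
        if (best == -1 || len > bestLen) {
            best = i;
            bestLen = len;
        }
    }
    return best;
}

// 5b. Number of 1-3 character words in the file, counting every occurrence.
int countShortWords(const WordEntry words[], int numUnique) {
    int total = 0;
    for (int i = 0; i < numUnique; ++i) {
        std::size_t len = std::strlen(words[i].word);
        if (len >= 1 && len <= 3) {
            total += words[i].count;
        }
    }
    return total;
}

// 5c. Index of the most frequent word (first one wins a tie), or -1 if none.
int findMostFrequent(const WordEntry words[], int numUnique) {
    int best = -1;
    for (int i = 0; i < numUnique; ++i) {
        if (best == -1 || words[i].count > words[best].count) {
            best = i;
        }
    }
    return best;
}
```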
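For the pointer-based option, each new word gets its own heap block sized to fit, and the struct keeps only the pointer. The sketch below uses the wordcount struct from the assignment. The helpers addWord() and freeWords() are hypothetical. Every new[] is matched by a delete[] once the report has been written.

```cpp
#include <cstring>

// Struct from the assignment's dynamic-allocation option.
struct wordcount {
    char *word;  // unique word found in the file; can be any length
    int count;   // number of times the word was found in the file
};

// Hypothetical helper: stores a heap copy of newWord as the next unique word.
// The copy is needed because the tokenizer's buffer is reused for every line.
void addWord(wordcount words[], int &numUnique, const char *newWord) {
    char *copy = new char[std::strlen(newWord) + 1];  // +1 for '\0'
    std::strcpy(copy, newWord);
    words[numUnique].word = copy;
    words[numUnique].count = 1;
    ++numUnique;
}

// Releases every string allocated by addWord() and empties the table.
void freeWords(wordcount words[], int &numUnique) {
    for (int i = 0; i < numUnique; ++i) {
        delete[] words[i].word;
        words[i].word = nullptr;
    }
    numUnique = 0;
}
```

One side benefit of this design is that sorting only swaps structs holding pointers, so no characters are copied during the sort.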