Question


Posted on Sep 22, 2024

Write a C++ program that reads lines of text from a file using the ifstream getline() method, tokenizes the lines into words and keeps statistics

Write a C++ program that reads lines of text from a file using the ifstream getline() method, tokenizes the lines into words and keeps statistics on the data in the file. Do NOT use C++ string objects for any purpose in this program. DO NOT USE C-STRING OBJECTS.

The program is to be broken into functions as usual and be well-commented.

Each line of text consists of words that terminate in a space, a newline, or one of the following common punctuation marks: period (.), comma (,), semicolon (;), exclamation point (!), or question mark (?). To keep things relatively simple, the input text will all be lowercase.
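A minimal sketch of the strtok() option (permitted further down in the assignment). The delimiter list holds the space, the newline, and the five marks above. It assumes a 16-byte buffer per word (15 letters plus '\0'). The helper name `splitWords` is hypothetical. Adding '\r' to the list is an extra assumption, meant to handle files saved with Windows line endings.

```cpp
#include <cstring>

// Hypothetical helper: splits one line into words with strtok().
// strtok() modifies `line` in place, writing '\0' over each delimiter.
const int MAX_WORD = 15;                 // longest word the assignment allows
const char DELIMS[] = " \n\r.,;!?";      // space, newline, CR (assumed), 5 marks

int splitWords(char line[], char words[][MAX_WORD + 1], int maxWords)
{
    int count = 0;
    char *tok = std::strtok(line, DELIMS);          // first word, or nullptr
    while (tok != nullptr && count < maxWords)
    {
        std::strncpy(words[count], tok, MAX_WORD);  // copy at most 15 chars
        words[count][MAX_WORD] = '\0';              // strncpy may not terminate
        ++count;
        tok = std::strtok(nullptr, DELIMS);         // continue in the same line
    }
    return count;                                   // number of words found
}
```

Because runs of delimiters (such as ", " or ". ") are skipped by strtok(), no empty words are produced.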

Here are the tasks the program must do and suggested functions:

1. The main function opens and closes the files, utilizes a read-till-end-of-file loop that calls getline() to read the input file, line by line, and calls the other functions. main() is relatively short.

2. As each line is read, echo-print it and call a function that tokenizes the C-string into words and stores each unique word in a 2-D array of char or an array of word structures (your choice). The tokenizer function adds each new word to the array of words or word structs, updates the number of times the word has been seen, and keeps track of the total word count and the total unique word count.

3. To do its job, the tokenizer function calls a linear search function that checks whether the newly found word is already in the words array. Suggestion: declare a parallel array of integers that stores the number of times each word has been seen. Alternative suggestion: store each unique word and the count of how many times it appears in a struct variable, and declare an array of these word structures.

4. Write a function that sorts the words array in alphabetical order.

5. Write a function (or functions) that

a. finds the location of the longest word. If there is a tie for longest, choose one.

b. finds the total number of words in the file that are 1-3 characters long, such as 'a', 'or', and 'the'.

c. finds the location of the word that occurred most frequently.

6. Write a function that writes all unique words, in sorted order with their occurrence counts, to an output file, one word/count pair per line. It also writes the longest word found, the word that occurred most frequently, and the number of short words found.
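One way tasks 2 through 5 could fit together, sketched for the "array of word structures" option. All names (`WordEntry`, `findWord`, `tokenizeLine`, `sortWords`, `findLongest`, `countShortWords`, `findMostFrequent`) are hypothetical. Selection sort is just one simple choice. Task 5b is read here as counting every occurrence of a short word, not only unique ones.

```cpp
#include <cstring>

// Sizes come from the assignment's stated limits.
const int MAX_WORD = 15;     // letters per word
const int MAX_UNIQUE = 200;  // unique words per file

struct WordEntry
{
    char text[MAX_WORD + 1]; // the word as a C-string
    int  count;              // how many times it has been seen
};

// Task 3: linear search; returns the index of `word`, or -1 if absent.
int findWord(const WordEntry list[], int numUnique, const char word[])
{
    for (int i = 0; i < numUnique; ++i)
        if (std::strcmp(list[i].text, word) == 0)
            return i;
    return -1;
}

// Task 2: tokenizes one line (modifying it), updating counts and both totals.
void tokenizeLine(char line[], WordEntry list[], int &numUnique, int &totalWords)
{
    const char delims[] = " \n\r.,;!?";
    for (char *tok = std::strtok(line, delims); tok != nullptr;
         tok = std::strtok(nullptr, delims))
    {
        ++totalWords;
        int pos = findWord(list, numUnique, tok);
        if (pos >= 0)
            ++list[pos].count;                    // seen before
        else if (numUnique < MAX_UNIQUE)          // room for a new word
        {
            std::strncpy(list[numUnique].text, tok, MAX_WORD);
            list[numUnique].text[MAX_WORD] = '\0';
            list[numUnique].count = 1;            // first occurrence
            ++numUnique;
        }
    }
}

// Task 4: selection sort into alphabetical order using strcmp().
void sortWords(WordEntry list[], int numUnique)
{
    for (int i = 0; i < numUnique - 1; ++i)
    {
        int smallest = i;
        for (int j = i + 1; j < numUnique; ++j)
            if (std::strcmp(list[j].text, list[smallest].text) < 0)
                smallest = j;
        WordEntry temp = list[i];   // structs can be swapped by assignment
        list[i] = list[smallest];
        list[smallest] = temp;
    }
}

// Task 5a: index of the longest word (first one wins a tie); -1 if empty.
int findLongest(const WordEntry list[], int numUnique)
{
    int best = -1;
    for (int i = 0; i < numUnique; ++i)
        if (best < 0 || std::strlen(list[i].text) > std::strlen(list[best].text))
            best = i;
    return best;
}

// Task 5b: total occurrences of words that are 1-3 characters long.
int countShortWords(const WordEntry list[], int numUnique)
{
    int total = 0;
    for (int i = 0; i < numUnique; ++i)
        if (std::strlen(list[i].text) <= 3)
            total += list[i].count;   // every occurrence, not just unique
    return total;
}

// Task 5c: index of the most frequent word (first one wins a tie); -1 if empty.
int findMostFrequent(const WordEntry list[], int numUnique)
{
    int best = -1;
    for (int i = 0; i < numUnique; ++i)
        if (best < 0 || list[i].count > list[best].count)
            best = i;
    return best;
}
```

After the read loop, main() would call sortWords() once. It would then hand the indices from findLongest() and findMostFrequent() to the task 6 output function, along with the result of countShortWords(). Both index functions return -1 when no words were stored.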

The program must work correctly with an EMPTY input file by supplying an error message. Create other test files of your own, and also run your program with my test files. When you construct your own input file, omit the '\n' on the last line, or your test for end-of-file might not work correctly. (This may cause the program to read a zero-length line or seem to read the last line twice before seeing end-of-file.)
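This sketch shows a read loop that avoids the last-line problem. The functions take `std::istream` parameters, so they also work with an `std::ifstream`. Because getline() is called in the loop condition, the body runs only after a successful read. A final line is therefore processed exactly once, whether or not it ends in '\n'. The names `isEmpty` and `echoLines` are hypothetical. Checking for an empty file with peek() is one option among several.

```cpp
#include <fstream>
#include <iostream>

const int MAX_LINE = 120;   // longest line the assignment allows

// True if the stream holds no characters at all (e.g., an empty file).
// peek() looks at the next character without removing it.
bool isEmpty(std::istream &in)
{
    return in.peek() == std::istream::traits_type::eof();
}

// Echo-prints each line of `in` to `out` and returns the number of lines read.
// getline() sits in the loop condition, so the body runs only after a
// successful read: the last line is never handled twice, and a last line
// that lacks a trailing '\n' is still read.
int echoLines(std::istream &in, std::ostream &out)
{
    char line[MAX_LINE + 1];                 // +1 for the terminating '\0'
    int lines = 0;
    while (in.getline(line, MAX_LINE + 1))
    {
        out << line << '\n';                 // echo print
        ++lines;
        // The full program would pass `line` to its tokenizer here.
    }
    return lines;
}
```

For example, main() could open the input file and check `isEmpty(inFile)`. If that check is true, it would print the required error message and return early. Otherwise it would run the loop and then close both files.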

For this program, you may assume that a line will be no longer than 120 characters, an individual word will be no longer than 15 letters and there will be no more than 200 unique words in the file. However, there may be more than two hundred total words in the file.
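These limits translate directly into array sizes, and each C-string buffer needs one extra byte for its '\0'. Below is a sketch for the parallel-array option. The names (`WordTable`, `copyWord`, `tableHasRoom`) are hypothetical. Grouping the arrays in a struct is only a convenience for passing them around. Only the unique-word table is bounded; the total word count is an ordinary counter.

```cpp
#include <cstring>

// Limits stated in the assignment.
const int MAX_LINE   = 120;  // characters per line
const int MAX_WORD   = 15;   // letters per word
const int MAX_UNIQUE = 200;  // unique words per file

// A line buffer would be declared as: char line[MAX_LINE + 1];

// Storage for the parallel-array option, sized from the limits above.
struct WordTable
{
    char words[MAX_UNIQUE][MAX_WORD + 1]; // +1 for each word's '\0'
    int  counts[MAX_UNIQUE];              // counts[i] belongs to words[i]
    int  numUnique  = 0;                  // never exceeds MAX_UNIQUE
    long totalWords = 0;                  // not capped: may exceed 200
};

// Copies a word, truncating instead of overflowing if it is too long.
void copyWord(char dest[], const char src[])
{
    std::strncpy(dest, src, MAX_WORD);
    dest[MAX_WORD] = '\0';
}

// A new unique word may be stored only while the table has room.
bool tableHasRoom(int used)
{
    return used < MAX_UNIQUE;
}
```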

I am allowing some latitude on how to do this program.

1. You may use strtok(mystring, "char-list"), as I show in my "ragged"-array examples posted in the Canvas module for Week 3, or you may write your own tokenizer by looping over the input string and examining its characters. In that case, use the character-testing functions.

Using strtok() is easier in my opinion but either way works.

2. You may keep each unique word and its count in a struct variable and declare an array of structs instead of two parallel arrays or you may use two parallel arrays: a 2-D char array of the words and an int array of the counts.
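This sketch combines the two alternatives from this list. It uses a hand-written tokenizer built on the `<cctype>` testing function isspace() instead of strtok(). It stores words in parallel arrays (`words` and `counts`) instead of structs. The function names are hypothetical. Treating only whitespace and the five listed marks as terminators follows the description at the top. Silently truncating words longer than 15 letters is an added assumption.

```cpp
#include <cctype>
#include <cstring>

const int MAX_WORD = 15;     // letters per word
const int MAX_UNIQUE = 200;  // unique words per file

// True if c ends a word: whitespace or one of the five listed marks.
bool isTerminator(char c)
{
    return std::isspace(static_cast<unsigned char>(c)) ||
           (c != '\0' && std::strchr(".,;!?", c) != nullptr);
}

// Linear search over the 2-D char array; -1 if `word` is not there.
int findWord(const char words[][MAX_WORD + 1], int numUnique, const char word[])
{
    for (int i = 0; i < numUnique; ++i)
        if (std::strcmp(words[i], word) == 0)
            return i;
    return -1;
}

// Records one word in the parallel arrays words[] and counts[].
void recordWord(const char word[], char words[][MAX_WORD + 1], int counts[],
                int &numUnique, int &totalWords)
{
    ++totalWords;
    int pos = findWord(words, numUnique, word);
    if (pos >= 0)
        ++counts[pos];                         // seen before
    else if (numUnique < MAX_UNIQUE)
    {
        std::strcpy(words[numUnique], word);   // word is at most 15 chars
        counts[numUnique] = 1;                 // first occurrence
        ++numUnique;
    }
}

// Scans `line` character by character, building each word in a buffer.
void tokenizeLine(const char line[], char words[][MAX_WORD + 1], int counts[],
                  int &numUnique, int &totalWords)
{
    char word[MAX_WORD + 1];
    int len = 0;
    for (int i = 0; ; ++i)
    {
        char c = line[i];
        if (c == '\0' || isTerminator(c))
        {
            if (len > 0)                       // a non-empty word just ended
            {
                word[len] = '\0';
                recordWord(word, words, counts, numUnique, totalWords);
                len = 0;
            }
            if (c == '\0')
                break;                         // end of the line
        }
        else if (len < MAX_WORD)
            word[len++] = c;                   // letters past 15 are dropped
    }
}
```

Unlike strtok(), this tokenizer leaves the input line unchanged. It can therefore take a `const char[]` parameter, and the line can still be echo-printed after it has been tokenized.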
