Question
There are two files. One contains text to be read and analyzed, and is named HarryPotter.txt. As the name implies, this file contains the full text of Harry Potter and the Sorcerer's Stone.
For your convenience, all the punctuation has been removed, all the words have been converted to lowercase, and the entire document is written on a single line. The other file, ignoreWords.txt, contains the 50 most common words in the English language, which your program will ignore.

Your program must take three command line arguments in the following order: a number N, the name of the text file to be read, and the name of the text file with the words that should be ignored. It will read in the text (ignoring the words in the second file) and store all unique words in a dynamically doubling array. It should then calculate and print the following information:

- The number of array doublings needed to store all the unique words
- The number of unique non-ignore words in the file
- The total word count of the file (excluding the ignore words)
- The N most frequent words along with their probability of occurrence (up to 4 decimal places)
1. Use an array of structs to store the words and their counts

There is an unknown number of words in the file. You will store each unique word and its count (the number of times it occurs in the document). Because of this, you will need to store these words in a dynamically sized array of structs. The struct must be defined as follows:

    struct wordItem {
        string word;
        int count;
    };
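A minimal sketch of recording one word in this array. The helper name addWord and the variable names wordList and numWords are assumptions, not fixed by the assignment:

    #include <string>
    using namespace std;

    struct wordItem {
        string word;
        int count;
    };

    // Bump the count if the word is already stored; otherwise append it.
    // Precondition: the caller has already doubled the array (see step 2)
    // if it was full, so there is room for one more entry.
    void addWord(wordItem wordList[], int& numWords, const string& newWord) {
        for (int i = 0; i < numWords; i++) {
            if (wordList[i].word == newWord) {
                wordList[i].count++;
                return;
            }
        }
        wordList[numWords].word = newWord;
        wordList[numWords].count = 1;
        numWords++;
    }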
2. Use the array-doubling algorithm to increase the size of your array

Your array will need to grow to fit the number of words in the file. Start with an array size of 100, and double the size whenever the array runs out of free space. You will need to allocate your array dynamically and copy values from the old array to the new array. Note: don't use the built-in std::vector class; this will result in a loss of points. You're actually writing the code that the built-in vector uses behind the scenes!
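A minimal sketch of the doubling step, assuming a free-standing helper; the name doubleArray and its signature are not dictated by the assignment. The call site would also increment a counter each time this runs, for the first line of the output:

    // Allocate an array twice as large, copy the old values over,
    // release the old memory, and repoint the caller's pointer.
    void doubleArray(wordItem*& wordList, int& capacity) {
        int newCapacity = capacity * 2;
        wordItem* bigger = new wordItem[newCapacity];
        for (int i = 0; i < capacity; i++) {
            bigger[i] = wordList[i];
        }
        delete[] wordList;
        wordList = bigger;
        capacity = newCapacity;
    }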
3. Ignore the top 50 most common words that are read in from the second file

To get useful information about word frequency, we will be ignoring the 50 most common words in the English language. These words will be read in from a file whose name is the third command line argument.
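A minimal sketch of loading and checking the ignore list. The names getIgnoreWords and isIgnoreWord are assumptions; the autograder may require different signatures:

    #include <fstream>
    #include <string>
    using namespace std;

    const int NUM_IGNORE_WORDS = 50;

    // Read the 50 ignore words from the file into a fixed-size array.
    void getIgnoreWords(const char* fileName, string ignoreWords[]) {
        ifstream inFile(fileName);
        for (int i = 0; i < NUM_IGNORE_WORDS; i++) {
            inFile >> ignoreWords[i];
        }
    }

    // Linear scan over the 50 entries.
    bool isIgnoreWord(const string& word, const string ignoreWords[]) {
        for (int i = 0; i < NUM_IGNORE_WORDS; i++) {
            if (ignoreWords[i] == word) {
                return true;
            }
        }
        return false;
    }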
4. Take three command line arguments

Your program must take three command line arguments: a number N which tells your program how many of the most frequent words to print, the name of the text file to be read and analyzed, and the name of the text file with the words that should be ignored.
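A minimal sketch of pulling the three arguments out of argv; atoi is one simple way to convert N, and the variable names are placeholders:

    #include <cstdlib>   // atoi

    int main(int argc, char* argv[]) {
        // argv[0] is the program name, so the three arguments follow it.
        int n = atoi(argv[1]);                // how many top words to print
        const char* textFileName = argv[2];   // e.g. HarryPotter.txt
        const char* ignoreFileName = argv[3]; // e.g. ignoreWords.txt
        // ... validate argc first (see step 7), then read the files ...
        return 0;
    }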
5. Output the top N most frequent words

Your program should print out the top N most frequent words in the text, not including the common words, where N is passed in as a command line argument.
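A minimal sketch of selecting and printing the top N, here via a partial selection sort. The helper name printTopN and the exact print format are assumptions, since the question's sample output is cut off:

    #include <iostream>
    #include <iomanip>   // setprecision
    #include <utility>   // swap
    using namespace std;

    // Move the n largest counts to the front of the array, then print
    // each word with its probability of occurrence:
    // count / total non-ignored words, to 4 decimal places.
    void printTopN(wordItem wordList[], int numWords, int totalCount, int n) {
        cout << fixed << setprecision(4);
        for (int i = 0; i < n && i < numWords; i++) {
            int maxIdx = i;
            for (int j = i + 1; j < numWords; j++) {
                if (wordList[j].count > wordList[maxIdx].count) {
                    maxIdx = j;
                }
            }
            swap(wordList[i], wordList[maxIdx]);
            double probability = (double)wordList[i].count / totalCount;
            // Placeholder format; match the assignment's exact spec.
            cout << probability << " - " << wordList[i].word << endl;
        }
    }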
6. Format your output this way:

    Array doubled:
7. You must include the following functions (they will be tested by the autograder):

a. In your main function:

i. If the correct number of command line arguments is not passed, print the statement below and exit the program:

    std::cout << "Usage: Assignment2Solution
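The usage string above is cut off in the question, so the sketch below uses "<...>" as a stand-in for the missing text; the point here is only the argument-count check and early exit:

    #include <iostream>

    int main(int argc, char* argv[]) {
        if (argc != 4) {  // program name plus the three required arguments
            // "<...>" marks the part of the message truncated above.
            std::cout << "Usage: Assignment2Solution <...>" << std::endl;
            return 1;
        }
        // ... rest of the program ...
        return 0;
    }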
Ignore words:
the be to of and a in that have i it for not on with he as you do at this but his by from they we say her she or an will my one all would there their what so up out if about who get which go me