Question


Posted on Sep 26, 2024

In C++.

ignoreWords.txt:

the be to of and a in that have i it for not on with he as you do at this but his by from they we say her she or an will my one all would there their what so up out if about who get which go me
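Here is a minimal sketch of loading this list and checking a word against it. It assumes the file is whitespace-separated, as shown above. The names loadIgnoreWords, isIgnoreWord and NUM_IGNORE_WORDS are hypothetical. The assignment's required function signatures are cut off at the end of the write-up, so adapt them to whatever the autograder expects.

```cpp
#include <fstream>
#include <string>

// Hypothetical constant: the write-up says the file holds 50 words.
const int NUM_IGNORE_WORDS = 50;

// Hypothetical helper: reads up to NUM_IGNORE_WORDS whitespace-separated words
// into ignoreWords. Returns how many were read (0 if the file can't be opened).
int loadIgnoreWords(const std::string& fileName, std::string ignoreWords[]) {
    std::ifstream in(fileName);
    int n = 0;
    std::string w;
    while (n < NUM_IGNORE_WORDS && in >> w) {
        ignoreWords[n++] = w;
    }
    return n;
}

// Hypothetical helper: linear search, which is fast enough for 50 entries.
bool isIgnoreWord(const std::string& word, const std::string ignoreWords[], int numIgnore) {
    for (int i = 0; i < numIgnore; i++) {
        if (ignoreWords[i] == word) {
            return true;
        }
    }
    return false;
}
```

Because mobydick.txt is already lowercase with punctuation removed, a plain string comparison is enough. No case folding is needed.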

Let me know if you need mobydick.txt.

I just really need the void printTenFromN function to be correct.


Instructions

In this assignment, we will write a program to analyze the word frequency of a document. The number of words in the document may not be known a priori, so we will implement a dynamically doubling array to store the information. Please read all the directions before writing code, because this write-up contains specific requirements for how the code must be written.

Problem Overview:

There are two files on Canvas:
- mobydick.txt contains the text to be read and analyzed by your program. The file contains the full text of Moby Dick. For your convenience, all the punctuation has been removed, all words have been converted to lowercase, and the entire document can be read as if it were written on a single line.
- ignoreWords.txt contains 50 of the most common words in the English language. Your program will ignore these words during analysis.

Your program must take three command line arguments in the following order: a number N, the name of the file containing the text to be read, and the name of the file containing the words to be ignored. It will read the text from the first file, ignore the words listed in the second file, and store all the unique words it encounters in a dynamically doubling array. After the necessary calculations, the program must print the following information:
- The number of times array doubling was required to store all the unique words
- The number of unique non-ignore words in the file
- The total word count of the file (excluding the ignore words)
- The 10 most frequent words starting from index N, along with their probabilities (up to 5 decimal places). To produce these, calculate the probability of occurrence of each word and store the words in an array in decreasing order of probability.

For example, running your program with the command:

./Assignment2 25 mobydick.txt ignorewords.txt

would print the next 10 words starting from index 25. In other words, your program must print the 25th-34th most frequent words, along with their respective probabilities.
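Here is a minimal sketch of the counting and array-doubling steps, assuming the required wordRecord struct. The names doubleArray and addWord are hypothetical. The spec requires the doubling itself to be implemented in main(), so doubleArray is factored out here only to make the steps readable. You would inline its body into main().

```cpp
#include <string>
using std::string;  // the required struct uses unqualified `string`

struct wordRecord {
    string word;
    int count;
};

// Hypothetical helper: allocates an array twice as large, copies the first
// `size` records, frees the old array, and updates `capacity`.
wordRecord* doubleArray(wordRecord* oldArr, int size, int& capacity) {
    int newCapacity = capacity * 2;
    wordRecord* newArr = new wordRecord[newCapacity];
    for (int i = 0; i < size; i++) {
        newArr[i] = oldArr[i];
    }
    delete[] oldArr;
    capacity = newCapacity;
    return newArr;
}

// Hypothetical helper: increments the count if `word` is already stored;
// otherwise appends it with count 1, doubling the array first if it is full.
void addWord(wordRecord*& arr, int& size, int& capacity, int& timesDoubled, const string& word) {
    for (int i = 0; i < size; i++) {
        if (arr[i].word == word) {
            arr[i].count++;
            return;
        }
    }
    if (size == capacity) {
        arr = doubleArray(arr, size, capacity);
        timesDoubled++;
    }
    arr[size].word = word;
    arr[size].count = 1;
    size++;
}
```

In main(), you would start with `capacity = 100` and `new wordRecord[100]`. Then read the text file with `while (inFile >> word)` and call addWord only for words that are not in the ignore list. With a starting size of 100, the example's 13744 distinct words need a capacity of 25600 (100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600), which is 8 doublings and matches "Array doubled: 8". The linear search keeps the sketch simple without any standard containers, but it is slow for large vocabularies.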
Keep in mind that none of these words may be a word from ignoreWords.txt. The full results would be:

Array doubled: 8
Distinct non-common words: 13744
Total non-common words: 67327
Probability of next 10 words from rank 25
0.00302 - other
0.00300 - over
0.00297 - been
0.00296 - these
0.00290 - sea
0.00285 - said
0.00282 - down
0.00276 - yet
0.00275 - any
0.00270 - whales

Specifications:

1. Use an array of structs to store the words and their counts.
Store each unique word and its count (the number of times it occurs in the document) in an array of structs. The number of unique words is not known ahead of time, so the array of structs must be dynamically sized. The struct must be defined as follows:

    struct wordRecord {
        string word;
        int count;
    };

2. Use the array-doubling algorithm to increase the size of your array.
Your array will need to grow to fit the number of words in the file. Start with an array size of 100, and double the size whenever the array runs out of free space. You will need to allocate your array dynamically and copy the values from the old array to the new array. (The array-doubling algorithm must be implemented in the main() function.)
Note: Don't use the std::vector class. Doing so will result in a loss of points. You're actually writing the code that std::vector uses behind the scenes!

3. Ignore the 50 most common words, which are read from the ignoreWords.txt file.
To get useful information about word frequency, we will ignore the 50 most common words in the English language, as listed in ignoreWords.txt.

4. Take three command line arguments.
Your program must take three command line arguments:
   1. a number N, which tells your program the starting index from which to print the next 10 most frequent words
   2. the name of the text file to be read and analyzed
   3. the name of the text file with the words to be ignored

5. Output the next 10 most frequent words starting from index N.
Your program must print the next 10 most frequent words, not including the common words, starting at index N, where N is passed as a command line argument. If two words have the same frequency, list them alphabetically.

6. Format your final output this way.
Print the lines "Array doubled:", "Distinct non-common words:", "Total non-common words:" and "Probability of next 10 words from rank", each followed by its value. Then print one "probability - word" line for each of the next 10 words. For example, the command ./Assignment2 25 mobydick.txt ignorewords.txt produces exactly the output shown above.

7. You must include the following functions (they will be tested by the autograder):
   a. main function
      i. If the correct number of command line arguments is not passed, print the below statement and exit the program:
      std::cout "
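Since the question asks specifically for printTenFromN, here is a minimal sketch of it, together with the sort it depends on. The required signature is cut off in the write-up, so the parameter list below (including the optional std::ostream, which was added for testability) is hypothetical, as are comesBefore and sortByFrequency. The sketch uses std::sort with a comparator. The write-up doesn't say whether library sorts are allowed, so if they aren't, plug the same comesBefore rule into your own insertion sort. It also assumes main prints the "Probability of next 10 words from rank N" header and printTenFromN prints only the word lines.

```cpp
#include <algorithm>
#include <iomanip>
#include <iostream>
#include <string>
using std::string;

struct wordRecord {
    string word;
    int count;
};

// Rank order: higher count first; equal counts fall back to alphabetical order.
bool comesBefore(const wordRecord& a, const wordRecord& b) {
    if (a.count != b.count) {
        return a.count > b.count;
    }
    return a.word < b.word;
}

// Hypothetical helper: sorts the first `size` records into rank order.
void sortByFrequency(wordRecord arr[], int size) {
    std::sort(arr, arr + size, comesBefore);
}

// Hypothetical signature. Assumes arr is already sorted by sortByFrequency.
// Prints up to 10 lines "probability - word" for indices N..N+9, where
// probability = count / totalWords, shown with 5 decimal places.
// Prints nothing if N is out of range or totalWords is not positive.
void printTenFromN(const wordRecord arr[], int size, int N, int totalWords,
                   std::ostream& out = std::cout) {
    if (N < 0 || totalWords <= 0) {
        return;
    }
    int end = std::min(N + 10, size);  // don't run past the last stored word
    out << std::fixed << std::setprecision(5);
    for (int i = N; i < end; i++) {
        double probability = static_cast<double>(arr[i].count) / totalWords;
        out << probability << " - " << arr[i].word << '\n';
    }
}
```

Usage: call `sortByFrequency(words, size)`, then print the header line with N, then call `printTenFromN(words, size, N, total)`. Here `total` is the sum of all counts, which is the "Total non-common words" value. Computing the probability as a double avoids the integer division that would otherwise print 0.00000 for every word.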