
Question


There is one test file on the website, HW2-HungerGames_edit.txt, that contains the full text of Hunger Games Book 1. We have pre-processed the file to remove all punctuation and down-case all words. We will test on a different file! There is also the ignore-words file, HW2-ignoreWords.txt, that contains the top 50 common words usually ignored during natural-language processing. Your program will calculate the following information on any text file:

- The top n words (excluding stop words; n is also a command-line argument) and the number of times each word was found
- The total number of unique words (excluding stop words) in the file
- The total number of words (excluding stop words) in the file
- The number of array doublings needed to store all unique words in the file

Example: Your program takes three command-line arguments: the number of most common words to print out, the name of the file to process, and the stop word list file. Running your program using:

    ./a.out 10 HW2-HungerGames_edit.txt HW2-ignoreWords.txt

would return the 10 most common words in the file HW2-HungerGames_edit.txt and should produce the following results:

    682 - is
    492 - peeta
    479 - its
    431 - im
    427 - can
    414 - says
    379 - him
    368 - when
    367 - no
    356 - are
    # Array doubled: 7
    # Unique non-stop words: 7682
    # Total non-stop words: 59157

Program Specifications

The following are requirements for your program:

- Read in the name of the file to process from the second command-line argument.
- Read in the number of most common words to process from the first command-line argument.
- Write a function named getStopWords that takes the name of the ignore-words file and a reference to a vector as parameters (returns void). Read in the file for a list of the top 50 most common words to ignore (e.g., Table 1). These are commonly referred to as stopwords in NLP (Natural Language Processing). (Create this file yourself.)
  o The file will have one word per line and will always contain exactly 50 words. We will test with files containing different words!
  o Your function will update the vector passed to it with the list of words from the file.
- Store the unique words found in the file that are not in the stopword list in a dynamically allocated array.
  o Call a function to check whether the word is a stopword first; if it is, ignore that word.
  o Use an array of structs to store each unique word (variable name word) and a count (variable name count) of how many times it appears in the text file.
  o Use the array-doubling algorithm to increase the size of your array.
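The struct and stopword requirements above could be sketched roughly as follows. This is only an illustration under assumptions: the struct name wordRecord and the helpers isStopWord, findWord, and sortByCountDescending are hypothetical names (the assignment fixes only the member names word and count), and the tie-breaking order among equal counts is not specified in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record type: the assignment fixes only the member names.
struct wordRecord {
    std::string word;  // the unique word
    int count;         // how many times it appeared in the text
};

// Returns true if `word` is in the stopword list (linear scan over ~50 entries).
bool isStopWord(const std::string& word,
                const std::vector<std::string>& ignoreWords) {
    return std::find(ignoreWords.begin(), ignoreWords.end(), word)
           != ignoreWords.end();
}

// Returns the index of `word` among the first `numUnique` records, or -1.
int findWord(const wordRecord* words, int numUnique, const std::string& word) {
    assert(numUnique >= 0);
    for (int i = 0; i < numUnique; ++i) {
        if (words[i].word == word) {
            return i;
        }
    }
    return -1;
}

// Orders records by descending count so the top n are the first n entries.
// stable_sort keeps equal counts in their original order (an assumption;
// the question does not say how ties should be ordered).
void sortByCountDescending(wordRecord* words, int numUnique) {
    std::stable_sort(words, words + numUnique,
                     [](const wordRecord& a, const wordRecord& b) {
                         return a.count > b.count;
                     });
}
```

After sorting, printing the first n records as count, " - ", word would match the layout of the sample output.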

We don't know ahead of time how many unique words the input file will have, so you don't know how big the array should be. Start with an array size of 100 (use the constant declared in the starter code), and double the size as words are read in from the file and the array fills up with new words. Use dynamic memory allocation to create your array, copy the values from the current array into the new array, and then free the memory used for the current array. (The index of any given word in the array after resizing must match its index in the array before resizing.)
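A minimal sketch of the array-doubling step, assuming the array is doubled only when it is full and a new unique word arrives. The names wordRecord, doubleArray, and addOrIncrement are hypothetical, not from the starter code. Under that assumption, a start size of 100 and 7682 unique words give 7 doublings (100 to 12800; 6 doublings would reach only 6400), which agrees with the sample output.

```cpp
#include <cassert>
#include <string>

// Hypothetical record type with the member names the assignment requires.
struct wordRecord {
    std::string word;
    int count;
};

// Allocates an array twice as large, copies every element to the same index,
// frees the old array, and records that one more doubling happened.
void doubleArray(wordRecord*& words, int& capacity, int& timesDoubled) {
    assert(capacity > 0);
    int newCapacity = capacity * 2;
    wordRecord* bigger = new wordRecord[newCapacity]();  // value-initialized
    for (int i = 0; i < capacity; ++i) {
        bigger[i] = words[i];  // index i before == index i after
    }
    delete[] words;
    words = bigger;
    capacity = newCapacity;
    ++timesDoubled;
}

// Increments the count of an existing word, or appends a new record,
// doubling the array first if it is already full.
void addOrIncrement(wordRecord*& words, int& numUnique, int& capacity,
                    int& timesDoubled, const std::string& word) {
    for (int i = 0; i < numUnique; ++i) {
        if (words[i].word == word) {
            ++words[i].count;
            return;
        }
    }
    if (numUnique == capacity) {
        doubleArray(words, capacity, timesDoubled);
    }
    words[numUnique].word = word;
    words[numUnique].count = 1;
    ++numUnique;
}
```

The caller would allocate the initial array with new wordRecord[100] (using the starter-code constant for the size) and release it with delete[] once the results have been printed.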

void getStopWords(char *ignoreWordFileName, vector<string>& _vecIgnoreWords)
{
    // Your code here
}
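One possible way to fill in this stub is sketched below. It assumes the vector holds std::string values and that the file has one word per line, as the requirements state. The const on the file-name parameter and the error message for an unopenable file are additions made for this sketch, not part of the given signature.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Reads whitespace-separated words from the ignore-words file and appends
// them to the vector passed in. `const` is added here so string literals can
// be passed; argv[3] from main converts to it without a cast.
void getStopWords(const char* ignoreWordFileName,
                  std::vector<std::string>& _vecIgnoreWords) {
    assert(ignoreWordFileName != nullptr);
    std::ifstream in(ignoreWordFileName);
    if (!in.is_open()) {
        // Assumed behavior: report the problem and leave the vector unchanged.
        std::cerr << "Failed to open " << ignoreWordFileName << std::endl;
        return;
    }
    std::string word;
    while (in >> word) {  // operator>> skips newlines and other whitespace
        _vecIgnoreWords.push_back(word);
    }
}
```

Reading with operator>> rather than getline means a trailing blank line or stray carriage-return-free whitespace does not produce an empty entry in the list.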
