Question

1 Approved Answer

Posted on Sep 25, 2024

Example: Your program takes four command-line arguments: the number of most common words to print out, the name of the text file to process, the

Example: Your program takes four command-line arguments: the number of most common words to print out, the name of the text file to process, the stop word list file, and the size of your hash table. Running your program using: ./a.out 10 HW7-HungerGames_edit.txt HW7-stopWords.txt 53 would return the 10 most common words in the file HW7-HungerGames_edit.txt, use a hash table of size 53, and should produce the following results:

682 - is 492 - peeta 479 - its 431 - im 427 - can

414 - says 379 - him 368 - when 367 - no 356 - are # Number of collisions: 7628 # Unique non-stop words: 7681 # Total non-stop words: 59045 Program Specifications The following are requirements for your program: Your Hash Table must be implemented in a class based on the provided HashTable.hpp. You must implement a driver program that utilizes your HashTable class. This driver program will contain your main function. When executing your driver program: o read in the number of most common words to process from the first command-line argument.

o Read in the name of the text file to process from the second command- line argument.

o Read stopwords from the file specified by the third command-line argument. o Create a hash table of size specified by the fourth command-line argument.

Write a member function named getStopwords that takes the name of the stopwords file, fills the data member vector vecIgnoreWords with the stopwords, and returns void. Read in the file for a list of the top 50 most common words to ignore (e.g., Table 1). o The file will have one word per line, and always have exactly 50 words in the file. We will test with files having different words in it! o Your function will update the vector with a list of the words from the file. o You can use your getStopWords function from Homework 2 as a starting point!

Implement a hashing function in member function getHash that will minimize collisions. There are many ways to do this, but for this assignment, we will use a hashing function known as DJB2. Pseudocode for this function is as follows: int hash(string word) int hash = 5381; for each character c in word: hash = hash*33 + c hash = hash % hashTableSize if(hash < 0) hash+=hashTableSize return hash;

A detailed explanation of why this hash function is suitable for hashing strings can be found on the web. It is easy to see, however, why an overly simple hash function, such as summing up the ASCII values of the characters in a string, may not minimize collisions. A simple sum of ASCII values would result in a collision between all words made up of the same characters (cat and act, for example). The DJB2 hash avoids this by applying the multiplication step as it iterates through the characters in the string. However, due to repeated multiplication, it is possible that the hash value will undergo integer overflow. That is the reason for the check to see if the hash value is less than 0. Feel free to experiment with different hash functions and hash table sizes and see what happens to the number of collisions, but use the DJB2 hash for your submission. Store the unique words found in the file that are not in the stopword list in a hash table. o Check if the word is a stopword first, and if it is, then ignore that word. o Use the hashing function to determine where in the hash table the word should be stored. o Search the linked list at that location in the hash table array for the word. If the list does not exist yet, dynamically allocate a new struct, make this new struct the head of the linked list, and add 1 to the number of unique words. If the word is present in the list, add one to the count. If the word is not present, dynamically allocate a new struct and add it to the list Add one to the number of unique words If there was already something in this spot in the hash table, add 1 to the number of collisions.

o This method of using linked lists to deal with collisions is called Separate Chaining with Linked Lists. o Hint: in your main function, after you have read in a word from the text file, this can be easily accomplished by using the isStopWord,

isInTable, incrementCount, and addWord member functions described below.

Before your program ends, be sure to free all dynamically allocated memory via the class destructor. Implement a member function isStopWord that takes a string as an argument and returns true if the string is a stopword, or returns false otherwise. Implement a member function isInTable that takes a string as an argument and returns true if the string is already stored in the hash table, or returns false otherwise. Implement a member function incrementCount that increments the count of a word already stored in the hash table by 1. Write a member function named addWord that creates a new wordItem struct and adds it to the hash table at the appropriate location. Write a member function named searchTable that takes a string as an argument and returns a pointer to the wordItem struct that stores the string, or NULL if the string is not currently stored in the hash table. Implement getter methods for numCollisions and numUniqueWords. Output the top n most frequent words, number of collisions, number of unique non-stop words, and total non-stop words o Write a member function named printTopN that takes the value of N as an argument and determines the top N words in the array. Hint: Declare an array of pointers of size n (static declaration), and use the insertIntoSortedArray algorithm from Assignment 1 to fill this array with words with the largest counts. o Then print out the top N words using the specified output. o Print other output as specified elsewhere in this write-up (number of collisions, number of unique non-stop words, and total number of non-stop words.)

Format your output the following way (and reference the example above). o When you output the top n words in the file, the output needs to be in order, with the most frequent word printed first. The format for the output needs to be:

Count - Word # Number of collisions: # Unique non-stop words: # Total non-stop words:

Generate the output with these commands: in printTopN: cout << wordItem.count << << wordItem.word << endl; in main: cout << # << endl; cout << Number of collisions: << myHashTable.getNumCollisions() << endl; cout << #<

Use the following code for the HashTable class header file(type it in so you get more practice setting up classes):

HashTable.hpp #ifndef HW_7_HASH_TABLE #define HW_7_HASH_TABLE #include #include

// struct to store word + count combinations

struct wordItem { std::string word; int count; wordItem* next; }; const int STOPWORD_LIST_SIZE = 50; class HashTable{ public: HashTable(int hashTableSize); ~HashTable(); void getStopWords(char *ignoreWordFileName); bool isStopWord(std::string word); bool isInTable(std::string word); void incrementCount(std::string word); void addWord(std::string word); int getTotalNumberNonStopWords(); void printTopN(int n); int getNumUniqueWords(); int getNumCollisions(); private: int getHash(std::string word); wordItem* searchTable(std::string word); int numUniqueWords; int numCollisions; int hashTableSize; wordItem** hashTable; std::vector vecIgnoreWords = std::vector(STOPWORD_LIST_SIZE);

}; #endif

Copy your HashTable::addWord() method into the box below, with the provided prototype. You may use any helper functions as needed, but they must be copied into the answer box. You may assume that ALL other class methods are defined and available for you to call. Your answer should look like:

void HashTable::addWord(std::string word) { // Your code here }

Copy your HashTable::getTotalNumberNonStopWords method into the box below, with the provided prototype. You may use any helper functions as needed, but they must be copied into the answer box. You may assume that ALL other class methods are defined and available for you to call. Your answer should look like:

int HashTable::getTotalNumberNonStopWords() { // Your code here }

Copy your HashTable::printTopN method into the box below, with the provided prototype. You may use any helper functions as needed, but they must be copied into the answer box. You may assume that ALL other class methods are defined and available for you to call. Your answer should look like:

void HashTable::printTopN(int n) { // Your code here }

Enter the full contents of your Homework 7 file that contains the main function. The implementations of all class methods have been provided, we are only testing your main.

You may use the following include statements:

#include  #include  #include  #include "HashTable.hpp"

Enter the full contents of your Homework 7 file that contains the implementations for all HashTree methods. This file should NOT contain the main function.

Note that you must implement a correct destructor. We will run your new code against various tests, but we will check to make sure that all dynamically allocated memory is freed after the destructor is called. If you correctly implemented these methods, you will see a message that says