Question

Posted on Sep 26, 2024

Write the code in C++ using the following skeleton:

#ifndef HW_7_HASH_TABLE
#define HW_7_HASH_TABLE

#include <string>

// struct to store word + count combinations
struct wordItem {
    std::string word;
    int count;
    wordItem* next;
};

/* class HashTable for storing words.
 * You will create two hash tables in your driver:
 * - one for storing stop words
 * - one for storing unique non-stop words
 * Why? We can load all the stopwords into the first table.
 * Then, we can quickly check that first hash table to see if
 * a word is a stopword before adding it to the second.
 */
class HashTable {
public:
    HashTable(int hashTableSize);
    ~HashTable();
    void addWord(std::string word);
    bool isInTable(std::string word);
    void incrementCount(std::string word);
    void printTopN(int n);
    int getNumCollisions();
    int getNumItems();
    int getTotalNumWords();

private:
    /* member functions */
    unsigned int getHash(std::string word);
    wordItem* searchTable(std::string word);

    /* instance variables */
    wordItem** hashTable;
    int hashTableSize;
    int numItems;
    int numCollisions;
};

/* size to make your stopwords hash table */
const int STOPWORD_TABLE_SIZE = 50;

/* Required functions for use in main.
 * You are free to also define your own helper functions
 * for your driver in your .cpp files if you wish.
 * These are the same functions from hw2, but use a
 * hashtable instead of a vector.
 */

/* load stopwords into the stopwords hash table */
void getStopWords(char *ignoreWordFileName, HashTable &stopWordsTable);

/* check table to see if a word is a stopword or not */
bool isStopWord(std::string word, HashTable &stopWordsTable);

#endif

Here are the instructions for the code

-----------------------------

You will create two instances of your HashTable for use in your driver function.

HashTable stopWordsTable: store/look up stop words

HashTable uniqueWordsTable: store/look up unique non-stop words.

When executing your driver program:

1. Read in the number of most common words to process from the first command-line argument.
2. Read in the name of the text file to process from the second command-line argument.
3. Create a hash table stopWordsTable of size STOPWORD_TABLE_SIZE to store the stop words. Populate it with the stopwords read from the file named by the third command-line argument, using the getStopWords function.
4. Create a second hash table uniqueWordsTable of the size specified by the fourth command-line argument. This hash table will store and count the words from the text file that are not stop words.
5. Store the unique words (excluding stop words) found in the file in the uniqueWordsTable hash table. For each word:
   - Check whether the word is a stopword first; if it is, ignore that word.
   - Check whether the word is already in the table (hint: use the isInTable method).
   - If it is, add one to its count (hint: use the incrementCount method).
   - If it is not present, add it to the table and count the number of collisions (hint: use the addWord method; collision counting should be done in this method as well).
6. Before your program ends, be sure to free all dynamically allocated memory via the class destructor.
7. Output the top n most frequent words, the number of collisions, the number of unique non-stop words, and the total number of non-stop words using the specified output (see bottom of document for code, and above for example).
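The driver steps can be sketched as follows. This is a minimal sketch under stated assumptions, not a reference solution: `DriverArgs`, `parseArgs`, and `countWords` are hypothetical helpers, and `Table` is a template parameter standing in for HashTable so the block compiles without the header. It also assumes words are whitespace-separated tokens used exactly as read; any lowercasing or punctuation stripping your assignment requires is not shown.

```cpp
#include <istream>
#include <sstream>  // std::istringstream: lets countWords be tried without a file
#include <string>

// Hypothetical holder for the driver's four command-line arguments.
struct DriverArgs {
    int topN;                   // argv[1]: how many most common words to print
    std::string textFile;       // argv[2]: text file to process
    std::string stopWordsFile;  // argv[3]: stopwords file
    int tableSize;              // argv[4]: size of uniqueWordsTable
};

// Returns false if the argument count is wrong; std::stoi throws
// std::invalid_argument if a numeric argument is not a number.
bool parseArgs(int argc, char* argv[], DriverArgs& out) {
    if (argc != 5) {
        return false;
    }
    out.topN = std::stoi(argv[1]);
    out.textFile = argv[2];
    out.stopWordsFile = argv[3];
    out.tableSize = std::stoi(argv[4]);
    return true;
}

// Core counting loop: skip stop words, then either bump an existing count
// or insert the word. `Table` must provide isInTable, incrementCount, addWord.
template <typename Table>
void countWords(std::istream& text, Table& stopWordsTable, Table& uniqueWordsTable) {
    std::string word;
    while (text >> word) {
        if (stopWordsTable.isInTable(word)) {  // same check as isStopWord(word, stopWordsTable)
            continue;
        }
        if (uniqueWordsTable.isInTable(word)) {
            uniqueWordsTable.incrementCount(word);
        } else {
            uniqueWordsTable.addWord(word);  // collisions are counted inside addWord
        }
    }
}
```

In `main`, you would call `parseArgs`, construct `HashTable stopWordsTable(STOPWORD_TABLE_SIZE)` and `HashTable uniqueWordsTable(args.tableSize)`, load the stop words, open the text file as a `std::ifstream`, pass it to `countWords`, and then print the required statistics. Both tables are freed automatically by their destructors when `main` returns.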

Dealing with Stop Words in your driver:

Write a function named getStopWords that takes the name of the stopwords file and a hash table to store the stopwords, fills the hash table with the stopwords, and returns void. Read in the file, which lists the top 50 most common words to ignore.

Stopwords should be saved into stopWordsTable (one of two hash tables you should create in your driver).

The file will have one word per line. We will test with files containing different words!

This is similar to the getStopWords function from Homework 2, but uses a hash table instead of a vector!

Write a function called isStopWord that checks the stopWordsTable hash table and returns true if the word is in the table or false if it isn't.
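The stop-word helpers can be sketched as below, assuming the file holds one word per line. `readWords`, `getStopWordsSketch`, and `isStopWordSketch` are hypothetical names, and `Table` is a template parameter standing in for HashTable so the block compiles on its own; with the real header you would use the non-template signatures declared there. The error message is a placeholder, since the assignment text does not specify one.

```cpp
#include <fstream>
#include <iostream>
#include <sstream>  // std::istringstream: lets readWords be tried without a file
#include <string>
#include <vector>

// Read whitespace-separated words. With one word per line this yields one
// entry per line; a trailing '\r' from Windows line endings is skipped too.
std::vector<std::string> readWords(std::istream& in) {
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}

// Sketch of getStopWords. Table only needs isInTable and addWord.
template <typename Table>
void getStopWordsSketch(const char* ignoreWordFileName, Table& stopWordsTable) {
    std::ifstream file(ignoreWordFileName);
    if (!file) {
        std::cerr << "Failed to open " << ignoreWordFileName << '\n';  // placeholder message
        return;
    }
    for (const std::string& word : readWords(file)) {
        if (!stopWordsTable.isInTable(word)) {  // guard against duplicate lines
            stopWordsTable.addWord(word);
        }
    }
}

// Sketch of isStopWord: a stop word is simply one already in the table.
template <typename Table>
bool isStopWordSketch(const std::string& word, Table& stopWordsTable) {
    return stopWordsTable.isInTable(word);
}
```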

Hash Table Specifications: You will implement a hash table data structure for storing wordItems, using the HashTable class definition we have provided. You will create two objects using this class: one for storing the stopwords and checking whether a word is a stopword, and one for storing and counting the unique words (excluding stop words).

getHash method:

Implement a hashing function in HashTable's member function getHash that minimizes collisions. There are many ways to do this, but for this assignment, we will use a hashing function known as DJB2. Pseudocode for this function is as follows:

unsigned int getHash(string word)
    unsigned int hash = 5381
    for each character c in word:
        hash = hash*33 + c
    hash = hash % hashTableSize
    return hash

A detailed explanation of why this hash function is suitable for hashing strings can be found on the web. It is easy to see, however, why an overly simple hash function, such as summing up the ASCII values of the characters in a string, may not minimize collisions: a simple sum of ASCII values results in a collision between all words made up of the same characters ("cat" and "act", for example). The DJB2 hash avoids this by multiplying the running hash by 33 before adding each character, so the position of each character affects the result. However, the repeated multiplication can overflow the integer. That is why we use an unsigned int: in C++, unsigned arithmetic wraps around modulo 2^N on overflow, whereas signed integer overflow is undefined behavior. Feel free to experiment with different hash functions and hash table sizes and see what happens to the number of collisions, but use the DJB2 hash for your submission.
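A direct C++ translation of the DJB2 pseudocode, written as a free function so it can be tried in isolation; inside the class it would be the body of `HashTable::getHash`, using the `hashTableSize` member. The names `djb2Hash` and `asciiSumHash` are hypothetical, and the second function exists only to show the anagram collision described above.

```cpp
#include <string>

// DJB2 string hash, following the pseudocode (this is getHash's logic).
unsigned int djb2Hash(const std::string& word, int hashTableSize) {
    unsigned int hash = 5381;
    for (char c : word) {
        // Unsigned arithmetic wraps modulo 2^N on overflow, so this is well defined.
        hash = hash * 33 + c;
    }
    // Bucket index in [0, hashTableSize).
    return hash % static_cast<unsigned int>(hashTableSize);
}

// For comparison only: the "sum of character codes" hash the text warns about.
unsigned int asciiSumHash(const std::string& word, int hashTableSize) {
    unsigned int sum = 0;
    for (char c : word) {
        sum += c;
    }
    return sum % static_cast<unsigned int>(hashTableSize);
}
```

With a table of size 50, "cat" and "act" land in buckets 25 and 13 under DJB2, but both land in bucket 12 under the plain sum.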

addWord method:

Use the hashing function (getHash) to determine where in the hash table the word should be stored.

If the list at that spot is empty: dynamically allocate a new wordItem struct, make this new struct the head of the linked list, and add 1 to the number of unique words.

If the list already has items in it: this is a collision! Dynamically allocate a new struct and add it at the head of the linked list. Increment the number of collisions.

This method of using linked lists to deal with collisions is called Separate Chaining with Linked Lists.
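Separate chaining with head insertion can be sketched as follows. This is a minimal sketch written as free functions over a bare bucket array rather than as HashTable members, so it compiles on its own; `makeBuckets`, `insertWord`, and `freeBuckets` are hypothetical names corresponding to the constructor, `addWord` (with `index` coming from `getHash`), and the destructor. One interpretation to flag: the steps above add 1 to the number of unique words only when the bucket is empty, but this sketch increments it for every new word so that the count equals the number of unique words. Check this against your assignment's expected output.

```cpp
#include <string>

struct wordItem {
    std::string word;
    int count;
    wordItem* next;
};

// Constructor step: allocate `size` bucket heads, all empty.
wordItem** makeBuckets(int size) {
    return new wordItem*[size]();  // the trailing () value-initializes every pointer to nullptr
}

// addWord step for a word known NOT to be in the table yet.
// `index` is assumed to be getHash(word). New words start with count 1,
// since addWord is called on a word's first occurrence.
void insertWord(wordItem** buckets, unsigned int index, const std::string& word,
                int& numItems, int& numCollisions) {
    if (buckets[index] != nullptr) {
        ++numCollisions;  // bucket already holds a chain: this is a collision
    }
    // The new node becomes the head; the old head (possibly nullptr) follows it.
    buckets[index] = new wordItem{word, 1, buckets[index]};
    ++numItems;  // assumption: count every new unique word (see note above)
}

// Destructor step: free every node in every chain, then the bucket array.
void freeBuckets(wordItem** buckets, int size) {
    for (int i = 0; i < size; ++i) {
        wordItem* node = buckets[i];
        while (node != nullptr) {
            wordItem* next = node->next;
            delete node;
            node = next;
        }
    }
    delete[] buckets;
}
```

The constructor would also set `numItems` and `numCollisions` to 0.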

isInTable method:

Implement a member function isInTable that takes a string as an argument and returns true if the string is already stored in the hash table, or returns false otherwise.

incrementCount method:

Implement a member function incrementCount that increments the count of a word already stored in the hash table by 1.

addWord method:

Write a member function named addWord that creates a new wordItem struct and adds it to the hash table at the appropriate location.

searchTable method:

Write a member function named searchTable that takes a string as an argument and returns a pointer to the wordItem struct that stores the string, or nullptr if the string is not currently stored in the hash table.
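Once `getHash` has picked the bucket, `searchTable`, `isInTable`, and `incrementCount` all reduce to a walk down one chain. The sketch below shows that walk as hypothetical free functions (`searchChain`, `isInChain`, `incrementInChain`) taking the bucket's head pointer; in the class, `head` would be `hashTable[getHash(word)]`, and `isInTable` and `incrementCount` would simply call `searchTable`.

```cpp
#include <string>

struct wordItem {
    std::string word;
    int count;
    wordItem* next;
};

// searchTable step: return the node holding `word`, or nullptr if absent.
wordItem* searchChain(wordItem* head, const std::string& word) {
    for (wordItem* node = head; node != nullptr; node = node->next) {
        if (node->word == word) {
            return node;
        }
    }
    return nullptr;
}

// isInTable step: true if the word is stored in this chain.
bool isInChain(wordItem* head, const std::string& word) {
    return searchChain(head, word) != nullptr;
}

// incrementCount step: add 1 to an existing word's count; no-op if absent.
void incrementInChain(wordItem* head, const std::string& word) {
    wordItem* item = searchChain(head, word);
    if (item != nullptr) {
        ++item->count;
    }
}
```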

getTotalNumWords method:

Write a member function getTotalNumWords that returns the sum of the counts of all words stored in the hash table.
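Summing the counts means visiting every node in every bucket. In this sketch `totalNumWords` is a hypothetical free-function name, and `buckets` and `size` stand in for the `hashTable` and `hashTableSize` members.

```cpp
#include <string>

struct wordItem {
    std::string word;
    int count;
    wordItem* next;
};

// getTotalNumWords step: sum `count` over every node in every chain.
int totalNumWords(wordItem** buckets, int size) {
    int total = 0;
    for (int i = 0; i < size; ++i) {
        for (wordItem* node = buckets[i]; node != nullptr; node = node->next) {
            total += node->count;
        }
    }
    return total;
}
```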

getter methods:

Implement getter methods for numCollisions and numItems.

printTopN method:

Write a member function named printTopN that takes the value of N as an argument and determines and prints the top N words in the hash table. Hint: Declare an array of pointers of size n (static declaration), and use the insertIntoSortedArray algorithm from Assignment 1 to fill this array with the words with the largest counts.

Format your output the following way: when you output the top n words in the file, the output needs to be in order, with the most frequent word printed first (see code below and reference the example above).
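The top-N selection can be sketched as follows, assuming `insertIntoSortedArray` keeps a descending array of at most `n` pointers. The Assignment 1 version is not shown here, so the signature below is hypothetical, and a `std::vector` replaces the fixed-size array from the hint. Words with equal counts stay in the order they were encountered. The required output format is in the code the text refers to, so printing is left out; `collectTopN` is also a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct wordItem {
    std::string word;
    int count;
    wordItem* next;
};

// Hypothetical stand-in for insertIntoSortedArray: keep `top` sorted by
// descending count and never longer than n.
void insertIntoSortedArray(std::vector<wordItem*>& top, std::size_t n, wordItem* item) {
    std::size_t pos = 0;
    // Skip entries whose count is >= item's, so ties keep first-seen order.
    while (pos < top.size() && top[pos]->count >= item->count) {
        ++pos;
    }
    if (pos >= n) {
        return;  // would fall outside the top n
    }
    top.insert(top.begin() + static_cast<std::ptrdiff_t>(pos), item);
    if (top.size() > n) {
        top.pop_back();  // drop the entry pushed past position n - 1
    }
}

// printTopN step without the printing: visit every node, return the top n.
std::vector<wordItem*> collectTopN(wordItem** buckets, int size, int n) {
    std::vector<wordItem*> top;
    if (n <= 0) {
        return top;
    }
    for (int i = 0; i < size; ++i) {
        for (wordItem* node = buckets[i]; node != nullptr; node = node->next) {
            insertIntoSortedArray(top, static_cast<std::size_t>(n), node);
        }
    }
    return top;  // top[0] is the most frequent word
}
```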