Question:

Implement an efficient algorithm to calculate the word frequencies of a text file using a hash table. The hash table stores each word together with its frequency counter. The procedure is as follows:

When the application starts, the program should dynamically allocate a hash table of fixed size. Choose the size yourself; it should be a prime number and relatively large, because the table has to store the words of large files. The program then reads a text file named 'data.txt' containing the text whose word frequencies we want to calculate. Choose your own way of reading the file and splitting it into words. Note that a word read from the text may need some cleanup to remove a trailing special character (".", "!", "," etc.).

Each time a word is read from the file, it should be added to the hash table using the word k as the key. To convert the key from a string to an integer, you can sum the ASCII values of its characters; alternatively, you can use an algorithm of your choice. For the hash function use h(k) = k mod m, where m is the size of the hash table.

Collisions: You need to implement 2 different versions of the program. In the first version collisions are resolved by double hashing, and in the second version they are resolved with singly linked lists.

1. Collision resolution by double hashing: This technique uses a second hash function to find the next available position. That is, assume the function h1(k) = k mod m is used to find the original storage position. In the event of a collision, a second function gives the distance from the original position to the next position to try. In the event of a further collision, the position at twice that distance is tried, and so on. A common example, which you should use for rehashing, is h2(k) = (k / m) mod m. That is, in this second function the key is first divided by the size of the table, then the remainder modulo m is taken, and the result is added to h1(k) to obtain the next position to search. If the computed value is 0, it is set to 1.
2. Collision resolution by linked lists: Every new key added to a list because of a collision with earlier keys should be inserted at the appropriate position, so that the list stays sorted and searching it is faster.

If the word is already stored in the hash table, the program should simply increase that word's frequency counter by one.

Expand your code to:
Count, in both versions, how many collisions occur while inserting the data into the hash table.
Calculate the total time required to store the words of a file in the hash table. Time should be measured for both implementations. Try running your code with text files of different sizes (you can download larger files online). Show in your code comments the times you measured and the corresponding file sizes. You can measure the execution time of a code segment as follows:

All the basic operations of the application should be implemented as functions. The functions that must be implemented are:
GetKey(): Returns an integer corresponding to the string it accepts as a parameter.
Hash1(): Implementation of the basic hash function.
Hash2(): Implementation of the 2nd hash function. Used in the double hashing technique.
Insert(): Inserts a string (each word of the file's text) into the hash structure. It first computes an integer key by calling GetKey() and then computes the position for that key with the hash function. Resolves collisions, if any.
Print(): Displays the stored data (word and frequency) for each position of the hash table. It is advisable to call this function to display the table as it stands after reading the file.
PrintUnique(): Displays all words in the text that are unique. The data displayed should be read from the populated hash table.

The program has to be written in the C programming language. Thanks in advance for your time.
