Question


Posted on Sep 21, 2024


Write in Java

Hashtables

Introduction: Suppose we are inserting n keys into a hash table of size m. Then the load factor α is defined to be n/m. For open addressing n ≤ m, which implies that 0 ≤ α ≤ 1. In this assignment we will study how the load factor affects the average number of probes required by open addressing while using linear probing and double hashing.

Design: Implement the hash table as an array of HashObject. A HashObject contains a generic object and a frequency count. It must override both the equals and the toString methods and should also have a getKey method. We will use linear probing and double hashing, so design the HashTable class to take an indicator parameter in its constructor that sets the type of probing to be performed. The HashTable defaults to linear probing.

Choose the table size m to be a prime in the range [95500 . . . 96000]. A good choice is a prime that is 2 away from another prime, that is, both m and m - 2 are primes. Two primes that differ by two are called twin primes. Find the table size using the smallest twin primes in the given range [95500 . . . 96000].

Vary the load factor α as 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.98, 0.99 by setting the value of n appropriately, that is, n = αm. Keep track of the average number of probes required for each value of α for linear probing and for double hashing. For double hashing, the primary hash function is h1(k) = k mod m and the secondary hash function is h2(k) = 1 + (k mod (m - 2)).

There are three sources of data for this experiment, as described in the next section. Note that the data can contain duplicates. If a duplicate is detected, update the frequency for the object rather than inserting it again. Keep inserting elements until you have reached the desired load factor. Count the number of probes only for new insertions, not when a duplicate is found.
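The table-size search and the two probe sequences described above can be sketched as follows. This is a minimal illustration, not the assignment's reference solution: the class name ProbeSketch and all method names are hypothetical, and the sketch assumes the i-th probe for double hashing is (h1(k) + i * h2(k)) mod m, which is the usual textbook form.

```java
// Hypothetical helper class; none of these names come from the assignment.
final class ProbeSketch {

    // Trial-division primality test; adequate for values below 100,000.
    static boolean isPrime(int x) {
        if (x < 2) return false;
        for (int d = 2; (long) d * d <= x; d++) {
            if (x % d == 0) return false;
        }
        return true;
    }

    // Smallest m in [lo, hi] such that both m and m - 2 are prime, or -1 if none.
    static int smallestTwinPrime(int lo, int hi) {
        for (int m = lo; m <= hi; m++) {
            if (isPrime(m) && isPrime(m - 2)) return m;
        }
        return -1;
    }

    // Index of the i-th probe (i = 0, 1, 2, ...) under linear probing.
    // Math.floorMod keeps the result non-negative even when k is negative.
    static int linearProbe(int k, int i, int m) {
        return (int) ((Math.floorMod(k, m) + (long) i) % m);
    }

    // Index of the i-th probe under double hashing with
    // h1(k) = k mod m and h2(k) = 1 + (k mod (m - 2)).
    static int doubleHashProbe(int k, int i, int m) {
        long h1 = Math.floorMod(k, m);
        long h2 = 1 + Math.floorMod(k, m - 2);
        return (int) ((h1 + i * h2) % m);
    }

    public static void main(String[] args) {
        int m = smallestTwinPrime(95500, 96000);
        System.out.println("table size m = " + m);
        // First three probe positions for an arbitrary key under each scheme.
        for (int i = 0; i < 3; i++) {
            System.out.println("i=" + i
                    + " linear=" + linearProbe(123456789, i, m)
                    + " double=" + doubleHashProbe(123456789, i, m));
        }
    }
}
```

Because m is prime and h2(k) lies between 1 and m - 2, the step size never shares a factor with m, so the double hashing sequence visits every slot before repeating.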

Experiment: For the experiment we will consider three different sources of data as follows. You will need to insert HashObjects until the pre-specified load factor α = n/m is reached.

Data Source 1: each HashObject contains an Integer object with a random int value generated by the method nextInt() in the java.util.Random class. The key for each such HashObject is the Integer object inside.

Data Source 2: each HashObject contains a Long object with a long value generated by the method System.currentTimeMillis(). The key for each such HashObject is the Long object inside.

Data Source 3: each HashObject contains a word from the file word-list. The file contains 3,037,798 words (one per line), of which 101,233 are unique. The key for each such HashObject is the word inside.

When you hash a HashObject into a table index, compute the hashCode() of the key of the HashObject, then use that hashCode() to perform the linear probing or double hashing calculation. Note that hashCode() can return negative integers, so you need to ensure that the mod operation in the probing calculation never returns a negative value. Note also that two different key objects may have the same hashCode() value. Thus, you must compare the actual key objects to check whether the HashObject to be inserted is a duplicate.
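The insertion rules above (non-negative mod, key comparison for duplicates, probes counted only for new insertions) can be sketched as follows. This is one possible shape under stated assumptions, not the required design: only HashObject, getKey, equals, and toString are named in the assignment, so OpenAddressingSketch and the remaining method names are hypothetical. The sketch also assumes the frequency count starts at 0 and counts duplicates, which matches sample dump lines such as "enfetterd 0".

```java
// HashObject follows the assignment's description; other names are hypothetical.
final class HashObject<T> {
    private final T key;
    private int frequency; // assumed to count duplicates seen after the first insert

    HashObject(T key) { this.key = key; }
    T getKey() { return key; }
    void incrementFrequency() { frequency++; }

    @Override public boolean equals(Object other) {
        return other instanceof HashObject && key.equals(((HashObject<?>) other).key);
    }
    @Override public int hashCode() { return key.hashCode(); }
    @Override public String toString() { return key + " " + frequency; }
}

final class OpenAddressingSketch<T> {
    private final HashObject<T>[] table;
    private final boolean doubleHashing;
    private int size;         // number of distinct keys stored
    private long totalProbes; // probes spent on new insertions only

    @SuppressWarnings("unchecked")
    OpenAddressingSketch(int m, boolean doubleHashing) {
        this.table = (HashObject<T>[]) new HashObject[m];
        this.doubleHashing = doubleHashing;
    }

    // Returns true for a new key, false when the key was already present.
    boolean insert(T key) {
        int m = table.length;
        int code = key.hashCode();        // may be negative
        long h1 = Math.floorMod(code, m); // always in [0, m)
        long step = doubleHashing ? 1 + Math.floorMod(code, m - 2) : 1;
        for (int i = 0; i < m; i++) {
            int index = (int) ((h1 + i * step) % m);
            HashObject<T> slot = table[index];
            if (slot == null) {
                table[index] = new HashObject<>(key);
                size++;
                totalProbes += i + 1;     // count probes for new insertions only
                return true;
            }
            if (slot.getKey().equals(key)) { // compare keys, not hash codes
                slot.incrementFrequency();
                return false;
            }
        }
        throw new IllegalStateException("table is full");
    }

    int size() { return size; }
    HashObject<T> get(int index) { return table[index]; }
    double averageProbes() { return size == 0 ? 0.0 : (double) totalProbes / size; }
}
```

A driver would call insert repeatedly until size() reaches the target n for the chosen load factor, then report averageProbes().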

Required file/class names and output: The source code for the project. The driver program should be named HashTest, and it should take three command-line arguments (the third one is optional) as follows:

    java HashTest <input type> <load factor> [<debug level>]

The <input type> should be 1, 2, or 3 depending on whether the data is generated using java.util.Random, System.currentTimeMillis(), or from the file word-list. The program should print out the input source type, the total number of keys inserted into the hash table, and the average number of probes required for linear probing and double hashing. The optional argument specifies a debug level with the following meaning:

    debug = 0  print summary of experiment on the console
    debug = 1  print summary of experiment on the console and also print the hash tables with number of duplicates and number of probes into two files, linear-dump and double-dump

For debug level 0, the output is a summary. An example is shown below.

    [jhyeh@onyx sol]$ java HashTest 3 0.5
    A good table size is found: 95791
    Data source type: word-list

    Using Linear Hashing....
    Input 1305930 elements, of which 1258034 duplicates
    load factor = 0.5, Avg. no. of probes 1.5969183230332387

    Using Double Hashing....
    Input 1305930 elements, of which 1258034 duplicates
    load factor = 0.5, Avg. no. of probes 1.3926841489894772

For debug level 1, the table should be output in the following format.

    table[0]: weeping 3
    table[1]: enfetterd 0
    table[2]: atherton 1
    table[4]: whateer 25
    table[8]: cried 89
    table[9]: angel 1
    table[11]: mansfield 4
    table[12]: logothete 0
    table[16]: episode 4
    table[17]: lind 4
    table[19]: scratching 3
    table[21]: cups 8

Note that empty entries of the table are omitted in the output.
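Command-line handling for the driver could look roughly like the sketch below. The argument order (input type, load factor, optional debug level) is an assumption inferred from the example invocation "java HashTest 3 0.5" and the description of the debug levels. The class name HashTestArgs, its field names, and the accepted load factor range are likewise hypothetical.

```java
// Hypothetical argument holder for the HashTest driver; names are illustrative.
final class HashTestArgs {
    final int inputType;     // 1 = java.util.Random, 2 = currentTimeMillis, 3 = word-list
    final double loadFactor; // target load factor
    final int debugLevel;    // 0 = summary only, 1 = summary plus dump files

    private HashTestArgs(int inputType, double loadFactor, int debugLevel) {
        this.inputType = inputType;
        this.loadFactor = loadFactor;
        this.debugLevel = debugLevel;
    }

    // Parses two or three arguments; the debug level defaults to 0 when omitted.
    // Throws IllegalArgumentException (NumberFormatException is a subclass) on bad input.
    static HashTestArgs parse(String[] args) {
        if (args.length < 2 || args.length > 3) {
            throw new IllegalArgumentException(
                    "expected: input type, load factor, optional debug level");
        }
        int inputType = Integer.parseInt(args[0]);
        double loadFactor = Double.parseDouble(args[1]);
        int debugLevel = args.length == 3 ? Integer.parseInt(args[2]) : 0;

        if (inputType < 1 || inputType > 3) {
            throw new IllegalArgumentException("input type must be 1, 2, or 3");
        }
        // Assumed range: open addressing needs a load factor above 0 and at most 1.
        if (!(loadFactor > 0.0 && loadFactor <= 1.0)) {
            throw new IllegalArgumentException("load factor must be in (0, 1]");
        }
        if (debugLevel != 0 && debugLevel != 1) {
            throw new IllegalArgumentException("debug level must be 0 or 1");
        }
        return new HashTestArgs(inputType, loadFactor, debugLevel);
    }
}
```

Keeping the parsing separate from the experiment makes it easier to check the three input modes without running a full insertion pass.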

Submission

The full program with all the necessary files. A readme file that contains tables showing the average number of probes versus load factors. There should be three tables for the three different sources of data. Each table should have eight rows (for the different values of α) and two columns (for linear probing and double hashing). A sample result containing three tables can be seen in the sample result.txt file. Before submission, make sure that your program compiles and runs.

Sample Result:

Input source 1: random number

alpha    linear    double
-----------------------------
0.5        1.50      1.39
0.6        1.77      1.53
0.7        2.20      1.72
0.8        2.97      2.01
0.9        5.40      2.55
0.95       9.55      3.15
0.98      24.51      4.00
0.99      33.35      4.67

Input source 2: current time

alpha    linear    double
-----------------------------
0.5        1.0       1.0
0.6        1.0       1.0
0.7        1.0       1.0
0.8        1.0       1.0
0.9        1.0       1.0
0.95       1.0       1.0
0.98       1.0       1.0
0.99       1.0       1.0

Input source 3: word-list

alpha    linear    double
-----------------------------
0.5        1.59      1.40
0.6        2.08      1.54
0.7        3.76      1.72
0.8        7.43      2.02
0.9       22.37      2.57
0.95     155.21      3.15
0.98     375.59      3.96
0.99     524.39      4.61