Question

Introduction:

Suppose we are inserting n keys into a hash table of size m. Then the load factor α is defined to be n/m. For open addressing n ≤ m, which implies that 0 ≤ α ≤ 1. In this assignment we will study how the load factor affects the average number of probes required by open addressing while using linear probing and double hashing.

Design:

Implement a hash table as an array of HashObject. A HashObject contains a generic object and a frequency count. The HashObject needs to override both the equals and the toString methods and should also have a getKey method. We will use linear probing and double hashing, so design the HashTable class to take an indicator parameter in the constructor that sets the type of probing to be performed. The HashTable defaults to linear probing.

Choose the table size m to be a prime in the range [95500 . . . 96000]. A good choice is a prime that is 2 away from another prime, that is, both m and m - 2 are primes. Two primes that differ by two are called twin primes. Find the table size using the smallest twin primes in the given range [95500 . . . 96000].

Vary the load factor α as 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.98, 0.99 by setting the value of n appropriately, that is, n = αm. Keep track of the average number of probes required for each value of α for linear probing and for double hashing. For double hashing, the primary hash function is h1(k) = k mod m and the secondary hash function is h2(k) = 1 + (k mod (m - 2)).

There are three sources of data for this experiment, as described in the next section. Note that the data can contain duplicates. If a duplicate is detected, update the frequency for the object rather than inserting it again. Keep inserting elements until you have reached the desired load factor. Count the number of probes only for new insertions and not when you find a duplicate.
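The table-size search and the key count for a given load factor can be sketched as follows. This is only a minimal sketch under stated assumptions: the class name TableSetup and its method names are hypothetical, "smallest twin primes in the range" is read as the first pair whose two members both lie in the range (returning the larger member as m), and the key count rounds the product of load factor and table size up to the next integer. The assignment does not spell out either choice.

```java
final class TableSetup {
    /** Trial-division primality test; fast enough for values near 96000. */
    static boolean isPrime(int n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int d = 3; (long) d * d <= n; d += 2) {
            if (n % d == 0) return false;
        }
        return true;
    }

    /**
     * Returns the smallest m in [lo + 2, hi] such that m and m - 2 are both
     * prime, or -1 if the range holds no such pair.
     * Assumption: both members of the pair must lie inside [lo, hi].
     */
    static int findTableSize(int lo, int hi) {
        for (int m = lo + 2; m <= hi; m++) {
            if (isPrime(m) && isPrime(m - 2)) return m;
        }
        return -1;
    }

    /**
     * Number of distinct keys to insert so that n / m reaches the load factor.
     * Assumption: round up, so the reached load factor is never below alpha.
     */
    static int targetCount(double alpha, int m) {
        if (alpha < 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("load factor must be in [0, 1]");
        }
        return (int) Math.ceil(alpha * m);
    }

    public static void main(String[] args) {
        int m = findTableSize(95500, 96000);
        System.out.println("A good table size is found: " + m);
        double[] alphas = {0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.98, 0.99};
        for (double alpha : alphas) {
            System.out.println("alpha = " + alpha + " -> n = " + targetCount(alpha, m));
        }
    }
}
```

With the table size 95791 and load factor 0.5 from the sample run, rounding up gives n = 47896, which agrees with the 1305930 - 1258034 = 47896 distinct words reported in the sample output.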

Experiment:

For the experiment we will consider three different sources of data as follows. You will need to insert HashObjects until the pre-specified load factor α is reached, where α = n/m.

Data Source 1: each HashObject contains an Integer object with a random int value generated by the method nextInt() in java.util.Random class. The key for each such HashObject is the Integer object inside.

Data Source 2: each HashObject contains a Long object with a long value generated by the method System.currentTimeMillis(). The key for each such HashObject is the Long object inside.

Data Source 3: each HashObject contains a word from the file word-list. The file contains 3,037,798 words (one per line) out of which 101,233 are unique. The key for each such HashObject is the word inside.
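One possible way to put the three data sources behind a common interface is sketched below. The class KeySources and its method names are hypothetical and not part of the assignment. The word source takes a BufferedReader rather than a file name so that it can be tried on any text, and it signals the end of input by returning null.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.function.Supplier;

final class KeySources {
    /** Data Source 1: Integer keys from java.util.Random.nextInt(). */
    static Supplier<Object> randomInts(Random rng) {
        return () -> Integer.valueOf(rng.nextInt());
    }

    /** Data Source 2: Long keys from System.currentTimeMillis(). */
    static Supplier<Object> currentTimes() {
        return () -> Long.valueOf(System.currentTimeMillis());
    }

    /**
     * Data Source 3: one word per line from the given reader.
     * Returns null once the input is exhausted (an assumption of this sketch).
     */
    static Supplier<Object> words(BufferedReader reader) {
        return () -> {
            try {
                return reader.readLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }
}
```

Consecutive calls to System.currentTimeMillis() often return the same value, so the second source tends to produce many duplicates before the target load factor is reached.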

When you hash a HashObject into a table index, you will need to:

Compute the hashCode() of the key of the HashObject.

Use the hashCode() to perform the linear probing or double hashing calculation. Note that hashCode() can return negative integers, and Java's % operator can yield a negative result for a negative left operand. You need to ensure that the mod operation in the probing calculation always returns a non-negative integer.

Note that two different objects (key objects) may have the same hashCode() value. Thus, you must compare the actual key objects to check if the HashObject to be inserted is a duplicate.
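The steps above can be sketched as a small open-addressing table. This is an illustrative sketch, not a reference solution: HashObject, HashTable, getKey, equals and toString come from the assignment text, while the Probing enum, insert, averageProbes and positiveMod are hypothetical names. It also assumes that the frequency count starts at 0 and counts duplicates seen after the first insertion, and that the table size is at least 3.

```java
final class HashObject<T> {
    private final T key;
    private int frequency; // assumption: duplicates seen after the first insertion

    HashObject(T key) { this.key = key; }
    T getKey() { return key; }
    int getFrequency() { return frequency; }
    void incrementFrequency() { frequency++; }

    @Override public boolean equals(Object o) {
        return o instanceof HashObject && key.equals(((HashObject<?>) o).key);
    }
    @Override public int hashCode() { return key.hashCode(); }
    @Override public String toString() { return key + " " + frequency; }
}

final class HashTable<T> {
    enum Probing { LINEAR, DOUBLE }

    private final HashObject<T>[] table;
    private final Probing probing;
    private int size;         // distinct keys stored
    private long totalProbes; // probes spent on new insertions only

    HashTable(int m) { this(m, Probing.LINEAR); } // linear probing by default

    @SuppressWarnings("unchecked")
    HashTable(int m, Probing probing) {
        if (m < 3) throw new IllegalArgumentException("table size must be at least 3");
        this.table = (HashObject<T>[]) new HashObject[m];
        this.probing = probing;
    }

    /** Remainder in [0, b) even for negative a; Math.floorMod behaves the same for b > 0. */
    static int positiveMod(int a, int b) {
        int r = a % b;
        return r < 0 ? r + b : r;
    }

    /** Returns true if the key was newly stored, false if it was a duplicate. */
    boolean insert(T key) {
        int m = table.length;
        int code = key.hashCode();
        int index = positiveMod(code, m);                 // h1(k) = k mod m
        int step = probing == Probing.LINEAR
                ? 1
                : 1 + positiveMod(code, m - 2);           // h2(k) = 1 + (k mod (m - 2))
        for (int i = 0; i < m; i++) {
            HashObject<T> slot = table[index];
            if (slot == null) {
                table[index] = new HashObject<>(key);
                size++;
                totalProbes += i + 1;                     // count probes for new keys only
                return true;
            }
            if (slot.getKey().equals(key)) {              // compare keys, not hash codes
                slot.incrementFrequency();
                return false;
            }
            index = (index + step) % m;
        }
        throw new IllegalStateException("hash table is full");
    }

    int size() { return size; }

    double averageProbes() {
        return size == 0 ? 0.0 : (double) totalProbes / size;
    }
}
```

Because the step stays constant for a given key, linear probing and double hashing share one loop here; only the step size differs. With a prime table size, any step between 1 and m - 1 visits every slot before repeating.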

Required file/class names and output:

The source code for the project. The driver program should be named HashTest. It should take three command-line arguments (the third one is optional) as follows:

java HashTest <input type> <load factor> [<debug level>]

The <input type> should be 1, 2, or 3 depending on whether the data is generated using java.util.Random, System.currentTimeMillis(), or from the file word-list. The program should print out the input source type, the total number of keys inserted into the hash table, and the average number of probes required for linear probing and double hashing. The optional argument specifies a debug level with the following meaning:

debug = 0 print summary of experiment on the console

debug = 1 print summary of experiment on the console and also print the hash tables with number of duplicates and number of probes into two files linear-dump and double-dump

For a debug level of 0, the output is a summary. An example is shown below.

[jhyeh@onyx sol]$ java HashTest 3 0.5

A good table size is found: 95791

Data source type: word-list

Using Linear Hashing....

Input 1305930 elements, of which 1258034 duplicates

load factor = 0.5, Avg. no. of probes 1.5969183230332387

Using Double Hashing....

Input 1305930 elements, of which 1258034 duplicates

load factor = 0.5, Avg. no. of probes 1.3926841489894772
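Parsing the command line for such a run might look like the sketch below. It assumes the argument order suggested by the sample invocation java HashTest 3 0.5: the input type first, the load factor second, and the optional debug level third. The class HashTestArgs, its fields, and the wording of the usage message are hypothetical.

```java
final class HashTestArgs {
    final int inputType;     // 1 = java.util.Random, 2 = System.currentTimeMillis(), 3 = word-list
    final double loadFactor; // target load factor, in (0, 1]
    final int debugLevel;    // 0 = summary only, 1 = summary plus dump files

    private HashTestArgs(int inputType, double loadFactor, int debugLevel) {
        this.inputType = inputType;
        this.loadFactor = loadFactor;
        this.debugLevel = debugLevel;
    }

    /** Parses two or three arguments; throws IllegalArgumentException on bad input. */
    static HashTestArgs parse(String[] args) {
        // Assumed usage text; the assignment does not prescribe its wording.
        String usage = "usage: java HashTest <input type> <load factor> [<debug level>]";
        if (args.length < 2 || args.length > 3) {
            throw new IllegalArgumentException(usage);
        }
        int inputType;
        double loadFactor;
        int debugLevel;
        try {
            inputType = Integer.parseInt(args[0]);
            loadFactor = Double.parseDouble(args[1]);
            debugLevel = args.length == 3 ? Integer.parseInt(args[2]) : 0;
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(usage, e);
        }
        if (inputType < 1 || inputType > 3) {
            throw new IllegalArgumentException("input type must be 1, 2, or 3");
        }
        if (!(loadFactor > 0.0 && loadFactor <= 1.0)) {
            throw new IllegalArgumentException("load factor must be in (0, 1]");
        }
        if (debugLevel != 0 && debugLevel != 1) {
            throw new IllegalArgumentException("debug level must be 0 or 1");
        }
        return new HashTestArgs(inputType, loadFactor, debugLevel);
    }
}
```

A driver could call HashTestArgs.parse(args) at the start of main and print the usage message when an IllegalArgumentException is thrown.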

For debug level of 1, the table should be output in the following format.

table[0]: weeping 3

table[1]: enfetterd 0

table[2]: atherton 1

table[4]: whateer 25

table[8]: cried 89

table[9]: angel 1

table[11]: mansfield 4

table[12]: logothete 0

table[16]: episode 4

table[17]: lind 4

table[19]: scratching 3

table[21]: cups 8

Note that empty entries of the table are omitted in the output.

* For Data Source 3, we are given a word-list text file from which to parse the individual words.
