Question
You will be implementing a hash map. Maps are associative data structures that pair key and value items together; by knowing the key, it is possible to find the value associated with it. Keys within the map are unique, but values are not; multiple keys can map to the same value. Because the map is hash-based, the expected time for most operations is essentially constant. This class is very similar to the Python dict class.

Once completed, you will use your hash map to count the number of occurrences of each word in a large document (the text of Romeo and Juliet). This is a first step toward word frequency analysis, a common tool in data mining; frequency analysis is also used in cryptography to try to break ciphers.
The methods to be filled in are listed below:
__len__
load
__contains__
__getitem__
__setitem__
__delitem__
clear
__iter__
keys
word_frequency
Your hash map is an associative data structure with both keys and values. Most of the parameters are key objects, but __setitem__ takes both a key and a value; the parameter names make clear which is which. __setitem__ adds associations to your map, while __delitem__ removes them. __contains__ determines whether an association with the given key exists, and __getitem__ returns the value associated with a given key. __len__ counts the number of associations (that is, the number of key-value pairs); do not count the key and the value separately. clear removes all associations from the map. __iter__ enumerates all of the key-value pairs, not just the keys, whereas keys returns a set of the keys contained in the map. The order of the items for both functions is undefined.

Hash tables have a limited number of slots available to store items. Your load function computes the load factor: the average number of items per slot. You are required to keep the load factor below 1 at all times; you may choose to impose a stricter maximum. For your convenience, the constructor has an optional parameter that sets the maximum load factor, and you are allowed to change its default value. Most of your functions should have an expected amortized constant runtime. clear, __iter__, and keys may have an O(n) runtime. You must use O(n) space.

The final function, word_frequency, is not a member of your hash map. It takes a sequence of words and returns a hash map that maps each unique word to the number of times that word appears in the document. You may assume that the sequence has already been sanitized; you do not have to worry about removing punctuation or converting case (you may match the strings exactly as they are given to you). This function should run in O(n) time.

You should include comments where appropriate. In particular, describe the method used for handling hash collisions. You must also add documentation for each of the methods above (as well as for any helper functions you create) in the same style as the pre-defined methods.
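Concretely, once the class is complete it should support the same operator protocol as a Python dict. The snippet below is only an illustration of the expected behavior, not one of the provided tests; the module name HashMap matches the import used in the test file.

from HashMap import HashMap

m = HashMap()
m['romeo'] = 293                       # __setitem__ adds an association
print('romeo' in m)                    # __contains__ -> True
print(m['romeo'])                      # __getitem__  -> 293
print(len(m))                          # __len__      -> 1 (one key-value pair)
print(m.load() <= m.max_load_factor)   # load factor stays within the maximum -> True
del m['romeo']                         # __delitem__ removes the association
print(len(m))                          # -> 0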
**************************************
Here is the code along with the test cases:
class HashMap:
    def __init__(self, load_factor=1.00):
        # You may change the default maximum load factor
        self.max_load_factor = load_factor
        # Other initialization code can go here

    def __len__(self):
        return 0

    def load(self):
        return 0.0

    def __contains__(self, key):
        return False

    def __getitem__(self, key):
        # Returns value types
        raise KeyError(key)

    def __setitem__(self, key, value):
        pass

    def __delitem__(self, key):
        raise KeyError(key)

    def __iter__(self):
        while False:
            yield None  # Yield key-value pairs

    def clear(self):
        pass

    def keys(self):
        return set()

    # supplied methods
    def __repr__(self):
        return '{{{0}}}'.format(','.join('{0}:{1}'.format(k, v) for k, v in self))

    def __bool__(self):
        return not self.is_empty()

    def is_empty(self):
        return len(self) == 0

    # Helper functions can go here


# Required Function
def word_frequency(seq):
    return HashMap()
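Once word_frequency is filled in, it should count occurrences of each word. For example (an illustrative snippet, not one of the provided tests):

from HashMap import word_frequency

freq = word_frequency(['a', 'rose', 'is', 'a', 'rose'])
print(len(freq))       # 3 distinct words
print(freq['rose'])    # 2 ('rose' appears twice)
print(freq['is'])      # 1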
*****************Test cases below*********************
#!/usr/bin/python3
import unittest, re
from HashMap import HashMap, word_frequency

letters = ['alfa', 'bravo', 'charlie', 'delta', 'echo', 'foxtrot', 'golf', 'hotel', 'india', 'juliett', 'kilo',
           'lima', 'mike', 'november', 'oscar', 'papa', 'quebec', 'romeo', 'sierra', 'tango', 'uniform', 'victor',
           'whiskey', 'x-ray', 'yankee', 'zulu']

def sanitize(word):
    return re.sub(r'[^\w\s]', '', word).lower()

class HashMapTests(unittest.TestCase):
    def test_len(self):
        hashmap = HashMap()
        for i, l in enumerate(letters):
            hashmap[l] = i
            self.assertEqual(i + 1, len(hashmap))
        for l in letters:
            hashmap[l] = len(l)
            self.assertEqual(len(letters), len(hashmap))
        for i, l in enumerate(letters):
            del hashmap[l]
            self.assertEqual(len(letters) - i - 1, len(hashmap))

    def test_contains(self):
        hashmap = HashMap()
        for l in letters:
            self.assertFalse(l in hashmap)
            hashmap[l] = l
            self.assertTrue(l in hashmap)
        for l in letters:
            self.assertTrue(l in hashmap)
            del hashmap[l]
            self.assertFalse(l in hashmap)

    def test_insert(self):
        hashmap = HashMap()
        for i, l in enumerate(letters):
            self.assertFalse(l in hashmap)
            hashmap[l] = i
            self.assertTrue(l in hashmap)
        for l in letters:
            self.assertTrue(l in hashmap)
            hashmap[l] = len(l)
            self.assertTrue(l in hashmap)

    def test_get(self):
        hashmap = HashMap()
        for i, l in enumerate(letters):
            self.assertRaises(KeyError, hashmap.__getitem__, l)
            hashmap[l] = i
            self.assertEqual(i, hashmap[l])
        for i, l in enumerate(letters):
            self.assertEqual(i, hashmap[l])
            hashmap[l] = len(l)
            self.assertEqual(len(l), hashmap[l])
        for l in letters:
            self.assertEqual(len(l), hashmap[l])
            del hashmap[l]
            self.assertRaises(KeyError, hashmap.__getitem__, l)

    def test_delete(self):
        hashmap = HashMap()
        for l in letters:
            self.assertRaises(KeyError, hashmap.__delitem__, l)
        for l in letters:
            hashmap[l] = l
        for i, l in enumerate(letters):
            self.assertTrue(l in hashmap)
            del hashmap[l]
            self.assertFalse(l in hashmap)
        self.assertTrue(hashmap.is_empty())
        for l in letters:
            self.assertFalse(l in hashmap)
            self.assertRaises(KeyError, hashmap.__delitem__, l)
    def test_load(self):
        hashmap = HashMap()
        self.assertGreaterEqual(1.0, hashmap.max_load_factor)
        self.assertAlmostEqual(0, hashmap.load())
        for l in letters:
            hashmap[l] = l
            self.assertLessEqual(0.05, hashmap.load())
            self.assertGreaterEqual(hashmap.max_load_factor, hashmap.load())
        for l in letters:
            self.assertLessEqual(0.05, hashmap.load())
            self.assertGreaterEqual(hashmap.max_load_factor, hashmap.load())
            del hashmap[l]
        self.assertAlmostEqual(0, hashmap.load())
        for l in letters:
            hashmap[l] = l
            self.assertLessEqual(0.05, hashmap.load())
            self.assertGreaterEqual(hashmap.max_load_factor, hashmap.load())
        load = hashmap.load()
        for l in letters:
            hashmap[l] = len(l)
            self.assertAlmostEqual(load, hashmap.load())
        hashmap.clear()
        self.assertAlmostEqual(0, hashmap.load())
        self.assertGreaterEqual(1.0, hashmap.max_load_factor)

    def test_keys(self):
        hashmap = HashMap()
        for l in letters:
            hashmap[l] = l
        self.assertEqual(len(letters), len(hashmap.keys()))
        self.assertSetEqual(set(letters), hashmap.keys())

    def test_clear(self):
        hashmap = HashMap()
        for l in letters:
            hashmap[l] = l
        self.assertFalse(hashmap.is_empty())
        hashmap.clear()
        self.assertTrue(hashmap.is_empty())
        self.assertAlmostEqual(0, hashmap.load())
        for l in letters:
            self.assertFalse(l in hashmap)
            self.assertRaises(KeyError, hashmap.__getitem__, l)
            self.assertRaises(KeyError, hashmap.__delitem__, l)

    def test_iter(self):
        hashmap = HashMap()
        for l in letters:
            hashmap[l] = l
        expected = {l: l for l in letters}
        actual = {k: v for k, v in hashmap}
        self.assertDictEqual(expected, actual)

    def test_word_frequency(self):
        with open('romeojuliet.txt', 'r') as reader:
            freq = word_frequency(sanitize(word) for line in reader for word in line.split())
        self.assertEqual(3741, len(freq))
        self.assertEqual(293, freq['romeo'])
        self.assertEqual(176, freq['juliet'])

    def test_large_size(self):
        hashmap = HashMap()
        for i in range(10000):
            self.assertFalse(i in hashmap)
            hashmap[i] = i
            self.assertTrue(i in hashmap)
            self.assertEqual(i, hashmap[i])
            self.assertLessEqual(0.05, hashmap.load())
            self.assertGreaterEqual(hashmap.max_load_factor, hashmap.load())
        load = hashmap.load()
        for i in range(10000):
            self.assertTrue(i in hashmap)
            self.assertEqual(i, hashmap[i])
            hashmap[i] = i * i
            self.assertTrue(i in hashmap)
            self.assertEqual(i * i, hashmap[i])
            self.assertAlmostEqual(load, hashmap.load())
        for i in range(10000):
            self.assertTrue(i in hashmap)
            self.assertEqual(i * i, hashmap[i])
            del hashmap[i]
            self.assertFalse(i in hashmap)
        self.assertAlmostEqual(0, hashmap.load())

    def test_sequence(self):
        pass

if __name__ == '__main__':
    unittest.main()
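For reference, here is one possible way the skeleton could be completed. Treat it as a minimal sketch rather than the official solution: it resolves hash collisions by separate chaining (each slot holds a list of key-value pairs) and resizes the bucket array to keep the load factor bounded. The helper names _buckets, _size, _bucket_for, and _resize, the starting capacity of 8, and the lowered default maximum load factor of 0.75 (the handout allows changing the default) are all choices made for this sketch, not requirements.

class HashMap:
    def __init__(self, load_factor=0.75):
        # Default lowered from 1.00 (allowed by the handout) so the load
        # factor stays strictly below 1 at all times.
        self.max_load_factor = load_factor
        self._buckets = [[] for _ in range(8)]   # start with 8 empty buckets
        self._size = 0                           # number of key-value pairs

    def __len__(self):
        return self._size

    def load(self):
        # Load factor: pairs stored divided by slots available.
        return self._size / len(self._buckets)

    def _bucket_for(self, key):
        # Collision handling: keys hashing to the same slot share one list.
        return self._buckets[hash(key) % len(self._buckets)]

    def __contains__(self, key):
        return any(k == key for k, _ in self._bucket_for(key))

    def __getitem__(self, key):
        for k, v in self._bucket_for(key):
            if k == key:
                return v
        raise KeyError(key)

    def __setitem__(self, key, value):
        bucket = self._bucket_for(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)          # existing key: replace value
                return
        bucket.append((key, value))
        self._size += 1
        if self.load() > self.max_load_factor:    # too crowded: double the table
            self._resize(2 * len(self._buckets))

    def __delitem__(self, key):
        bucket = self._bucket_for(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self._size -= 1
                # Too sparse: halve the table (never below 8 buckets) so the
                # load factor does not drift toward zero.
                if len(self._buckets) > 8 and self.load() < self.max_load_factor / 4:
                    self._resize(len(self._buckets) // 2)
                return
        raise KeyError(key)

    def _resize(self, new_capacity):
        # Rebuild the table at the new capacity and rehash every pair.
        pairs = [(k, v) for bucket in self._buckets for k, v in bucket]
        self._buckets = [[] for _ in range(new_capacity)]
        for k, v in pairs:
            self._buckets[hash(k) % new_capacity].append((k, v))

    def __iter__(self):
        for bucket in self._buckets:
            yield from bucket                     # yields (key, value) pairs

    def clear(self):
        self._buckets = [[] for _ in range(8)]
        self._size = 0

    def keys(self):
        return {k for k, _ in self}

    # supplied methods (unchanged from the skeleton)
    def __repr__(self):
        return '{{{0}}}'.format(','.join('{0}:{1}'.format(k, v) for k, v in self))

    def __bool__(self):
        return not self.is_empty()

    def is_empty(self):
        return len(self) == 0


# Required Function
def word_frequency(seq):
    # One expected-O(1) lookup and insertion per word gives O(n) overall.
    freq = HashMap()
    for word in seq:
        freq[word] = freq[word] + 1 if word in freq else 1
    return freq

Chaining keeps deletion simple compared with open addressing, and doubling or halving the table keeps the expected cost of each operation amortized constant while the load factor stays between roughly max_load_factor / 4 and max_load_factor.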