
Question

You will be implementing a hash map. Maps are associative data structures that pair key and value items together; by knowing the key, it is possible to find the value associated with that key. Keys within the map are unique, but values are not; multiple keys can map to the same value. Because the map is hash-based, the expected time for most operations is essentially constant. This class is very similar to the Python dict class.

Once completed, you will use your hash map to find the number of occurrences of each word in a large document (the script of Romeo and Juliet). This is a first step toward word frequency analysis, an important tool in data mining; frequency analysis is also used in cryptography to try to break ciphers.
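Since the target behavior mirrors Python's built-in dict, a quick dict session shows the semantics your map must reproduce (the key names here are just examples):

```python
# The built-in dict illustrates the target semantics of the hash map:
counts = {}
counts['romeo'] = 1        # __setitem__ adds an association
counts['romeo'] = 2        # reassigning a key replaces its value (keys are unique)
counts['juliet'] = 2       # distinct keys may map to the same value
print('romeo' in counts)   # __contains__ -> True
print(counts['juliet'])    # __getitem__ -> 2
del counts['romeo']        # __delitem__ removes the association
print(len(counts))         # one association remains -> 1
```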

The methods to be filled in are listed below:

__len__

load

__contains__

__getitem__

__setitem__

__delitem__

clear

__iter__

keys

word_frequency

Your hash map is an associative data structure with both keys and values. Most of the methods take key parameters, but __setitem__ takes both a key and a value; the parameter names make it clear which are keys and which are values. __setitem__ adds associations to your map, while __delitem__ removes them. __contains__ determines whether an association with the given key exists, and __getitem__ returns the value associated with a given key. __len__ counts the number of associations (that is, the number of key-value pairs); do not count the key and the value separately. clear removes all associations from the map. __iter__ enumerates all of the key-value pairs, not just the keys; the keys function, in contrast, returns a set of the keys contained in the map. The order of the items for both functions is undefined.

Hash tables have a limited number of slots available to store items. Your load function computes the load factor: the average number of items per slot. You are required to keep the load factor below 1 at all times; you may choose to impose a stricter maximum. For your convenience, the constructor has an optional parameter that sets the maximum load factor, and you are allowed to change its default value.

Most of your functions should have an expected amortized constant runtime; clear, __iter__, and keys may have an O(n) runtime. You must use O(n) space.

The final function, word_frequency, is not a member of your hash map. Instead, it takes in a sequence of words and returns a hash map that maps each unique word to the number of times that word appears in the document. You may assume that the sequence has already been sanitized for your convenience; you do not have to worry about removing punctuation or converting case (you may match the strings as they are given to you). This function should run in O(n) time.

You should include comments where appropriate. In particular, you should describe the method used for handling hash collisions. You must also add documentation for each of the above methods (and for any helper functions you create) in the same style as the pre-defined methods.
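One common collision strategy is separate chaining: each slot holds a list of key-value pairs, and the table grows whenever the load factor would exceed the maximum. The sketch below illustrates that idea only; the slot count, helper names, and growth policy are illustrative assumptions, not the required solution, and it implements just a few of the assigned methods:

```python
class ChainedMap:
    """Illustrative separate-chaining map; not a drop-in solution."""

    def __init__(self, max_load_factor=0.75):
        self.max_load_factor = max_load_factor
        self._slots = [[] for _ in range(8)]   # each slot is a chain of (key, value)
        self._count = 0

    def load(self):
        # Load factor: average number of stored pairs per slot.
        return self._count / len(self._slots)

    def _chain(self, key):
        # Map a key to the chain in its home slot.
        return self._slots[hash(key) % len(self._slots)]

    def __setitem__(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:                        # key exists: replace its value
                chain[i] = (key, value)
                return
        chain.append((key, value))              # new association
        self._count += 1
        if self.load() > self.max_load_factor:  # keep the load factor bounded
            self._resize(2 * len(self._slots))

    def __getitem__(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        raise KeyError(key)

    def __len__(self):
        return self._count

    def _resize(self, new_size):
        # Rehash every pair into a larger table; amortized O(1) per insert.
        old = self._slots
        self._slots = [[] for _ in range(new_size)]
        for chain in old:
            for key, value in chain:
                self._slots[hash(key) % new_size].append((key, value))
```

With doubling on growth, the occasional O(n) rehash spreads across the inserts that triggered it, which is what makes __setitem__ amortized constant time.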

**************************************

Here is the code along with the test cases:

class HashMap:

    def __init__(self, load_factor=1.00):
        # You may change the default maximum load factor
        self.max_load_factor = load_factor
        # Other initialization code can go here

    def __len__(self):
        return 0

    def load(self):
        return 0.0

    def __contains__(self, key):
        return False

    def __getitem__(self, key):
        # Returns the value associated with key
        raise KeyError(key)

    def __setitem__(self, key, value):
        pass

    def __delitem__(self, key):
        raise KeyError(key)

    def __iter__(self):
        while False:
            yield None  # Yield key-value pairs

    def clear(self):
        pass

    def keys(self):
        return set()

    # Supplied methods

    def __repr__(self):
        return '{{{0}}}'.format(','.join('{0}:{1}'.format(k, v) for k, v in self))

    def __bool__(self):
        return not self.is_empty()

    def is_empty(self):
        return len(self) == 0

    # Helper functions can go here


# Required Function
def word_frequency(seq):
    return HashMap()
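The counting loop itself is a single pass over the words. Here it is sketched with a plain dict standing in for the completed HashMap so the logic is runnable on its own; once your __contains__, __getitem__, and __setitem__ work, the same loop applies to your map unchanged:

```python
def word_frequency_sketch(seq):
    # One pass over the words: O(n) total with expected O(1) map operations.
    freq = {}  # stand-in for the completed HashMap
    for word in seq:
        if word in freq:
            freq[word] = freq[word] + 1   # existing word: bump its count
        else:
            freq[word] = 1                # first occurrence
    return freq

# word_frequency_sketch(['to', 'be', 'or', 'not', 'to', 'be'])
# -> {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```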

*****************Test cases below*********************

#!/usr/bin/python3

import unittest, re

from HashMap import HashMap, word_frequency

letters = ['alfa', 'bravo', 'charlie', 'delta', 'echo', 'foxtrot', 'golf', 'hotel', 'india', 'juliett', 'kilo',
           'lima', 'mike', 'november', 'oscar', 'papa', 'quebec', 'romeo', 'sierra', 'tango', 'uniform', 'victor',
           'whiskey', 'x-ray', 'yankee', 'zulu']

def sanitize(word):
    return re.sub(r'[^\w\s]', '', word).lower()

class HashMapTests(unittest.TestCase):

    def test_len(self):
        hashmap = HashMap()
        for i, l in enumerate(letters):
            hashmap[l] = i
            self.assertEqual(i + 1, len(hashmap))
        for l in letters:
            hashmap[l] = len(l)
        self.assertEqual(len(letters), len(hashmap))
        for i, l in enumerate(letters):
            del hashmap[l]
            self.assertEqual(len(letters) - i - 1, len(hashmap))

    def test_contains(self):
        hashmap = HashMap()
        for l in letters:
            self.assertFalse(l in hashmap)
            hashmap[l] = l
            self.assertTrue(l in hashmap)
        for l in letters:
            self.assertTrue(l in hashmap)
            del hashmap[l]
            self.assertFalse(l in hashmap)

    def test_insert(self):
        hashmap = HashMap()
        for i, l in enumerate(letters):
            self.assertFalse(l in hashmap)
            hashmap[l] = i
            self.assertTrue(l in hashmap)
        for l in letters:
            self.assertTrue(l in hashmap)
            hashmap[l] = len(l)
            self.assertTrue(l in hashmap)

    def test_get(self):
        hashmap = HashMap()
        for i, l in enumerate(letters):
            self.assertRaises(KeyError, hashmap.__getitem__, l)
            hashmap[l] = i
            self.assertEqual(i, hashmap[l])
        for i, l in enumerate(letters):
            self.assertEqual(i, hashmap[l])
            hashmap[l] = len(l)
            self.assertEqual(len(l), hashmap[l])
        for l in letters:
            self.assertEqual(len(l), hashmap[l])
            del hashmap[l]
            self.assertRaises(KeyError, hashmap.__getitem__, l)

    def test_delete(self):
        hashmap = HashMap()
        for l in letters:
            self.assertRaises(KeyError, hashmap.__delitem__, l)
        for l in letters:
            hashmap[l] = l
        for i, l in enumerate(letters):
            self.assertTrue(l in hashmap)
            del hashmap[l]
            self.assertFalse(l in hashmap)
        self.assertTrue(hashmap.is_empty())
        for l in letters:
            self.assertFalse(l in hashmap)
            self.assertRaises(KeyError, hashmap.__delitem__, l)

    def test_load(self):
        hashmap = HashMap()
        self.assertGreaterEqual(1.0, hashmap.max_load_factor)
        self.assertAlmostEqual(0, hashmap.load())
        for l in letters:
            hashmap[l] = l
            self.assertLessEqual(0.05, hashmap.load())
            self.assertGreaterEqual(hashmap.max_load_factor, hashmap.load())
        for l in letters:
            self.assertLessEqual(0.05, hashmap.load())
            self.assertGreaterEqual(hashmap.max_load_factor, hashmap.load())
            del hashmap[l]
        self.assertAlmostEqual(0, hashmap.load())
        for l in letters:
            hashmap[l] = l
            self.assertLessEqual(0.05, hashmap.load())
            self.assertGreaterEqual(hashmap.max_load_factor, hashmap.load())
        load = hashmap.load()
        for l in letters:
            hashmap[l] = len(l)
            self.assertAlmostEqual(load, hashmap.load())
        hashmap.clear()
        self.assertAlmostEqual(0, hashmap.load())
        self.assertGreaterEqual(1.0, hashmap.max_load_factor)

    def test_keys(self):
        hashmap = HashMap()
        for l in letters:
            hashmap[l] = l
        self.assertEqual(len(letters), len(hashmap.keys()))
        self.assertSetEqual(set(letters), hashmap.keys())

    def test_clear(self):
        hashmap = HashMap()
        for l in letters:
            hashmap[l] = l
        self.assertFalse(hashmap.is_empty())
        hashmap.clear()
        self.assertTrue(hashmap.is_empty())
        self.assertAlmostEqual(0, hashmap.load())
        for l in letters:
            self.assertFalse(l in hashmap)
            self.assertRaises(KeyError, hashmap.__getitem__, l)
            self.assertRaises(KeyError, hashmap.__delitem__, l)

    def test_iter(self):
        hashmap = HashMap()
        for l in letters:
            hashmap[l] = l
        expected = {l: l for l in letters}
        actual = {k: v for k, v in hashmap}
        self.assertDictEqual(expected, actual)

    def test_word_frequency(self):
        with open('romeojuliet.txt', 'r') as reader:
            freq = word_frequency(sanitize(word) for line in reader for word in line.split())
        self.assertEqual(3741, len(freq))
        self.assertEqual(293, freq['romeo'])
        self.assertEqual(176, freq['juliet'])

    def test_large_size(self):
        hashmap = HashMap()
        for i in range(10000):
            self.assertFalse(i in hashmap)
            hashmap[i] = i
            self.assertTrue(i in hashmap)
            self.assertEqual(i, hashmap[i])
            self.assertLessEqual(0.05, hashmap.load())
            self.assertGreaterEqual(hashmap.max_load_factor, hashmap.load())
        load = hashmap.load()
        for i in range(10000):
            self.assertTrue(i in hashmap)
            self.assertEqual(i, hashmap[i])
            hashmap[i] = i * i
            self.assertTrue(i in hashmap)
            self.assertEqual(i * i, hashmap[i])
            self.assertAlmostEqual(load, hashmap.load())
        for i in range(10000):
            self.assertTrue(i in hashmap)
            self.assertEqual(i * i, hashmap[i])
            del hashmap[i]
            self.assertFalse(i in hashmap)
        self.assertAlmostEqual(0, hashmap.load())

    def test_sequence(self):
        pass

if __name__ == '__main__':
    unittest.main()
