
Question


1. (5 pts) Implement in Python a class to model hash functions from a certain family H of hash functions. (Call this class HashFamily.) The constructor for H should take n and p as arguments. The constructor should generate a random set of p distinct indices from {0, 1, ..., n-1} and remember this set; call it S. This class should have a method hash(x) which computes the hash value of any n-bit number x as follows. The method first extracts {x[i] : i in S}. It then forms the p-bit number obtained from this set in order of increasing index and returns this p-bit number in decimal.
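A minimal sketch of the HashFamily class described above, not a definitive solution. The question does not say how bits are indexed, so this sketch assumes x[i] is bit i counted from the least significant bit, and that the bit with the lowest index in S becomes the most significant bit of the result. The optional `rng` argument is a hypothetical addition that makes runs reproducible.

```python
import random


class HashFamily:
    """Bit-sampling hash: keeps p randomly chosen bit positions of an n-bit input."""

    def __init__(self, n, p, rng=None):
        if not 0 <= p <= n:
            raise ValueError("p must satisfy 0 <= p <= n")
        if rng is None:
            rng = random  # fall back to the module-level generator
        self.n = n
        self.p = p
        # p distinct indices from {0, ..., n-1}, stored in increasing order.
        self.S = sorted(rng.sample(range(n), p))

    def hash(self, x):
        """Return the p-bit number formed from the bits x[i], i in S."""
        value = 0
        for i in self.S:
            # Assumption: x[i] is bit i from the least significant end,
            # and the lowest index becomes the most significant output bit.
            value = (value << 1) | ((x >> i) & 1)
        return value
```

Python integers print in decimal by default, so returning the integer satisfies "returns this p-bit number in decimal". If the course indexes bits from the most significant end instead, only the expression `(x >> i) & 1` would need to change.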
2. (5 pts) Implement in Python a Bloom Filter (BF) as a class.
   1. The constructor should take m, k, and n as arguments, where m and k are as done in class and n is the number of bits in an input. (It's okay to limit m to be a power of 2.) The constructor should then create k random hash functions as HashFamily objects with arguments n = n and p = log_2(m). It maintains these hash functions in its state.
   2. Support insert and lookup as done in class.
   3. Support a batch insert method, a wrapper around insert, as a convenience method (see next question).
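One way the Bloom filter described above might be sketched. "As done in class" is not spelled out in the question, so this assumes the standard construction: m is the number of bits in the filter, k is the number of hash functions, insert sets k bits, and lookup answers YES only if all k bits are set. The block repeats a compact HashFamily (with the same bit-order assumption, least significant bit is index 0) so that it runs on its own; the `rng` argument and the `batch_insert` name are hypothetical choices.

```python
import random


class HashFamily:
    """Compact bit-sampling hash; assumes bit 0 is the least significant bit."""

    def __init__(self, n, p, rng=None):
        if rng is None:
            rng = random
        self.S = sorted(rng.sample(range(n), p))

    def hash(self, x):
        value = 0
        for i in self.S:
            value = (value << 1) | ((x >> i) & 1)
        return value


class BloomFilter:
    def __init__(self, m, k, n, rng=None):
        if m < 1 or m & (m - 1):
            raise ValueError("this sketch requires m to be a power of 2")
        p = m.bit_length() - 1  # exact log_2(m) for a power of 2
        if p > n:
            raise ValueError("log_2(m) cannot exceed the input width n")
        self.m, self.k, self.n = m, k, n
        self.bits = bytearray(m)  # one byte per filter bit, for simplicity
        self.hashes = [HashFamily(n, p, rng) for _ in range(k)]

    def insert(self, x):
        # Set the bit selected by each of the k hash functions.
        for h in self.hashes:
            self.bits[h.hash(x)] = 1

    def lookup(self, x):
        # YES only if every selected bit is set; false positives are possible.
        return all(self.bits[h.hash(x)] for h in self.hashes)

    def batch_insert(self, xs):
        # Convenience wrapper: insert every element of an iterable.
        for x in xs:
            self.insert(x)
```

Because each hash returns a p-bit value with p = log_2(m), the result is already a valid index into the m-bit array and no modulo step is needed.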
3. (7 pts) In this question, the aim is to empirically study how various parameter settings impact the false positive rate on simulated data.
   1. Generate 10000 64-bit numbers at random and store this list somewhere. Call it S.
   2. for m in [64, 128, 256, 512, 1024, 2048, ..., 65536]:
      1. for k in [1, 2, 4, 8, ..., m/2]:
         1. Create a BF with params m = m, k = k, n = 64.
         2. for q in [m, 2m, 4m, 8m, 16m, 32m, 64m]:
            1. Generate q n-bit numbers and batch insert them all into the BF. (Note that these are inserted into an existing BF.)
            2. Generate 1000 random n-bit numbers from scratch and look up each of them in the BF. Calculate the proportion of these 1,000 numbers on which the BF answers YES. The higher this proportion, the higher the false positive rate.
   3. Report your results in a table whose rows correspond to (m, k, q), one row per triplet.
   4. Analyze your results to draw meaningful conclusions about the impact of the various parameter settings on the false positive rate.
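The experiment loop above could be sketched as follows, on a reduced parameter grid so it finishes quickly. The names `run_experiment`, `textbook_fp_rate`, `max_k`, `probes`, and `seed` are hypothetical. Two assumptions are worth stating: because each batch goes into an existing filter, the sketch tracks the cumulative number of inserted items alongside q; and the comparison column uses the textbook approximation (1 - e^(-k*N/m))^k for N inserted items, which assumes independent uniform hashes, something the bit-sampling family only approximates. The compact HashFamily and BloomFilter are repeated so the block is self-contained.

```python
import math
import random


class HashFamily:
    """Compact bit-sampling hash; assumes bit 0 is the least significant bit."""

    def __init__(self, n, p, rng):
        self.S = sorted(rng.sample(range(n), p))

    def hash(self, x):
        value = 0
        for i in self.S:
            value = (value << 1) | ((x >> i) & 1)
        return value


class BloomFilter:
    def __init__(self, m, k, n, rng):
        self.bits = bytearray(m)
        p = m.bit_length() - 1  # log_2(m), assuming m is a power of 2
        self.hashes = [HashFamily(n, p, rng) for _ in range(k)]

    def insert(self, x):
        for h in self.hashes:
            self.bits[h.hash(x)] = 1

    def lookup(self, x):
        return all(self.bits[h.hash(x)] for h in self.hashes)

    def batch_insert(self, xs):
        for x in xs:
            self.insert(x)


def textbook_fp_rate(m, k, inserted):
    """Standard approximation; assumes independent, uniform hash functions."""
    return (1.0 - math.exp(-k * inserted / m)) ** k


def run_experiment(ms, multipliers, n=64, probes=1000, max_k=None, seed=0):
    """Return rows (m, k, q, total_inserted, measured_rate, textbook_rate)."""
    rng = random.Random(seed)
    rows = []
    for m in ms:
        k = 1
        while k <= m // 2 and (max_k is None or k <= max_k):
            bf = BloomFilter(m, k, n, rng)
            inserted = 0
            for mult in multipliers:
                q = mult * m
                # Inserted into the existing filter, so the load accumulates.
                bf.batch_insert(rng.getrandbits(n) for _ in range(q))
                inserted += q
                # Fresh random probes; a YES is almost surely a false positive.
                hits = sum(bf.lookup(rng.getrandbits(n)) for _ in range(probes))
                rows.append((m, k, q, inserted, hits / probes,
                             textbook_fp_rate(m, k, inserted)))
            k *= 2  # k ranges over 1, 2, 4, ..., m/2
    return rows


if __name__ == "__main__":
    for row in run_experiment(ms=[64, 128], multipliers=[1, 2, 4], max_k=8):
        print("m=%d k=%d q=%d inserted=%d measured=%.3f textbook=%.3f" % row)
```

Running the full grid from the question (m up to 65536, k up to m/2, q up to 64m) with this pure-Python sketch would be very slow, so `max_k` is provided as a cap for exploratory runs; removing the cap reproduces the loop bounds as stated.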
