Question
2. (30 pts) A good hash function h(x) behaves in practice very close to the uniform hashing assumption analyzed in class, but is a deterministic function. That is, h(x) = k each time x is used as an argument to h(). Designing good hash functions is hard, and a bad hash function can cause a hash table to quickly exit the sparse loading regime by overloading some buckets and underloading others. Good hash functions often rely on beautiful and complicated insights from number theory, and have deep connections to pseudorandom number generators and cryptographic functions. In practice, most hash functions are moderate to poor approximations of uniform hashing.
Consider the following hash function. Let U be the universe of strings composed of the characters from the alphabet $\Sigma = [A, \ldots, Z]$, and let the function $f(x_i)$ return the index of a letter $x_i \in \Sigma$, e.g., f(A) = 1 and f(Z) = 26. Finally, for an m-character string $x \in \Sigma^m$, define $h(x) = \left(\left[\sum_{i=1}^{m} f(x_i)\right] \bmod l\right)$, where l is the number of buckets in the hash table. That is, our hash function sums up the index values of the characters of a string x and maps that value onto one of the l buckets.
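The definition of h(x) above can be sketched in a few lines of Python. This is a minimal illustration, not a reference solution: the names `letter_index` and `h` are chosen here for readability, and the code assumes every input string contains only the uppercase letters A to Z.

```python
def letter_index(ch: str) -> int:
    """f(x_i): 1-based alphabet index, so f('A') = 1 and f('Z') = 26.

    Assumption: ch is a single uppercase letter in A..Z.
    """
    return ord(ch) - ord("A") + 1


def h(x: str, l: int) -> int:
    """Sum the letter indices of x, then reduce modulo the bucket count l."""
    return sum(letter_index(ch) for ch in x) % l
```

For example, `h("SMITH", 200)` evaluates 19 + 13 + 9 + 20 + 8 = 69, and any reordering of the same letters lands in the same bucket because addition ignores order.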
(a) The following list contains US Census-derived last names: http://www2.census.gov/topics/genealogy/1990surnames/dist.all.last Using these names as input strings, first choose a uniformly random 50% of these name strings and then hash them using h(x). Produce a histogram showing the corresponding distribution of hash locations when l = 200. Label the axes of your figure. Briefly describe what the figure shows about h(x), and justify your results in terms of the behavior of h(x). Do not forget to append your code. Hint: the raw file includes information other than name strings, which will need to be removed; and, think about how you can count hash locations without building or using a real hash table.
(b) Enumerate at least 4 reasons why h(x) is a bad hash function relative to the ideal behavior of uniform hashing.
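For part (a), the hint about counting hash locations without a real hash table could be followed with a plain list of l counters. The sketch below is one possible approach under stated assumptions: the helper names `parse_names` and `bucket_counts` are hypothetical, the name is assumed to be the first whitespace-separated field on each line of the raw file, and names are assumed to contain only the letters A to Z. Downloading the file and drawing the histogram are left out.

```python
import random


def parse_names(lines):
    """Keep the first whitespace-separated field of each non-empty line.

    Assumption: that field is the surname; the remaining columns are dropped.
    """
    return [line.split()[0] for line in lines if line.strip()]


def bucket_counts(names, l=200, fraction=0.5, seed=0):
    """Hash a uniformly random fraction of names and count hits per bucket."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(names, int(len(names) * fraction))
    counts = [0] * l  # counts[b] = number of sampled names hashing to bucket b
    for name in sample:
        bucket = sum(ord(ch) - ord("A") + 1 for ch in name) % l
        counts[bucket] += 1
    return counts
```

The returned list can be passed to any plotting tool as bar heights, with bucket index on the horizontal axis and number of names on the vertical axis.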
(c) Produce a plot showing (i) the length of the longest chain (were we to use chaining for resolving collisions) as a function of the number n of these strings that we hash into a table with l = 200 buckets, and (ii) the exact upper bound on the depth of a red-black tree with n items stored. Then, (i) comment on the value of n at which the red-black tree becomes a more efficient data structure, and (ii) state the length of the longest chain when every bucket has at least one hash hit.
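For part (c), the two curves could be computed as sketched below. This is an illustration under assumptions, not the expected solution: the function names are hypothetical, names are assumed to contain only A to Z, and the red-black bound used is the standard result that a red-black tree with n internal nodes has height at most 2 log2(n + 1).

```python
import math


def longest_chain_curve(names, l=200):
    """Insert names one at a time and track the longest chain after each insert.

    Returns (curve, full_at): curve[n-1] is the longest chain after n inserts;
    full_at is (n, longest chain) at the first moment every bucket is non-empty,
    or None if some bucket is never hit.
    """
    counts = [0] * l
    empty = l          # number of buckets with no hits so far
    longest = 0
    curve = []
    full_at = None
    for n, name in enumerate(names, start=1):
        bucket = sum(ord(ch) - ord("A") + 1 for ch in name) % l
        if counts[bucket] == 0:
            empty -= 1
        counts[bucket] += 1
        longest = max(longest, counts[bucket])
        curve.append(longest)
        if empty == 0 and full_at is None:
            full_at = (n, longest)
    return curve, full_at


def red_black_height_bound(n):
    """Upper bound 2 * log2(n + 1) on the height of a red-black tree with n items."""
    return 2 * math.log2(n + 1)
```

Plotting `curve` against n alongside `red_black_height_bound(n)` shows where the longest chain first exceeds the tree bound, and `full_at` reports the chain length at the point where no bucket remains empty.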