Question

2. (30 pts) A good hash function h(x) behaves in practice very close to the uniform hashing assumption analyzed in class, but is a deterministic function. That is, h(x) = k each time x is used as an argument to h(). Designing good hash functions is hard, and a bad hash function can cause a hash table to quickly exit the sparse loading regime by overloading some buckets and underloading others. Good hash functions often rely on beautiful and complicated insights from number theory, and have deep connections to pseudorandom number generators and cryptographic functions. In practice, most hash functions are moderate to poor approximations of uniform hashing. Consider the following hash function. Let U be the universe of strings composed of the characters from the alphabet Σ = [A, ..., Z], and let the function f(x_i) return the index of a letter x_i ∈ Σ, e.g., f(A) = 1 and f(Z) = 26. Finally, for an m-character string x ∈ Σ^m, define $h(x) = ([\sum_{i=1}^{m} f(x_i)] \bmod l)$, where l is the number of buckets in the hash table. That is, our hash function sums up the index values of the characters of a string x and maps that value onto one of the l buckets.
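The definition above can be sketched directly in Python. This is only an illustration of the stated formula, not a reference solution; the names `letter_index` and `h` and the choice to reject non-uppercase input are assumptions made here.

```python
def letter_index(ch: str) -> int:
    """f(x_i): 1-based index of an uppercase letter, so f('A') = 1 and f('Z') = 26."""
    # Assumption: anything outside A..Z is rejected rather than silently mapped.
    if len(ch) != 1 or not ("A" <= ch <= "Z"):
        raise ValueError(f"expected one uppercase letter A-Z, got {ch!r}")
    return ord(ch) - ord("A") + 1


def h(x: str, l: int) -> int:
    """Sum the letter indices of x, then reduce the sum modulo the bucket count l."""
    return sum(letter_index(ch) for ch in x) % l


if __name__ == "__main__":
    # Anagrams share the same letter sum, so they always land in the same bucket.
    print(h("AB", 200), h("BA", 200))
```

Because the sum ignores character order, any two strings with the same multiset of letters collide, which is one property worth keeping in mind for the later parts.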

(a) The following list contains US Census derived last names: http://www2.census.gov/topics/genealogy/1990surnames/dist.all.last Using these names as input strings, first choose a uniformly random 50% of these name strings and then hash them using h(x). Produce a histogram showing the corresponding distribution of hash locations when l = 200. Label the axes of your figure. Briefly describe what the figure shows about h(x), and justify your results in terms of the behavior of h(x). Do not forget to append your code. Hint: the raw file includes information other than name strings, which will need to be removed; and, think about how you can count hash locations without building or using a real hash table.
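One way to follow the hint is to keep a plain array of l counters instead of a real hash table. The sketch below assumes each line of the raw file starts with the name, followed by whitespace-separated numeric columns; that layout, and the helper names `extract_names` and `bucket_counts`, are assumptions for illustration. It takes the file's lines as input rather than downloading anything.

```python
import random


def h(x: str, l: int) -> int:
    """Letter-sum hash from the problem statement (A = 1, ..., Z = 26)."""
    return sum(ord(ch) - ord("A") + 1 for ch in x) % l


def extract_names(lines):
    """Keep only the first whitespace-separated token of each non-empty line."""
    names = []
    for line in lines:
        fields = line.split()
        # Assumption: the name is the first column; skip anything not purely A-Z.
        if fields and all("A" <= ch <= "Z" for ch in fields[0].upper()):
            names.append(fields[0].upper())
    return names


def bucket_counts(names, l=200, fraction=0.5, seed=None):
    """Hash a uniformly random fraction of the names and count hits per bucket."""
    rng = random.Random(seed)
    sample = rng.sample(names, int(len(names) * fraction))
    counts = [0] * l  # counts[k] = number of sampled names with h(x) = k
    for name in sample:
        counts[h(name, l)] += 1
    return counts


if __name__ == "__main__":
    demo = ["SMITH 1.006 1.006 1", "JOHNSON 0.810 1.816 2"]
    print(sum(bucket_counts(extract_names(demo), fraction=1.0)))
```

The returned list can be handed to any plotting tool as bar heights, with bucket index on the horizontal axis and number of names on the vertical axis.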

(b) Enumerate at least 4 reasons why h(x) is a bad hash function relative to the ideal behavior of uniform hashing.

(c) Produce a plot showing (i) the length of the longest chain (were we to use chaining for resolving collisions) as a function of the number n of these strings that we hash into a table with l = 200 buckets, and (ii) the exact upper bound on the depth of a red-black tree with n items stored. Then, (i) comment on the value of n at which the red-black tree becomes a more efficient data structure, and (ii) state the length of the longest chain when every bucket has at least one hash hit.
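The two curves in part (c) can be computed incrementally, again with only an array of counters. This is a hedged sketch: it takes the tree bound to be the standard textbook result that a red-black tree with n internal nodes has height at most 2 log2(n + 1), which may or may not be the bound intended in class, and the names `longest_chain_growth` and `rb_height_bound` are hypothetical.

```python
import math


def h(x: str, l: int) -> int:
    """Letter-sum hash from the problem statement (A = 1, ..., Z = 26)."""
    return sum(ord(ch) - ord("A") + 1 for ch in x) % l


def longest_chain_growth(names, l=200):
    """Return (longest, first_full).

    longest[n-1] is the longest chain length after hashing the first n names.
    first_full is (n, longest chain) at the first n where no bucket is empty,
    or None if some bucket never receives a hit.
    """
    counts = [0] * l
    empty = l          # number of buckets that have received no hit yet
    current_max = 0
    longest = []
    first_full = None
    for n, name in enumerate(names, start=1):
        b = h(name, l)
        if counts[b] == 0:
            empty -= 1
        counts[b] += 1
        current_max = max(current_max, counts[b])
        longest.append(current_max)
        if empty == 0 and first_full is None:
            first_full = (n, current_max)
    return longest, first_full


def rb_height_bound(n: int) -> float:
    """Assumed bound: red-black tree height <= 2 * log2(n + 1)."""
    return 2 * math.log2(n + 1)


if __name__ == "__main__":
    chain, full = longest_chain_growth(["A", "A", "B", "AB", "BA"], l=3)
    print(chain, full, rb_height_bound(len(chain)))
```

Plotting `longest` and `rb_height_bound(n)` against n on the same axes shows where the chain length overtakes the tree bound, and `first_full` gives the chain length at the point where every bucket has at least one hit.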
