Question
Assume a dataset $A$ of $n$ points in a metric space with distance metric $d(\cdot,\cdot)$. Let $c$ be a constant greater than 1. Then the $(c,\lambda)$-Approximate Near Neighbor (ANN) problem is defined as follows: given a query point $z$, assuming that there is a point $x$ in the dataset with $d(x,z) \le \lambda$, return a point $x'$ from the dataset with $d(x',z) \le c\lambda$ (this point is called a $(c,\lambda)$-ANN). The parameter $c$ therefore represents the maximum approximation factor allowed and is a user-defined parameter.

Let us consider an LSH family $\mathcal{H}$ of hash functions that is $(\lambda, c\lambda, p_1, p_2)$-sensitive for the distance measure $d(\cdot,\cdot)$. Let $G = \mathcal{H}^k = \{\, g = (h_1,\dots,h_k) \mid h_i \in \mathcal{H},\ 1 \le i \le k \,\}$, where $k = \log_{1/p_2}(n)$. Let us consider the following procedure:

1. Select $L = n^{\rho}$ random members $g_1,\dots,g_L$ of $G$, where $\rho = \frac{\log(1/p_1)}{\log(1/p_2)}$.
2. Hash all the data points as well as the query point using all $g_i$ ($1 \le i \le L$).
3. Retrieve at most $3L$ data points (chosen uniformly at random) from the set of $L$ buckets to which the query point hashes.
4. Among the points selected in Step 3, report the one that is closest to the query point as a $(c,\lambda)$-ANN.

The goal of the first part of this problem is to show that this procedure leads to a correct answer with constant probability.
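One way to see why a constant success probability is plausible is the standard two-event argument sketched below. It is an outline under stated assumptions, not the official solution. The notation is assumed here: $\lambda$ is the distance threshold, $\rho = \log(1/p_1)/\log(1/p_2)$, $L = n^{\rho}$, $x^*$ is a dataset point with $d(x^*, z) \le \lambda$, and $X$ is the total number of collisions (counted over all $L$ tables) between the query and points farther than $c\lambda$ from it.

```latex
% Assumed notation: x^* is a near point, X counts far-point collisions over all L tables.
\begin{align*}
\Pr[g_j(x^*) = g_j(z)] &\ge p_1^{k} = p_1^{\log_{1/p_2} n} = n^{-\rho}, \\
\Pr[\forall j:\ g_j(x^*) \ne g_j(z)] &\le \left(1 - n^{-\rho}\right)^{n^{\rho}} \le \frac{1}{e}, \\
\mathbb{E}[X] &\le L \cdot n \cdot p_2^{k} = L \cdot n \cdot \frac{1}{n} = L, \\
\Pr[X \ge 3L] &\le \frac{\mathbb{E}[X]}{3L} \le \frac{1}{3} \quad \text{(Markov's inequality)}, \\
\Pr[\text{success}] &\ge 1 - \frac{1}{e} - \frac{1}{3} > 0.
\end{align*}
```

If $x^*$ collides with $z$ in at least one table and fewer than $3L$ far-point collisions occur, then either all colliding points are retrieved (so $x^*$ is among them) or any $3L$ retrieved points include at least one point within distance $c\lambda$. In both cases the closest retrieved point is a valid answer.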
Python code for the above problem
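The four steps of the procedure can be sketched as follows. This is a minimal illustration, not a reference solution: the problem does not fix a metric or an LSH family, so the sketch assumes binary vectors under Hamming distance with bit sampling (each h returns one random coordinate), for which p1 = 1 - lam/dim and p2 = 1 - c*lam/dim. The function names `build_index` and `query`, and the choice to round k and L up to integers, are also assumptions.

```python
import math
import random
from collections import defaultdict


def hamming(x, y):
    """Hamming distance between two equal-length sequences."""
    return sum(a != b for a, b in zip(x, y))


def build_index(data, lam, c, rng):
    """Steps 1-2: draw L functions g = (h_1..h_k) and hash every data point.

    Assumes bit-sampling LSH, which requires 0 < lam and c * lam < dim.
    """
    n, dim = len(data), len(data[0])
    p1 = 1 - lam / dim          # collision probability at distance lam
    p2 = 1 - c * lam / dim      # collision probability at distance c * lam
    k = max(1, math.ceil(math.log(n) / math.log(1 / p2)))  # k = log_{1/p2}(n), rounded up
    rho = math.log(1 / p1) / math.log(1 / p2)
    L = max(1, math.ceil(n ** rho))                        # L = n^rho, rounded up
    # Each g is a list of k sampled coordinate indices.
    gs = [[rng.randrange(dim) for _ in range(k)] for _ in range(L)]
    tables = []
    for g in gs:
        table = defaultdict(list)
        for idx, x in enumerate(data):
            table[tuple(x[i] for i in g)].append(idx)
        tables.append(table)
    return gs, tables


def query(z, data, gs, tables, rng):
    """Steps 3-4: gather colliding points, keep at most 3L, return the closest index."""
    L = len(gs)
    candidates = []
    for g, table in zip(gs, tables):
        candidates.extend(table.get(tuple(z[i] for i in g), []))
    if len(candidates) > 3 * L:
        candidates = rng.sample(candidates, 3 * L)  # uniform sample of 3L collisions
    if not candidates:
        return None  # no point collided with the query in any table
    return min(candidates, key=lambda i: hamming(data[i], z))
```

A possible usage: build the index once with `gs, tables = build_index(data, lam=4, c=2, rng=random.Random(0))`, then call `query(z, data, gs, tables, rng)` per query. The returned index should be checked against the `c * lam` threshold by the caller, since the procedure succeeds only with constant probability.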