Assume a dataset A of n points in a metric space with distance metric $d(\cdot,\cdot)$. Let c be a constant greater than 1. Then, the $(c, \lambda)$-Approximate Near Neighbor (ANN) problem is defined as follows: given a query point z, assuming that there is a point x in the dataset with $d(x, z) \le \lambda$, return a point x' from the dataset with $d(x', z) \le c\lambda$ (this point is called a $(c, \lambda)$-ANN). The parameter c therefore represents the maximum approximation factor allowed and is a user-defined parameter. Let us consider an LSH family H of hash functions that is $(\lambda, c\lambda, p_1, p_2)$-sensitive for the distance measure $d(\cdot,\cdot)$. Let $G = H^k = \{g = (h_1, \ldots, h_k) \mid h_i \in H,\ 1 \le i \le k\}$, where $k = \log_{1/p_2}(n)$. Let us consider the following procedure:

1. Select $L = n^{\rho}$ random members $g_1, \ldots, g_L$ of G, where $\rho = \frac{\log(1/p_1)}{\log(1/p_2)}$.
2. Hash all the data points as well as the query point using all $g_i$ ($1 \le i \le L$).
3. Retrieve at most 3L data points (chosen uniformly at random) from the set of L buckets to which the query point hashes.
4. Among the points selected in Step 3, report the one that is closest to the query point as a $(c, \lambda)$-ANN.

The goal of the first part of this problem is to show that this procedure leads to a correct answer with constant probability.
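One standard way to establish the constant-probability claim, sketched here as an illustration rather than as the page's own solution: fix a point $x^*$ with $d(x^*, z) \le \lambda$, call a data point far if its distance to z exceeds $c\lambda$, and treat k and L as exact integer values. The choices of k and L give

$$p_2^{k} = p_2^{\log_{1/p_2} n} = \frac{1}{n}, \qquad p_1^{k} = p_1^{\log_{1/p_2} n} = n^{-\rho} = \frac{1}{L}.$$

Let F be the number of pairs (p, j) with p far and $g_j(p) = g_j(z)$. Each far point collides with z under a fixed $g_j$ with probability at most $p_2^{k}$, so by Markov's inequality

$$\mathbb{E}[F] \le L \cdot n \cdot p_2^{k} = L, \qquad \Pr[F \ge 3L] \le \frac{\mathbb{E}[F]}{3L} \le \frac{1}{3}.$$

The near point $x^*$ collides with z under each independently chosen $g_j$ with probability at least $p_1^{k} = 1/L$, so

$$\Pr[g_j(x^*) \ne g_j(z) \text{ for all } j] \le \left(1 - \frac{1}{L}\right)^{L} \le \frac{1}{e}.$$

By a union bound, both good events ($F < 3L$, and $x^*$ shares at least one bucket with z) hold with probability at least

$$1 - \frac{1}{3} - \frac{1}{e} \approx 0.298.$$

When they do, either every point in the L buckets is retrieved (including $x^*$), or the 3L sampled points cannot all be far, since fewer than 3L distinct far points reach those buckets. Either way, Step 4 returns a point within $c\lambda$ of z.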

Python code for the above problem
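A minimal sketch of the procedure in Python, under assumptions that are not part of the problem statement: points are binary vectors under Hamming distance, and H is the bit-sampling family $h_i(x) = x[i]$, which in dimension D is $(\lambda, c\lambda, 1 - \lambda/D, 1 - c\lambda/D)$-sensitive. The values k and L are rounded up to integers, and "chosen uniformly at random" is taken over the distinct points found in the query's L buckets. The names `lsh_parameters`, `hamming`, and `BitSamplingANN` are hypothetical helpers introduced for illustration.

```python
import math
import random
from collections import defaultdict


def lsh_parameters(n, p1, p2):
    """Return (k, L, rho): k = log_{1/p2}(n), L = n**rho, both rounded up (assumption)."""
    rho = math.log(1 / p1) / math.log(1 / p2)
    k = math.ceil(math.log(n) / math.log(1 / p2))
    L = math.ceil(n ** rho)
    return k, L, rho


def hamming(a, b):
    """Hamming distance between two equal-length bit sequences."""
    return sum(x != y for x, y in zip(a, b))


class BitSamplingANN:
    """The (c, lambda)-ANN procedure with bit-sampling LSH as the assumed family H."""

    def __init__(self, points, k, L, seed=0):
        self.points = points
        self.L = L
        self.rng = random.Random(seed)
        dim = len(points[0])
        # Step 1: each g_j is an AND of k random bit-sampling functions h_i(x) = x[i].
        self.gs = [[self.rng.randrange(dim) for _ in range(k)] for _ in range(L)]
        # Step 2 (data side): hash every data point into each of the L tables.
        self.tables = []
        for g in self.gs:
            table = defaultdict(list)
            for idx, p in enumerate(points):
                table[self._key(g, p)].append(idx)
            self.tables.append(table)

    @staticmethod
    def _key(g, x):
        # g(x) is the tuple of the sampled coordinates of x.
        return tuple(x[i] for i in g)

    def query(self, z):
        # Step 2 (query side): collect the distinct points in z's L buckets.
        seen = {}
        for g, table in zip(self.gs, self.tables):
            for idx in table.get(self._key(g, z), []):
                seen[idx] = None
        candidates = list(seen)
        # Step 3: keep at most 3L of them, chosen uniformly at random.
        if len(candidates) > 3 * self.L:
            candidates = self.rng.sample(candidates, 3 * self.L)
        if not candidates:
            return None  # no data point shares a bucket with z
        # Step 4: report the closest retrieved point, as an index into points.
        return min(candidates, key=lambda idx: hamming(self.points[idx], z))


if __name__ == "__main__":
    D, lam, c, n = 64, 4, 2, 500
    rng = random.Random(1)
    points = [[rng.randint(0, 1) for _ in range(D)] for _ in range(n)]
    z = list(points[0])
    for i in range(lam):  # plant a near neighbour: z differs from points[0] in lam bits
        z[i] ^= 1
    k, L, rho = lsh_parameters(n, 1 - lam / D, 1 - c * lam / D)
    ann = BitSamplingANN(points, k, L, seed=2)
    best = ann.query(z)
    print(k, L, None if best is None else hamming(points[best], z))
```

The demo plants a point at distance λ from the query and prints k, L, and the distance of the reported point; by the argument for this procedure, a single run succeeds only with constant probability, so repeating the query with independent seeds and keeping the best answer is a common way to boost it.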
