Question

A plagiarism detection service uses locality sensitive hashing (LSH) to find similar documents. Suppose the database has 100,000 documents that you need to analyze to find similar documents. You have the memory capacity to compute document signatures of length 1024, and you set the number of bands to be 64 and the size of each band to be 16 rows:
a. What is the probability that two documents that are 80% similar get assigned to the same bucket?
b. What is the probability that two documents that are 25% similar get assigned to the same bucket?
c. What is the probability that two documents that are 50% similar get assigned to different buckets?
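
One way to approach the three parts is the standard banding analysis for MinHash-based LSH. This is an assumption about the intended method, since the question does not name it. Under that analysis, two documents with similarity s agree on all r rows of a given band with probability s^r, so with b bands they share a bucket in at least one band with probability 1 - (1 - s^r)^b. Here b = 64 and r = 16, which matches the signature length of 64 * 16 = 1024. The sketch below evaluates that formula; the function name `candidate_probability` is illustrative, and "same bucket" is read as "same bucket in at least one band".

```python
def candidate_probability(s: float, bands: int, rows: int) -> float:
    """Probability that two documents with similarity s hash to the same
    bucket in at least one band (standard LSH banding formula, assumed here)."""
    # s ** rows: all rows of one band agree.
    # (1 - s ** rows) ** bands: no band agrees completely.
    return 1.0 - (1.0 - s ** rows) ** bands


BANDS, ROWS = 64, 16  # 64 bands * 16 rows = signature length 1024

# a. 80% similar documents landing in the same bucket
p_a = candidate_probability(0.80, BANDS, ROWS)

# b. 25% similar documents landing in the same bucket
p_b = candidate_probability(0.25, BANDS, ROWS)

# c. 50% similar documents landing in different buckets in every band
p_c = 1.0 - candidate_probability(0.50, BANDS, ROWS)

print(f"a. P(same bucket | s = 0.80)       = {p_a:.4f}")
print(f"b. P(same bucket | s = 0.25)       = {p_b:.3e}")
print(f"c. P(different buckets | s = 0.50) = {p_c:.6f}")
```

Under these assumptions the values come out to roughly 0.84 for part a, about 1.5e-8 for part b, and about 0.999 for part c. With 16 rows per band, the threshold is steep enough that even 50% similar documents almost never become candidate pairs.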

