Question
From the book Mining of Massive Datasets
Exercise 3.9.2 : Suppose we filter candidate pairs based only on length, as in
Section 3.9.3. If s is a string of length 20, with what strings is s compared when
J, the lower bound on Jaccard similarity, has the following values: (a) J = 0.85
(b) J = 0.95 (c) J = 0.98?
============================================================
3.9.3 Length-Based Filtering
The simplest way to exploit the string representation of Section 3.9.2 is to sort
the strings by length. Then, each string s is compared with those strings t that
follow s in the list, but are not too long. Suppose the lower bound on Jaccard
similarity between two strings is J. For any string x, denote its length by Lx.
Note that Ls ≤ Lt. The intersection of the sets represented by s and t cannot
have more than Ls members, while their union has at least Lt members. Thus,
the Jaccard similarity of s and t, which we denote SIM(s, t), is at most Ls/Lt.
That is, in order for s and t to require comparison, it must be that J ≤ Ls/Lt,
or equivalently, Lt ≤ Ls/J.
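The bound SIM(s, t) ≤ Ls/Lt can be sanity-checked numerically. The following is a minimal sketch, assuming each string contains no repeated symbols so that it stands for a set (as in the string representation of Section 3.9.2). The helper names jaccard and length_bound are illustrative and do not come from the book.

```python
def jaccard(s, t):
    """Jaccard similarity of the symbol sets of two strings."""
    a, b = set(s), set(t)
    return len(a & b) / len(a | b)


def length_bound(s, t):
    """Upper bound Ls/Lt on the similarity, where Ls <= Lt."""
    shorter, longer = sorted((len(s), len(t)))
    return shorter / longer


# Hypothetical example strings: t extends s by one symbol.
s = "abcdefghi"    # Ls = 9
t = "abcdefghij"   # Lt = 10

# The actual similarity can never exceed the length-based bound.
assert jaccard(s, t) <= length_bound(s, t)
print(jaccard(s, t), length_bound(s, t))
```

Here the bound is tight because every symbol of s also appears in t. For strings that share fewer symbols, the similarity falls strictly below Ls/Lt.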
Example 3.25 : Suppose that s is a string of length 9, and we are looking for
strings with at least 0.9 Jaccard similarity. Then we have only to compare s
with strings following it in the length-based sorted order that have length at
most 9/0.9 = 10. That is, we compare s with those strings of length 9 that
follow it in order, and all strings of length 10. We have no need to compare s
with any other string.
Suppose the length of s were 8 instead. Then s would be compared with
following strings of length up to 8/0.9 = 8.89. That is, a string of length 9
would be too long to have a Jaccard similarity of 0.9 with s, so we only have to
compare s with the strings that have length 8 but follow it in the sorted order.
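The calculation in Example 3.25 can be sketched in a few lines. This is an illustrative sketch, not the book's code: the function names are hypothetical, and J is passed as a decimal string so that Fraction can represent it exactly and avoid floating-point rounding at boundary cases such as 9/0.9.

```python
from fractions import Fraction
from math import floor


def max_partner_length(length_s, j):
    """Largest integer length Lt that satisfies Lt <= Ls / J.

    j is a decimal string such as "0.9", parsed exactly by Fraction.
    """
    return floor(Fraction(length_s) / Fraction(j))


def lengths_to_compare(length_s, j):
    """All lengths Lt with Ls <= Lt <= Ls / J."""
    return list(range(length_s, max_partner_length(length_s, j) + 1))


# The two cases of Example 3.25.
print(lengths_to_compare(9, "0.9"))   # [9, 10]
print(lengths_to_compare(8, "0.9"))   # [8]

# The same rule applied to a string of length 20.
for j in ("0.85", "0.95", "0.98"):
    print(j, max_partner_length(20, j))  # 23, 21, 20 respectively
```

Strings of the same length as s are compared only if they follow s in the sorted order, so the first entry of each list is subject to that extra condition.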