Question
Big data similarities: Matrix of signatures
Consider big data that cannot be loaded into memory. A row in a bit vector denotes a shingle, and a column is a data set, e.g., a document. Consider the following bit vector.

Row  S1  S2  S3  S4
0    1   1   0   0
1    0   0   1   1
2    1   0   0   1
3    0   0   1   0
4    1   1   1   0
5    0   1   1   0
6    1   0   0   0

1) Apply the Jaccard similarity function to the bit vector above. Find sim(S1, S2), sim(S2, S3), sim(S1, S4).
2) Using the following hash functions, generate a matrix of signatures. Show the result for each step.
   h1(x) = (x + 1) mod 5
   h2(x) = (2x + 3) mod 4
   h3(x) = (3x + 2) mod 4
   Note that the output signature matrix has three rows.
3) Find sim(S1, S2), sim(S2, S3), sim(S1, S4) on the signature matrix using the Jaccard similarity function.
4) Compare the similarities obtained in Q1 and Q3. Which matrix would be better?
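One way to work through parts 1 to 3 is sketched below in Python. It rests on several assumptions: the matrix `ROWS` is one reading of the flattened table in the question (rows are shingles 0 to 6, columns are S1 to S4), the hash functions are read as (x + 1) mod 5, (2x + 3) mod 4 and (3x + 2) mod 4, and the helper names `jaccard`, `minhash_signatures` and `signature_similarity` are illustrative rather than taken from any library. Similarity on the signature matrix is taken as the fraction of signature rows on which two columns agree, which is the usual minhash estimate of Jaccard similarity.

```python
# Characteristic matrix: one row per shingle (0..6), one column per set (S1..S4).
# Assumption: this is one reading of the flattened table in the question.
ROWS = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
]

# Hash functions from the question, read with the usual precedence.
HASHES = [
    lambda x: (x + 1) % 5,
    lambda x: (2 * x + 3) % 4,
    lambda x: (3 * x + 2) % 4,
]


def jaccard(matrix, a, b):
    """Jaccard similarity of columns a and b: |intersection| / |union|."""
    both = sum(1 for row in matrix if row[a] and row[b])
    either = sum(1 for row in matrix if row[a] or row[b])
    return both / either if either else 0.0


def minhash_signatures(matrix, hashes):
    """One pass over the rows, keeping the minimum hash value per column."""
    n_cols = len(matrix[0])
    sig = [[float("inf")] * n_cols for _ in hashes]
    for x, row in enumerate(matrix):
        hashed = [h(x) for h in hashes]  # hash the row index once per function
        for col, bit in enumerate(row):
            if bit:
                for i, value in enumerate(hashed):
                    sig[i][col] = min(sig[i][col], value)
    return sig


def signature_similarity(sig, a, b):
    """Fraction of signature rows on which columns a and b agree."""
    agree = sum(1 for row in sig if row[a] == row[b])
    return agree / len(sig)


if __name__ == "__main__":
    pairs = [("S1", "S2", 0, 1), ("S2", "S3", 1, 2), ("S1", "S4", 0, 3)]
    sig = minhash_signatures(ROWS, HASHES)
    print("signature matrix:", sig)
    for name_a, name_b, a, b in pairs:
        exact = jaccard(ROWS, a, b)
        estimate = signature_similarity(sig, a, b)
        print(f"sim({name_a}, {name_b}): exact={exact:.3f}, signature={estimate:.3f}")
```

With only three hash functions the signature estimate can take just four values (0, 1/3, 2/3, 1), so it is a coarse approximation of the exact similarity. The trade-off for part 4 is that the signature matrix has a fixed, small number of rows regardless of how many shingles there are, which matters when the full bit vector cannot be held in memory.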