Question
Given retrieval example below, repeat the calculation for the following 2 documents ( leave query info the same) d1 ( already done) car insurance auto
Given retrieval example below,
repeat the calculation for the following 2 documents ( leave query info the same)
d1 ( already done) car insurance auto insurance
d2 ( new doc) car auto insurance auto
d3 ( new doc) car car auto insurance car
Compare the scores between the three documents, then normalize the results
2) Repeat for all three docs for the Jacard coefficient, then normalize the results
3) This question will illustrate the different between using euclidean
distances between the query and the documents versus using
vector dot product between the query and the documents
Suppose we have the following documents and query
query vocabulary d1 d2 d3
3 car 1 200 3000
1 auto 0 0 0
2 insuranc e 1 200 3000
a) Find the similarity scores using 1 + log(tf) only
using unit vector dot products between the documents and the query
b) Find a score for all three documents based upon the distance
between the query and each of the documents ( and use 1 +log(tf))
Use the equation distance = sqrt( ( qt1 - dt1)**2 + (qt2-dt2)**2 + (qt3-dt3)**2))
and use the distances as the score for all three documents (** means squared) (t is the term entry.)
Introduction to Information Retrieval Sec. 6.4 tf-idf example: Inc. Ito Document: car insurance auto insurance Query: best car insurance Term Query Document Pro d auto best t-tf-wt dfidf wt n'liz tf-raw tf-wt wt n'liz raw 0 0 5000 2.3 0 0 1 1 1 0.52 1 1 50000 1.3 1.3 0.34 0 0 0 0 1 1 10000 2.0 2.0 0.52 1 1 1 0.52 1 1 1000 3.0 3.0 0.78 2 1.3 1.3 0.68 Exercise: what is N, the number of docs? Doc length = 1 +0 +1 +1.32 -1.92 Score = 0+0+0.27+0.53 = 0.8 0 0 0.27 0.53 car insurance Introduction to Information Retrieval Sec. 6.4 tf-idf example: Inc. Ito Document: car insurance auto insurance Query: best car insurance Term Query Document Pro d auto best t-tf-wt dfidf wt n'liz tf-raw tf-wt wt n'liz raw 0 0 5000 2.3 0 0 1 1 1 0.52 1 1 50000 1.3 1.3 0.34 0 0 0 0 1 1 10000 2.0 2.0 0.52 1 1 1 0.52 1 1 1000 3.0 3.0 0.78 2 1.3 1.3 0.68 Exercise: what is N, the number of docs? Doc length = 1 +0 +1 +1.32 -1.92 Score = 0+0+0.27+0.53 = 0.8 0 0 0.27 0.53 car insurance
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started