
Question


Given the retrieval example below, repeat the calculation for the following two documents (leave the query information the same):

d1 (already done): car insurance auto insurance

d2 (new doc): car auto insurance auto

d3 (new doc): car car auto insurance car

Compare the scores between the three documents, then normalize the results
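A minimal sketch of the lnc.ltc calculation for d1, d2 and d3, assuming base-10 logarithms and a collection size of N = 1,000,000 (a value inferred from the idf column of the example, not stated in the question). The function names are hypothetical, and the last step, which divides each score by the sum of the scores, is only one possible reading of "normalize the results"; dividing by the largest score would be another.

```python
import math
from collections import Counter

N = 1_000_000  # assumed collection size, inferred from the idf column
DF = {"auto": 5000, "best": 50000, "car": 10000, "insurance": 1000}


def log_tf(tf):
    """Logarithmic term-frequency weight: 1 + log10(tf), or 0 when tf is 0."""
    return 1 + math.log10(tf) if tf > 0 else 0.0


def unit(vec):
    """Scale a term -> weight dict to unit Euclidean length."""
    length = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / length for t, w in vec.items()} if length else dict(vec)


def query_vector(text):
    """ltc weighting: log tf, times idf, then cosine normalization."""
    tf = Counter(text.split())
    return unit({t: log_tf(tf[t]) * math.log10(N / DF[t]) for t in DF})


def doc_vector(text):
    """lnc weighting: log tf, no idf, then cosine normalization."""
    tf = Counter(text.split())
    return unit({t: log_tf(tf[t]) for t in DF})


def score(query, doc):
    """Dot product of the normalized query and document vectors."""
    q, d = query_vector(query), doc_vector(doc)
    return sum(q[t] * d[t] for t in DF)


QUERY = "best car insurance"
DOCS = {
    "d1": "car insurance auto insurance",
    "d2": "car auto insurance auto",
    "d3": "car car auto insurance car",
}

scores = {name: score(QUERY, text) for name, text in DOCS.items()}

# Assumed interpretation of "normalize": make the three scores sum to 1.
total = sum(scores.values())
normalized = {name: s / total for name, s in scores.items()}

for name in DOCS:
    print(f"{name}: score={scores[name]:.3f} normalized={normalized[name]:.3f}")
```

Under these assumptions d1 comes out at about 0.80, which matches the worked example, and the ranking is d1, then d3, then d2.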

2) Repeat for all three docs using the Jaccard coefficient, then normalize the results
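For the set-based measure, a short sketch may help. It assumes the coefficient is computed on the sets of distinct terms in the query and in each document (intersection size divided by union size), and that "normalize" again means dividing by the sum of the three values. The function name `jaccard` is hypothetical.

```python
def jaccard(a, b):
    """Size of the term-set intersection divided by the size of the union."""
    set_a, set_b = set(a.split()), set(b.split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


QUERY = "best car insurance"
DOCS = {
    "d1": "car insurance auto insurance",
    "d2": "car auto insurance auto",
    "d3": "car car auto insurance car",
}

jaccard_scores = {name: jaccard(QUERY, text) for name, text in DOCS.items()}

# Assumed interpretation of "normalize": make the three values sum to 1.
total = sum(jaccard_scores.values())
jaccard_normalized = {name: s / total for name, s in jaccard_scores.items()}

for name in DOCS:
    print(f"{name}: jaccard={jaccard_scores[name]:.2f} "
          f"normalized={jaccard_normalized[name]:.3f}")
```

Because this measure ignores term frequency, and all three documents contain the same three distinct terms (car, auto, insurance), each one shares 2 of 4 terms with the query and gets the same value of 0.5.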

3) This question illustrates the difference between using Euclidean distances between the query and the documents versus using vector dot products between the query and the documents.

Suppose we have the following documents and query (entries are term frequencies):

| query | vocabulary | d1 | d2  | d3   |
| 3     | car        | 1  | 200 | 3000 |
| 1     | auto       | 0  | 0   | 0    |
| 2     | insurance  | 1  | 200 | 3000 |

a) Find the similarity scores using 1 + log(tf) weights only, taking unit-vector dot products between the documents and the query

b) Find a score for all three documents based on the distance between the query and each of the documents (again using 1 + log(tf)). Use the equation

distance = sqrt((qt1 - dt1)**2 + (qt2 - dt2)**2 + (qt3 - dt3)**2)

and use the distances as the scores for all three documents (**2 means squared; qt1 and dt1 are the query and document entries for term 1, and so on).
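The contrast between parts a) and b) can be sketched as follows. This assumes base-10 logarithms, a weight of 0 when tf is 0, and that the distance in b) is taken between the unnormalized 1 + log(tf) vectors; all function names are hypothetical.

```python
import math


def log_tf(tf):
    """1 + log10(tf) for tf > 0, else 0 (base 10 is an assumption)."""
    return 1 + math.log10(tf) if tf > 0 else 0.0


def weights(tfs):
    """Apply the log-tf weighting to a list of raw term frequencies."""
    return [log_tf(tf) for tf in tfs]


def unit(vec):
    """Scale a vector to unit Euclidean length."""
    length = math.sqrt(sum(w * w for w in vec))
    return [w / length for w in vec] if length else list(vec)


def unit_dot(q, d):
    """Part a): dot product of the unit-length query and document vectors."""
    return sum(a * b for a, b in zip(unit(q), unit(d)))


def euclidean(q, d):
    """Part b): sqrt of the summed squared differences, term by term."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, d)))


# Term order: car, auto, insurance
QUERY_TF = [3, 1, 2]
DOC_TF = {"d1": [1, 0, 1], "d2": [200, 0, 200], "d3": [3000, 0, 3000]}

q = weights(QUERY_TF)
dot_scores = {name: unit_dot(q, weights(tf)) for name, tf in DOC_TF.items()}
distances = {name: euclidean(q, weights(tf)) for name, tf in DOC_TF.items()}

for name in DOC_TF:
    print(f"{name}: unit dot product={dot_scores[name]:.3f} "
          f"distance={distances[name]:.3f}")
```

The three document vectors point in the same direction, so their unit-vector dot products with the query are identical, while the Euclidean distance grows from d1 to d3 as the term frequencies increase.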

Introduction to Information Retrieval, Sec. 6.4

tf-idf example: lnc.ltc

Document: car insurance auto insurance

Query: best car insurance

| Term      | Query tf-raw | Query tf-wt | df    | idf | Query wt | Query n'lized | Doc tf-raw | Doc tf-wt | Doc wt | Doc n'lized | Prod |
| auto      | 0            | 0           | 5000  | 2.3 | 0        | 0             | 1          | 1         | 1      | 0.52        | 0    |
| best      | 1            | 1           | 50000 | 1.3 | 1.3      | 0.34          | 0          | 0         | 0      | 0           | 0    |
| car       | 1            | 1           | 10000 | 2.0 | 2.0      | 0.52          | 1          | 1         | 1      | 0.52        | 0.27 |
| insurance | 1            | 1           | 1000  | 3.0 | 3.0      | 0.78          | 2          | 1.3       | 1.3    | 0.68        | 0.53 |

Exercise: what is N, the number of docs?

Doc length = sqrt(1**2 + 0**2 + 1**2 + 1.3**2) ≈ 1.92

Score = 0 + 0 + 0.27 + 0.53 = 0.8
