
Question


1) Given the retrieval example from the slides reviewed in class (Introduction to Information Retrieval, Sec. 6.4, tf-idf example: lnc.ltc):

Document: car insurance auto insurance
Query: best car insurance

             Query                                        Document                       Prod
Term         tf-raw  tf-wt  df      idf   wt    n'lize    tf-raw  tf-wt  wt    n'lize
auto         0       0      5000    2.3   0     0         1       1      1     0.52      0
best         1       1      50000   1.3   1.3   0.34      0       0      0     0         0
car          1       1      10000   2.0   2.0   0.52      1       1      1     0.52      0.27
insurance    1       1      1000    3.0   3.0   0.78      2       1.3    1.3   0.68      0.53

Exercise: what is N, the number of docs?
Doc length = √(1² + 0² + 1² + 1.3²) ≈ 1.92
Score = 0 + 0 + 0.27 + 0.53 = 0.8

Repeat the calculation for the following two documents (leave the query information the same):

d1 (already done): car insurance auto insurance
d2 (new doc): car auto insurance auto
d3 (new doc): car car auto insurance car

Compare the scores between the three documents, then normalize the results.

2) Repeat for all three documents using the Jaccard coefficient, then normalize the results.

3) This question illustrates the difference between using Euclidean distance between the query and the documents versus using the vector dot product between the query and the documents. Suppose we have the following documents and query:

vocabulary   query   d1   d2    d3
car          3       1    200   3000
auto         1       0    0     0
insurance    2       1    200   3000

a) Find the similarity scores using 1 + log(tf) only, using unit-vector dot products between the documents and the query.

b) Find a score for all three documents based on the distance between the query and each of the documents (again using 1 + log(tf)). Use the equation

distance = sqrt((qt1 - dt1)**2 + (qt2 - dt2)**2 + (qt3 - dt3)**2)

and use the distances as the scores for all three documents (** 2 means squared; t is the term entry).
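As a starting point for the repeated calculations, here is a minimal Python sketch that reproduces the worked d1 score of about 0.8 under the lnc.ltc scheme. Several things in it are assumptions rather than part of the question: the function names are hypothetical, logarithms are taken base 10, N is set to the value that the idf column implies through idf = log10(N/df), and the Jaccard helper treats the query and each document as sets of distinct terms. The Euclidean helper just implements the distance formula from part 3b on two equal-length weight lists.

```python
import math
from collections import Counter

# Document frequencies copied from the worked example table.
DF = {"auto": 5000, "best": 50000, "car": 10000, "insurance": 1000}
# Assumption: collection size inferred from idf = log10(N / df) in the table.
N = 1_000_000


def log_tf(count):
    """Logarithmic tf weight: 1 + log10(tf), or 0 when the term is absent."""
    return 1 + math.log10(count) if count > 0 else 0.0


def unit(vec):
    """Scale a {term: weight} vector to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: (w / norm if norm else 0.0) for t, w in vec.items()}


def lnc_ltc_score(query, doc):
    """Dot product of the ltc query vector and the lnc document vector."""
    q_tf, d_tf = Counter(query.split()), Counter(doc.split())
    # Query: log tf times idf, then cosine normalization (ltc).
    q = unit({t: log_tf(q_tf[t]) * math.log10(N / DF[t]) for t in DF})
    # Document: log tf, no idf, then cosine normalization (lnc).
    d = unit({t: log_tf(d_tf[t]) for t in DF})
    return sum(q[t] * d[t] for t in DF)


def jaccard(query, doc):
    """Jaccard coefficient on sets of distinct terms (an assumed reading)."""
    a, b = set(query.split()), set(doc.split())
    return len(a & b) / len(a | b)


def euclidean(q, d):
    """Distance formula from part 3b for two equal-length weight lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(q, d)))


score_d1 = lnc_ltc_score("best car insurance", "car insurance auto insurance")
print(round(score_d1, 2))
```

Calling `lnc_ltc_score` with the d2 and d3 strings gives the remaining scores for part 1; the sketch stops at raw scores, so normalizing the three results is left as a separate step.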
