Question

Here is a set of documents:

D1= "Lucie has a pencil"

D2= "The house is red"

D3= "The red pencil is in the red house"

D4= "The red policeman is on leave with his red shoes"

D5= "After he left the house, the policeman left on his red bicycle"

For each word appearing in the documents, calculate the idf factor.
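A minimal Python sketch of the idf computation. It assumes three things the exercise does not specify. Tokens are lower-cased alphabetic runs. There is no stop-word removal or stemming, so `left` and `leave` stay distinct. The formula is idf(t) = log10(N / df(t)) with N = 5. Some textbooks use the natural log or log2 instead, which rescales every value by the same constant. The helper names are illustrative.

```python
import math
import re
from collections import Counter

# The five documents of the exercise.
DOCS = {
    "D1": "Lucie has a pencil",
    "D2": "The house is red",
    "D3": "The red pencil is in the red house",
    "D4": "The red policeman is on leave with his red shoes",
    "D5": "After he left the house, the policeman left on his red bicycle",
}


def tokenize(text: str) -> list[str]:
    """Assumed preprocessing: lower-case, keep alphabetic runs, no stemming."""
    return re.findall(r"[a-z]+", text.lower())


def idf_table(docs: dict[str, str]) -> dict[str, float]:
    """Return idf(t) = log10(N / df(t)) for every term t in the collection."""
    n = len(docs)
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))  # count each term once per document
    return {term: math.log10(n / count) for term, count in df.items()}


IDF = idf_table(DOCS)

# Print from rarest (highest idf) to most common term.
for term, value in sorted(IDF.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{term:<10} {value:.3f}")
```

Under these assumptions there are 19 distinct words. `the` and `red` (each in 4 documents) get log10(5/4) ≈ 0.097. `house` and `is` (3 documents) get log10(5/3) ≈ 0.222. `pencil`, `policeman`, `on` and `his` (2 documents) get log10(5/2) ≈ 0.398. Words found in a single document, such as `lucie`, get log10 5 ≈ 0.699.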

1- By mistake, the same document D was inserted x times into a collection indexed with the tf.idf vector model. Assuming x > 1, comment on the effect of this error. Is the document in question more or less likely to be found among the results of a keyword search when x is large?
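A minimal Python sketch that isolates one effect of the error: how the idf of a term of D changes as x grows. It assumes idf = log10(N / df) and that the indexer counts every copy of D as a separate document. The function name `idf_after_duplication` is hypothetical. The sketch does not model tf values or the final ranking, so on its own it does not settle the retrieval question.

```python
import math


def idf_after_duplication(n_docs: int, df_elsewhere: int, x: int) -> float:
    """log10 idf of a term of D once D has been indexed x times.

    n_docs       -- number of distinct documents in the collection (D counted once)
    df_elsewhere -- number of documents other than D that contain the term
    x            -- number of copies of D actually stored in the index (x >= 1)
    """
    total = n_docs - 1 + x  # collection size seen by the indexer
    df = df_elsewhere + x   # every copy of D contains the term
    return math.log10(total / df)


# Example with the five-document collection, taking D = D1:
# "lucie" occurs only in D1, while "pencil" also occurs in D3.
for x in (1, 2, 5, 10, 100):
    lucie = idf_after_duplication(5, 0, x)
    pencil = idf_after_duplication(5, 1, x)
    print(f"x={x:3d}  idf(lucie)={lucie:.3f}  idf(pencil)={pencil:.3f}")
```

Take a term found only in D. Its idf falls from log10 5 ≈ 0.699 at x = 1 to log10 3 ≈ 0.477 at x = 2, and it tends to 0 as x grows. The weights that made D's rare terms distinctive therefore shrink.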

2- In the tf.idf vector model, which of the following is the best strategy for finding a specific document with a keyword search?

- Choose, as keywords, the most frequent words of the document sought;

- Choose, as keywords, the most frequent words of the document sought that are also frequent across the whole collection;

- Choose, as keywords, the most frequent words of the document sought that are infrequent across the whole collection.
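To compare the strategies concretely, here is a minimal Python sketch that ranks the five documents by cosine similarity between tf.idf vectors, taking D3 as the document sought. It rests on several assumptions that do not come from the exercise:

- tf is the raw term count;
- idf is log10(N / df);
- there is no stop-word removal;
- the example queries are `the red` and `pencil in`.

Helper names such as `rank` are hypothetical.

```python
import math
import re
from collections import Counter

DOCS = {
    "D1": "Lucie has a pencil",
    "D2": "The house is red",
    "D3": "The red pencil is in the red house",
    "D4": "The red policeman is on leave with his red shoes",
    "D5": "After he left the house, the policeman left on his red bicycle",
}


def tokenize(text: str) -> list[str]:
    """Assumed preprocessing: lower-case alphabetic tokens, no stemming."""
    return re.findall(r"[a-z]+", text.lower())


TF = {name: Counter(tokenize(text)) for name, text in DOCS.items()}
N = len(DOCS)
DF = Counter(term for counts in TF.values() for term in counts)  # docs per term
IDF = {term: math.log10(N / df) for term, df in DF.items()}


def weights(counts: Counter) -> dict[str, float]:
    """tf.idf weights; terms unknown to the collection are dropped."""
    return {t: c * IDF[t] for t, c in counts.items() if t in IDF}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


DOC_VECTORS = {name: weights(counts) for name, counts in TF.items()}


def rank(query: str) -> list[tuple[str, float]]:
    """Documents sorted by decreasing cosine similarity to the query."""
    q = weights(Counter(tokenize(query)))
    scores = [(name, cosine(q, vec)) for name, vec in DOC_VECTORS.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


for query in ("the red", "pencil in"):
    print(query, [(name, round(score, 3)) for name, score in rank(query)])
```

With these assumptions, the query `the red` ranks D2 above D3. Those are D3's two most frequent words, but both also occur in four of the five documents. The query `pencil in` puts D3 first. Those are D3 words that occur in at most two documents.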
