Question
Here is a set of documents:
D1= "Lucie has a pencil"
D2= "The house is red"
D3= "The red pencil is in the red house"
D4= "The red policeman is on leave with his red shoes"
D5= "After he left the house, the policeman left on his red bicycle"
For each word appearing in the documents, calculate the idf factor.
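A minimal sketch of the idf computation, under assumptions the question does not state: idf(w) = log10(N / df(w)), where N is the number of documents and df(w) is the number of documents containing w; words are lowercased and punctuation is ignored. The names `docs`, `tokenize` and `idf_table` are illustrative only.

```python
import math
import re

# The five documents from the question.
docs = {
    "D1": "Lucie has a pencil",
    "D2": "The house is red",
    "D3": "The red pencil is in the red house",
    "D4": "The red policeman is on leave with his red shoes",
    "D5": "After he left the house, the policeman left on his red bicycle",
}

def tokenize(text):
    """Lowercase the text and keep alphabetic runs only (assumed tokenization)."""
    return re.findall(r"[a-z]+", text.lower())

def idf_table(documents, log=math.log10):
    """Return {word: log(N / df)} with df = number of documents containing the word."""
    n = len(documents)
    df = {}
    for text in documents.values():
        # set() so that a word repeated inside one document counts once
        for word in set(tokenize(text)):
            df[word] = df.get(word, 0) + 1
    return {word: log(n / count) for word, count in df.items()}

idf = idf_table(docs)
for word, value in sorted(idf.items(), key=lambda kv: (kv[1], kv[0])):
    print(f"{word:10s} {value:.3f}")
```

If your course uses another logarithm base, pass a different `log` function (for example `math.log` or `math.log2`). The ordering of the words by idf does not change.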
1- The same document D was added x times, by mistake, to a collection of documents indexed with the tf.idf vector model. Assuming x > 1, comment on the effect of this error. Is the document in question more or less likely to appear among the results of a keyword search when x is large?
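One way to explore this question numerically is to look at how the idf of a word of D changes with x. The sketch below assumes idf(w) = log10(N / df(w)) and that the collection holds N documents, df of which contain the word, when D is present once; the x - 1 extra copies then add x - 1 to both counts. The function name `idf_with_copies` and the sample values are hypothetical.

```python
import math

def idf_with_copies(n_docs, doc_freq, copies):
    """idf of a word of D when D appears `copies` times instead of once.

    n_docs and doc_freq describe the collection with a single copy of D.
    """
    extra = copies - 1
    return math.log10((n_docs + extra) / (doc_freq + extra))

# Example: a word that appears only in D, in a collection of 5 documents.
values = [idf_with_copies(5, 1, x) for x in (1, 2, 10, 100)]
for x, value in zip((1, 2, 10, 100), values):
    print(f"x = {x:3d}  idf = {value:.3f}")
```

Under these assumptions the idf of every word of D shrinks toward 0 as x grows, so the terms that characterize D carry less and less weight in a query.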
2- In the tf.idf vector model, what is the best strategy for finding a specific document through a keyword search?
- Choose, as keywords, the most frequent words of the document sought;
- Choose, as keywords, the most frequent words of the document sought that are also frequent across the whole collection;
- Choose, as keywords, the most frequent words of the document sought that are also infrequent across the whole collection.