Question
Big Data Application & Analysis
Search Engine
Question: Consider the table of term frequencies for three documents, denoted Doc1, Doc2, and Doc3, in Table 1.0.
1. For each document, compute the tf-idf weights for the following terms using the idf values from Table 1.1.
cat
animal
iguana
bee
Table 1.0: Table of tf values
term | Doc1 | Doc2 | Doc3
cat | 27 | 4 | 24
animal | 3 | 33 | 0
iguana | 0 | 33 | 29
bee | 14 | 0 | 17
Table 1.1: Table of idf values
term | df_t | idf_t
cat | 18,165 | 1.65
animal | 6,723 | 2.08
iguana | 19,241 | 1.62
bee | 25,235 | 1.5
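A minimal Python sketch of part 1, assuming the plain weighting tf-idf = tf × idf with raw (unscaled) term frequency. The question does not say which tf variant to use, and the names `TERMS`, `TF`, `IDF`, and `tfidf` are illustrative only.

```python
# Sketch for part 1. Assumption: tf-idf(t, d) = tf(t, d) * idf(t) with raw tf
# (no log scaling), since the question does not name a tf variant.
TERMS = ["cat", "animal", "iguana", "bee"]

# Table 1.0: raw term frequencies for each document
TF = {
    "Doc1": {"cat": 27, "animal": 3, "iguana": 0, "bee": 14},
    "Doc2": {"cat": 4, "animal": 33, "iguana": 33, "bee": 0},
    "Doc3": {"cat": 24, "animal": 0, "iguana": 29, "bee": 17},
}

# Table 1.1: idf values (df_t is not needed once idf_t is given)
IDF = {"cat": 1.65, "animal": 2.08, "iguana": 1.62, "bee": 1.5}


def tfidf(tf_row):
    """Return the tf-idf weight of each term in one document."""
    return {t: tf_row[t] * IDF[t] for t in TERMS}


weights = {doc: tfidf(row) for doc, row in TF.items()}

for doc, w in weights.items():
    print(doc, {t: round(v, 2) for t, v in w.items()})
```

Under that assumption the weight vectors (cat, animal, iguana, bee) come out as Doc1 = (44.55, 6.24, 0, 21.0), Doc2 = (6.6, 68.64, 53.46, 0), and Doc3 = (39.6, 0, 46.98, 25.5). A log-scaled tf such as 1 + log10(tf) would give different numbers.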
2. Recall the tf-idf weights computed previously. Compute the Euclidean normalized
document vectors for each of the documents, where each vector has four
components, one for each of the four terms.
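For part 2, each tf-idf vector is divided by its Euclidean (L2) length. The sketch below rebuilds the vectors from Tables 1.0 and 1.1 so it runs on its own; it again assumes raw tf × idf, and `normalize` is an illustrative helper name, not part of the question.

```python
import math

# Sketch for part 2. Assumption: tf-idf = raw tf * idf, as in part 1.
TERMS = ["cat", "animal", "iguana", "bee"]
TF = {
    "Doc1": {"cat": 27, "animal": 3, "iguana": 0, "bee": 14},
    "Doc2": {"cat": 4, "animal": 33, "iguana": 33, "bee": 0},
    "Doc3": {"cat": 24, "animal": 0, "iguana": 29, "bee": 17},
}
IDF = {"cat": 1.65, "animal": 2.08, "iguana": 1.62, "bee": 1.5}


def normalize(vec):
    """Divide each component by the vector's Euclidean length (zero vectors unchanged)."""
    length = math.sqrt(sum(v * v for v in vec))
    return [v / length for v in vec] if length > 0 else list(vec)


normalized = {}
for doc, row in TF.items():
    raw = [row[t] * IDF[t] for t in TERMS]  # tf-idf vector in TERMS order
    normalized[doc] = normalize(raw)

for doc, vec in normalized.items():
    print(doc, [round(v, 3) for v in vec])
```

Rounded to three decimals this gives roughly Doc1 ≈ (0.897, 0.126, 0, 0.423), Doc2 ≈ (0.076, 0.787, 0.613, 0), and Doc3 ≈ (0.595, 0, 0.706, 0.383). The squared components of each vector sum to 1, which is a quick sanity check on the arithmetic.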
3. Using the term weights computed previously, rank the three documents by computing the score for the query "cat iguana" under each of the following query term weightings:
a. The weight of a term is 1 if present in the query, 0 otherwise.
b. The weight of a term is its Euclidean-normalized idf.
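For part 3, one reasonable reading is that "term weights computed previously" means the normalized document vectors from part 2, and that a document's score is the dot product of the query vector with its vector. The sketch assumes both readings; `QUERY`, `rank`, `q_binary`, and `q_idf` are illustrative names.

```python
import math

# Sketch for part 3. Assumptions: documents use the L2-normalized
# raw-tf * idf vectors from part 2, and score = dot product.
TERMS = ["cat", "animal", "iguana", "bee"]
TF = {
    "Doc1": {"cat": 27, "animal": 3, "iguana": 0, "bee": 14},
    "Doc2": {"cat": 4, "animal": 33, "iguana": 33, "bee": 0},
    "Doc3": {"cat": 24, "animal": 0, "iguana": 29, "bee": 17},
}
IDF = {"cat": 1.65, "animal": 2.08, "iguana": 1.62, "bee": 1.5}
QUERY = ["cat", "iguana"]  # the query "cat iguana"


def normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are left as-is)."""
    length = math.sqrt(sum(v * v for v in vec))
    return [v / length for v in vec] if length > 0 else list(vec)


doc_vecs = {
    doc: normalize([row[t] * IDF[t] for t in TERMS]) for doc, row in TF.items()
}

# Case a: weight 1 if the term is in the query, 0 otherwise
q_binary = [1.0 if t in QUERY else 0.0 for t in TERMS]
# Case b: idf of each query term, then Euclidean-normalized
q_idf = normalize([IDF[t] if t in QUERY else 0.0 for t in TERMS])


def rank(query_vec):
    """Score every document by dot product and sort best-first."""
    scores = {
        doc: sum(q * d for q, d in zip(query_vec, vec))
        for doc, vec in doc_vecs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


for label, q in [("a. binary", q_binary), ("b. normalized idf", q_idf)]:
    print(label, [(doc, round(s, 3)) for doc, s in rank(q)])
```

With these assumptions, case a scores Doc3 ≈ 1.301, Doc1 ≈ 0.897, Doc2 ≈ 0.688, and case b (query vector ≈ (0.714, 0, 0.701, 0)) scores Doc3 ≈ 0.920, Doc1 ≈ 0.640, Doc2 ≈ 0.483. Both weightings give the ranking Doc3 > Doc1 > Doc2: Doc1's large cat weight alone outscores Doc2's combined cat and iguana weights.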