Question
Given the following documents and queries:

D1: You say goodbye, I say hello
D2: You say stop, I say go
D3: Hello, hello, you say goodbye
D4: I say high, you say low

Q1: say hello
Q2: you goodbye

Specify the vocabulary of tokens/terms using full-text indexing and no stemming (ignore capitalization and punctuation), and define an alphabetical token/term order.

Construct the following document-term matrices (a document-term matrix has rows corresponding to the documents and columns corresponding to the terms):

1. Binary. Only consider whether a term t appears in a document D. Repeated terms in one document are counted as 1 in binary matrices.

2. Raw term frequency. The raw term frequency tf(t in D) is the number of times term t appears in document D.

3. Normalized term frequency. For an example, see http://en.wikipedia.org/wiki/TFIDF. The term frequency of a term t in a document D can be normalized by the total number of terms ND in the document:
   normalized tf(t in D) = raw term frequency(t in D) / ND = tf(t in D) / ND

4. tf-idf weights. The inverse document frequency idf(t) of term t can be defined as
   idf(t) = ln(N / (nj + 1)) + 1
   where N is the total number of documents in the index and nj is the document frequency of term t (the number of documents in which term t appears). Thus, for term t in document D:
   tf-idf(t) = raw term frequency(t in D) * idf(t) = tf(t in D) * [ln(N / (nj + 1)) + 1]
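The four matrices defined in the question can be cross-checked against a hand-built answer with a short Python sketch. This is only one possible reading of the task, not a posted solution: the names (tokenize, raw_tf, norm_tf, tfidf and so on) are illustrative, tokens are assumed to be runs of the letters a-z after lowercasing, and the vocabulary is assumed to be built from the four documents only (every query term already occurs in some document).

```python
import math
import re
from collections import Counter

docs = {
    "D1": "You say goodbye, I say hello",
    "D2": "You say stop, I say go",
    "D3": "Hello, hello, you say goodbye",
    "D4": "I say high, you say low",
}

def tokenize(text):
    # Assumption: a token is a run of letters; lowercasing ignores
    # capitalization, the regex drops punctuation, and nothing is stemmed.
    return re.findall(r"[a-z]+", text.lower())

tokens = {name: tokenize(text) for name, text in docs.items()}

# Vocabulary in alphabetical order; this fixes the column order of every matrix.
vocab = sorted({t for toks in tokens.values() for t in toks})

# Raw term frequency: occurrences of each term in each document.
counts = {name: Counter(toks) for name, toks in tokens.items()}
raw_tf = {name: [counts[name][t] for t in vocab] for name in docs}

# Binary: 1 if the term appears at all, else 0.
binary = {name: [1 if c > 0 else 0 for c in row] for name, row in raw_tf.items()}

# Normalized tf: raw count divided by the document length ND.
norm_tf = {name: [c / len(tokens[name]) for c in row] for name, row in raw_tf.items()}

# idf(t) = ln(N / (nj + 1)) + 1, with nj the document frequency of t.
N = len(docs)
df = [sum(binary[name][j] for name in docs) for j in range(len(vocab))]
idf = [math.log(N / (nj + 1)) + 1 for nj in df]

# tf-idf = raw tf * idf, as defined in the question.
tfidf = {name: [c * w for c, w in zip(row, idf)] for name, row in raw_tf.items()}

print("terms:", vocab)
for name in docs:
    print(name, "binary:", binary[name])
    print(name, "raw tf:", raw_tf[name])
    print(name, "norm tf:", [round(x, 3) for x in norm_tf[name]])
    print(name, "tf-idf:", [round(x, 3) for x in tfidf[name]])
```

Under these assumptions the alphabetical vocabulary has nine terms: go, goodbye, hello, high, i, low, say, stop, you. The queries Q1 and Q2 could be turned into vectors over the same columns by passing them through tokenize and counting against vocab in the same way.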