Question
Online Code Test || Only 30 minutes remaining
Given a corpus C of documents (as a list of strings), a word token and a document index, find the term frequency - inverse document frequency (tfidf) of the token in the document relative to the corpus. A document can be considered to be a sequence of tokens separated by a space. We will assume the following definitions:
term frequency (tf) of token $t$ in a document: the number of times the token appears in the given document
inverse document frequency (idf) of token $t$: $1 + \log_2\left(\frac{C}{1 + n_t}\right)$, where $C$ is the size of the corpus (i.e. the number of documents in the corpus), $n_t$ is the total number of documents that contain the token $t$, and $\log_2$ is the logarithm to the base 2
Finally, tfidf = tf * idf (i.e. a product of tf and idf).
For the purposes of computation, the case of the token in the document should be ignored (e.g. The, THE and the should be treated as the same token).
-
[execution time limit] 4 seconds (py)
-
[input] array.string corpus
List of documents in the corpus
-
[input] integer doc_idx
index (0 based) of the document in the corpus
-
[input] string token
input token for computing tfidf
-
[output] float
tfidf value
[Python 2] Syntax Tips
# Prints help message to the console
# Returns a string
def helloWorld(name):
    print "This prints to the console when you Run Tests"
    return "Hello, " + name
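One possible approach, following the definitions in the question, is sketched below. It is a minimal sketch and not an official solution: the function name `tfidf` and the sample corpus are illustrative assumptions, documents are split on single spaces only, and `doc_idx` is assumed to be a valid index into a non-empty corpus. It uses `math.log(x, 2)` and an explicit `float` so that the division behaves the same under Python 2 and Python 3.

```python
import math


def tfidf(corpus, doc_idx, token):
    """Return tf * idf of `token` in corpus[doc_idx], ignoring case."""
    token = token.lower()
    # Lowercase every document and split it into tokens on single spaces.
    docs = [doc.lower().split(" ") for doc in corpus]

    # tf: number of times the token appears in the chosen document.
    tf = docs[doc_idx].count(token)

    # n_t: number of documents that contain the token at least once.
    n_t = sum(1 for tokens in docs if token in tokens)

    # idf = 1 + log2(C / (1 + n_t)), with C the number of documents.
    idf = 1 + math.log(float(len(docs)) / (1 + n_t), 2)
    return tf * idf


# Hypothetical sample corpus, used only for illustration.
corpus = ["the cat sat", "The dog and the cat", "a bird"]

# "the" appears twice in document 1 and in 2 of the 3 documents,
# so idf = 1 + log2(3 / 3) = 1 and tfidf = 2 * 1.
print(tfidf(corpus, 1, "THE"))  # 2.0
```

Lowercasing both the token and the documents before counting handles the case-insensitivity requirement. A token that does not occur in the chosen document has tf = 0, so its tfidf is 0 regardless of the idf value.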