Python Assignment.
Function to Analyze a NumPy Array:
Assume we have an array that contains the term frequencies of a set of documents: each row is a document, each column is a word, and each value is the frequency of that word in that document. Define a function named "analyze_tf" that has two input parameters:
1. a rank-2 (two-dimensional) input array
2. a parameter "binary" with a default value set to False
It performs the following steps in sequence:
a) If "binary" is True, binarizes the input array, i.e., any value greater than 1 is changed to 1.
b) Normalizes the frequency of each word by dividing it by the length of the document (i.e., the sum of its row). Saves the result as an array named tf (term frequency); each row of tf should sum to 1.
c) Calculates the document frequency (df) of each word, i.e., the number of documents that contain that word.
d) Calculates the inverse document frequency (idf) of each word as N/df (N divided by df), where N is the number of documents.
e) Calculates the tf_idf array as tf * log(idf), i.e., tf multiplied by the natural (base e) logarithm of idf. The reasoning: a word that appears in most documents has little discriminative power and is often called a "stop" word. Taking the inverse of df lowers the weight of such words.
Finally, it returns the tf_idf array.
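For reference, steps b) through e) can be written in index notation. The symbols here are our own notation, not the assignment's: x_{ij} is the (possibly binarized) count of word j in document i, N is the number of rows, and \mathbf{1}[\cdot] is the indicator function (1 if the condition holds, else 0).

```latex
\mathrm{tf}_{ij} = \frac{x_{ij}}{\sum_{k} x_{ik}}, \qquad
\mathrm{df}_{j} = \sum_{i=1}^{N} \mathbf{1}[x_{ij} > 0], \qquad
\mathrm{idf}_{j} = \frac{N}{\mathrm{df}_{j}}, \qquad
\text{tf\_idf}_{ij} = \mathrm{tf}_{ij} \, \ln\!\left(\mathrm{idf}_{j}\right)
```

As a quick check of the stop-word argument: a word that appears in all N documents has df = N, so idf = N/N = 1 and ln(1) = 0, which zeroes out its entire tf_idf column.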
Note: do not use any loops in any of the steps. Use only array functions and broadcasting, for high-performance computation.
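One possible loop-free implementation is sketched below; it is an illustration, not an official solution. It assumes every document (row) has at least one nonzero count and every word (column) occurs in at least one document. Otherwise the row normalization or N/df divides by zero and yields nan or inf. The sample array `docs` and the helper name `n_docs` are hypothetical, chosen only for the demonstration.

```python
import numpy as np


def analyze_tf(arr, binary=False):
    """Return a tf-idf array for a rank-2 document-term count array.

    Assumes no all-zero rows and no all-zero columns (see lead-in).
    """
    # Float copy: avoids integer division and leaves the caller's array untouched.
    x = np.array(arr, dtype=float)

    # a) Binarize: values above 1 become 1, smaller values are unchanged.
    if binary:
        x = np.minimum(x, 1.0)

    # b) Divide each row by its sum. keepdims=True gives shape (N, 1),
    #    which broadcasts across the columns, so each row of tf sums to 1.
    tf = x / x.sum(axis=1, keepdims=True)

    # c) Document frequency: count of rows in which each word is nonzero.
    df = (x > 0).sum(axis=0)

    # d) Inverse document frequency: N divided by df.
    n_docs = x.shape[0]
    idf = n_docs / df

    # e) Multiply tf by the natural log of idf; the (num_words,) vector
    #    broadcasts across every row of tf.
    return tf * np.log(idf)


# Hypothetical example: 3 documents, 3 words.
docs = np.array([[1, 0, 2],
                 [0, 3, 1],
                 [2, 1, 1]])
result = analyze_tf(docs)
print(np.round(result, 3))
```

In this sample the third word occurs in all three documents, so its df equals N, its idf is 1, and its tf_idf column is all zeros. Passing `binary=True` changes only the counts used in step b), since binarizing never changes whether a count is nonzero, so df and idf stay the same.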