Question
1. Consider the following product review sample from Amazon. The variable helpful indicates the total number of helpful votes each review has received so far; score refers to the rating of the product being reviewed. Suppose your goal is to use the review text to train a model for predicting the sales of a product (which is highly correlated with its average rating on Amazon).
(1) [6 points] Which one of the two terms, flavor or buck, is more informative about review #4 in the review corpus? Justify your answer by calculating their TF-IDF scores.
Note: The TF-IDF scores should be calculated after word-stemming and removing stopwords and non-words. Assume that stopwords = c(is, and, the, of) and word-stemming replaces nouns in their plural form with their singular form. Please specify your own list of non-words.
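The calculation described in the note can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the review sample itself is not reproduced here, so TOY_CORPUS is a hypothetical stand-in and its scores say nothing about the real answer; TF is taken as the raw term count and IDF as the natural log of N/df, which may differ from the convention used in your course; the non-word rule (drop anything that is not a run of letters) and the plural rule in stem are likewise assumptions.

```python
import math
import re

STOPWORDS = {"is", "and", "the", "of"}  # the list given in the question


def stem(token):
    # Crude plural-to-singular rule (assumption): strip one trailing "s",
    # but leave short words and words ending in "ss" untouched.
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def preprocess(text):
    # Lowercase and keep alphabetic runs only, which drops punctuation and
    # digits as "non-words" (assumption), then remove stopwords and stem.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]


def tf_idf(term, doc_index, corpus):
    # TF = raw count of the term in the chosen document (assumption).
    # IDF = ln(N / df), where df is the number of documents containing the term.
    docs = [preprocess(d) for d in corpus]
    tf = docs[doc_index].count(term)
    df = sum(1 for d in docs if term in d)
    if df == 0:
        return 0.0
    return tf * math.log(len(docs) / df)


# Hypothetical toy corpus, NOT the Amazon sample from the question.
TOY_CORPUS = [
    "Great flavors and the price is fair",
    "The flavor is bland",
    "Good flavor, weak aroma",
    "Best flavor for the buck, worth every buck",
]

flavor_score = tf_idf("flavor", 3, TOY_CORPUS)  # index 3 is the fourth review
buck_score = tf_idf("buck", 3, TOY_CORPUS)
print(flavor_score, buck_score)
```

In this toy corpus, flavor appears in every document after stemming, so its IDF is ln(4/4) = 0 and its score vanishes, while buck appears only in the fourth document and keeps a positive score. The same steps apply to the actual sample once its reviews are substituted in.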
(2) [2 points] In this analysis, should we pre-process the review data by removing numbers from the text? Why or why not? Please explain.