Question
A movie ticket sales company collects reviews from customers through their app. In hopes of maintaining a high retention rate, they wish to proactively respond to positive and negative feedback from their customers. The file MovieReviews.jmp contains a sample of 4000 reviews that have been labeled as positive or negative (approximately evenly split between the two classes). You are asked to use Text Analytics to build a classification model that could be incorporated into their app.
From the Text Explorer, save the Document Term Matrix with a maximum of 100 terms and Binary weighting. Using these terms as the predictors and Sentiment as the response, fit a logistic regression model. Which terms have the highest predictive power, and does the presence of each term predict positive or negative sentiment? Record the Misclassification Rate for this model on the validation set.
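For readers who want to see the mechanics outside JMP, the workflow above can be sketched in plain Python. This is only an illustration under several assumptions, not a reproduction of what Text Explorer does: the tokenizer is a simple regular expression (no stemming or stop-word handling), terms are ranked by document frequency, the logistic regression is fit by gradient descent with a small L2 penalty added for numerical stability, and the handful of reviews are made-up stand-ins for MovieReviews.jmp. All function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(docs, max_terms=100):
    """Keep the max_terms terms that appear in the most documents."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    # Rank by document frequency, breaking ties alphabetically.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_terms]]

def binary_dtm(docs, vocab):
    """Binary weighting: 1 if the term occurs in the document, else 0."""
    rows = []
    for doc in docs:
        present = set(tokenize(doc))
        rows.append([1 if term in present else 0 for term in vocab])
    return rows

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=500, l2=0.01):
    """Batch gradient descent on the logistic loss (L2 penalty is an assumption)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b):
    """Classify as positive (1) when the predicted probability is at least 0.5."""
    return [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5 else 0
            for xi in X]

def misclassification_rate(y_true, y_pred):
    """Fraction of documents whose predicted label differs from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up toy reviews (1 = positive, 0 = negative); NOT the MovieReviews.jmp data.
train_docs = ["great movie loved it", "loved the acting great fun",
              "terrible plot boring", "boring and terrible waste",
              "great fun", "waste of time terrible"]
train_y = [1, 1, 0, 0, 1, 0]
valid_docs = ["loved it great", "boring waste"]
valid_y = [1, 0]

vocab = build_vocabulary(train_docs, max_terms=100)
X_train = binary_dtm(train_docs, vocab)
X_valid = binary_dtm(valid_docs, vocab)  # validation uses the training vocabulary

w, b = fit_logistic(X_train, train_y)
coef = dict(zip(vocab, w))
rate = misclassification_rate(valid_y, predict(X_valid, w, b))

# Terms with the largest |coefficient| carry the most weight in this sketch;
# a positive sign means presence pushes toward positive sentiment.
for term, weight in sorted(coef.items(), key=lambda kv: -abs(kv[1]))[:5]:
    print(f"{term:10s} {weight:+.3f}")
print("validation misclassification rate:", rate)
```

In JMP, the analogous reading would come from the parameter estimates (sign and significance of each term) and the misclassification rate reported for the validation rows; the coefficient magnitudes here are only a rough proxy for predictive power.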
Snapshot of data