Question
In python, please. Write a program that shall calculate word sentiment level, based on the user reviews from the Yelp academic dataset. The attached file
In python, please.
Write a program that shall calculate word sentiment level, based on the user reviews from the Yelp academic dataset.
The attached file has 156,602 reviews written by Yelp members (the original dataset has 1,569,265 reviews). Each review has a text fragment and a star rating on the scale from1 (worst) to 5 (best). We assume that the words predominantly used in "bad" reviews are "bad" and the words predominantly used in "good" reviews are "good." The measure of the sentiment level of a word, therefore, is the average star rating of all reviews where the word is used.
Processing steps:
Load the JSON data from the file and select a small subset for practicing. (The final run of the program shall include all reviews.) You must use a JSON reader.
Extract all review texts and star ratings.
Break each review into individual words using NLTK.
Lemmatize the words.
Filter out stopwords and words that are not in the words corpus.
For each lemma, calculate its average star rating. If a lemma is used in fewer than 10 reviews, discard it.
Save the 500 most negative lemmas and 500 most positive lemmas and their respective sentiment levels into a two-column CSV file. You must use a CSV writer.
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started