Question
Use Python to answer these questions(can't put the whole dataset in here) 1. List the top-50 most common words in the text column along with
Use Python to answer these questions(can't put the whole dataset in here)
1. List the top-50 most common words in the text column along with their frequencies, considering both the global and local stopwords. You will need to define your own local stopwords for the #happy tweet set. You may treat emojis and URLs as normal words, not stopwords. Try NOT to remove important keywords by including them in your local stopwords.
2.Select from df the rows with the subjectivity score larger than or equal to 0.5 and posted by female users. List the top-50 most common words in the the selected text along with their frequencies, reusing the global and local stopwords defined in Question 1
3.Using a regular expression, add a new column hashtags to df, such that each value in the column contains a list of hashtags in the text column value . List the top-50 most common hashtags in the text column along with their frequencies, considering no stopwords. When counting each hashtag, treat them as lowercase to avoid case variations. For example, the first three tuples in the list look as follows.You will have to modify the body of the get_counter function, such that it can iterate over the hashtags column, not over the tagged_words column.
('#happy', 60822), ('#love', 5057), ('#together', 2480)
4. Get the frequency of the hashtag '#happybirthday'
5. Modify the current code in the get_stem_counter function below, so that it returns the couter of all the word stems, instead of the counter of the words themselves. For example, the first three tuples on the top-50 most common word stem list look as follows. You do not have to care about the part-of-speech tags this time, which is why the function does not have the third argument target_tag.
('happi', 59027),
('love', 9515),
('day', 6219),
# code
def get_stem_counter(dataframe, stopwords=[]):
counter = Counter()
for l in dataframe.tagged_words:
word_set = set()
for t in l:
word = t[0].lower()
tag = t[1]
if word in stopwords:
continue
else:
word_set.add(word)
counter.update(word_set)
return counter
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started