Question

Use Python to answer these questions (the full dataset is too large to include here).

1. List the top-50 most common words in the text column along with their frequencies, considering both the global and local stopwords. You will need to define your own local stopwords for the #happy tweet set. You may treat emojis and URLs as normal words, not stopwords. Try NOT to remove important keywords by including them in your local stopwords.
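One way to approach Question 1 is sketched below. It assumes that df.tagged_words holds one list of (word, tag) pairs per tweet, as the get_stem_counter code in Question 5 suggests, and that a word is counted once per tweet. The helper name count_words, both stopword sets, and the sample rows are hypothetical placeholders; in practice the global list would come from a standard stopword resource and the local list from inspecting the #happy tweets.

```python
from collections import Counter


def count_words(tagged_rows, stopwords=()):
    """Count how many rows contain each word, ignoring case and stopwords."""
    stopwords = {w.lower() for w in stopwords}
    counter = Counter()
    for row in tagged_rows:
        # Each row is a list of (word, tag) pairs; the set counts a word once per row.
        words = {word.lower() for word, _tag in row}
        counter.update(words - stopwords)
    return counter


# Placeholder stopword sets: illustrative only, not the course's actual lists.
global_stopwords = {"the", "a", "is", "to", "and", "i", "so"}
local_stopwords = {"rt", "amp", "via"}

# Tiny stand-in for df.tagged_words; emojis are kept as normal words.
sample_rows = [
    [("RT", "NN"), ("So", "RB"), ("happy", "JJ"), ("today", "NN"), ("#happy", "NN")],
    [("Happy", "JJ"), ("and", "CC"), ("happy", "JJ"), ("😊", "NN")],
]

word_counter = count_words(sample_rows, global_stopwords | local_stopwords)
top_50 = word_counter.most_common(50)
```

With the real data, passing df.tagged_words instead of sample_rows should work unchanged, since iterating over a pandas Series yields its values.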

2. Select from df the rows with a subjectivity score greater than or equal to 0.5 that were posted by female users. List the top-50 most common words in the selected text along with their frequencies, reusing the global and local stopwords defined in Question 1.

3. Using a regular expression, add a new column hashtags to df, such that each value in the column contains a list of the hashtags in the corresponding text column value. List the top-50 most common hashtags in the text column along with their frequencies, considering no stopwords. When counting hashtags, treat them as lowercase to avoid case variations. You will have to modify the body of the get_counter function so that it iterates over the hashtags column, not over the tagged_words column. For example, the first three tuples in the list look as follows.

('#happy', 60822), ('#love', 5057), ('#together', 2480)
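A minimal sketch of the hashtag extraction and counting follows. The pattern #\w+ is an assumption about what counts as a hashtag, and the names extract_hashtags and count_hashtags are hypothetical (the course's own get_counter is not shown in the question). Counting each hashtag once per tweet mirrors the set-based counting in get_stem_counter and is also an assumption.

```python
import re
from collections import Counter

# Assumed hashtag shape: '#' followed by one or more word characters.
HASHTAG_RE = re.compile(r"#\w+")


def extract_hashtags(text):
    """Return all hashtags in text, lowercased, in order of appearance."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def count_hashtags(hashtag_rows):
    """Count how many rows contain each hashtag (once per row)."""
    counter = Counter()
    for tags in hashtag_rows:
        counter.update(set(tags))
    return counter


# Tiny stand-in for df.text.
sample_texts = [
    "So #Happy today! #love",
    "#HAPPY #happy weekend #Together",
]

# With pandas this would be: df["hashtags"] = df["text"].apply(extract_hashtags)
hashtag_rows = [extract_hashtags(t) for t in sample_texts]
hashtag_counter = count_hashtags(hashtag_rows)
top_50_hashtags = hashtag_counter.most_common(50)
```

Note that this simple pattern would also match a fragment such as #anchor inside a URL, so a stricter pattern may be needed if that matters for the data.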

4. Get the frequency of the hashtag '#happybirthday'
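Assuming the hashtag counts from Question 3 are held in a collections.Counter, a single frequency can be read by indexing. The counter below is a made-up stand-in, not real data.

```python
from collections import Counter

# Stand-in for the hashtag counter that Question 3 asks for; counts are invented.
hashtag_counter = Counter({"#happy": 3, "#happybirthday": 2})

birthday_count = hashtag_counter["#happybirthday"]

# A Counter returns 0 for a key it has never seen instead of raising KeyError.
missing_count = hashtag_counter["#nosuchtag"]
```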

5. Modify the current code in the get_stem_counter function below so that it returns the counter of all the word stems, instead of the counter of the words themselves. For example, the first three tuples on the top-50 most common word stem list look as follows. You do not have to care about the part-of-speech tags this time, which is why the function does not have the third argument target_tag.

('happi', 59027),
('love', 9515),
('day', 6219), 
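# code
from collections import Counter

def get_stem_counter(dataframe, stopwords=[]):
    counter = Counter()

    # Each row of tagged_words is a list of (word, tag) tuples.
    for l in dataframe.tagged_words:
        # A set ensures each word is counted at most once per row.
        word_set = set()

        for t in l:
            word = t[0].lower()
            tag = t[1]

            if word in stopwords:
                continue
            else:
                word_set.add(word)

        counter.update(word_set)

    return counter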
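One possible modification is sketched here, not a definitive answer. The extra stem parameter is a hypothetical addition so the function can be exercised without any third-party package; when it is omitted, the sketch falls back to NLTK's PorterStemmer, which is an assumption about the intended stemmer (it is consistent with 'happi' in the expected output). Stopwords are checked against the lowercased word before stemming, which is also an assumption.

```python
from collections import Counter
from types import SimpleNamespace


def get_stem_counter(dataframe, stopwords=(), stem=None):
    """Count, per stem, how many rows of dataframe.tagged_words contain it."""
    if stem is None:
        # Assumption: NLTK is installed and the Porter stemmer is the one intended.
        from nltk.stem import PorterStemmer
        stem = PorterStemmer().stem

    stopwords = set(stopwords)
    counter = Counter()

    for row in dataframe.tagged_words:
        stem_set = set()
        for word, _tag in row:
            word = word.lower()
            if word in stopwords:
                continue
            # Store the stem rather than the word itself.
            stem_set.add(stem(word))
        counter.update(stem_set)

    return counter


def toy_stem(word):
    """Stand-in stemmer for the demo only: drops one trailing 's'."""
    return word[:-1] if word.endswith("s") else word


# Stand-in for df; only the tagged_words attribute is needed.
sample_df = SimpleNamespace(tagged_words=[
    [("Loves", "VBZ"), ("love", "NN"), ("days", "NNS")],
    [("day", "NN"), ("the", "DT")],
])

stem_counter = get_stem_counter(sample_df, stopwords={"the"}, stem=toy_stem)
top_50_stems = stem_counter.most_common(50)
```

With the real data the call would look like get_stem_counter(df, stopwords), leaving stem unset so that the Porter stemmer is used.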
