Question

Posted on Oct 07, 2024

Use Python to answer these questions (the whole dataset cannot be included here).

1. List the top-50 most common words in the text column along with their frequencies, considering both the global and local stopwords. You will need to define your own local stopwords for the #happy tweet set. You may treat emojis and URLs as normal words, not stopwords. Try NOT to remove important keywords by including them in your local stopwords.
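A minimal sketch of one way to approach Question 1, not the course's reference solution. The stopword sets, the sample texts, and the helper name top_words are all placeholders invented for illustration, and whitespace splitting stands in for whatever tokenizer produced the real data. Each word is counted at most once per tweet, mirroring the set-based counting in the get_stem_counter code given in Question 5.

```python
from collections import Counter

# Placeholder stopword sets (assumptions). In practice the global set would be
# a standard English list and the local set would come from inspecting the tweets.
global_stopwords = {"the", "a", "to", "and", "i", "is"}
local_stopwords = {"#happy", "rt", "amp"}  # hypothetical; keep real keywords out


def top_words(texts, stopwords, n=50):
    """Return the n most common non-stopword tokens, counted once per text."""
    counter = Counter()
    for text in texts:
        # Whitespace splitting keeps emojis and URLs as ordinary tokens.
        tokens = {tok.lower() for tok in text.split()}
        counter.update(tokens - stopwords)
    return counter.most_common(n)


# Toy stand-in for the text column of df.
texts = [
    "RT I love the sunshine #happy",
    "Love is a gift #happy https://example.com",
    "Happy to be home 😊",
]
stopwords = global_stopwords | local_stopwords
top50 = top_words(texts, stopwords)
```

Merging the two stopword sets with a set union keeps the global list reusable, which matters because Question 2 asks for the same stopwords to be applied to a filtered subset of rows.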

2. Select from df the rows with a subjectivity score larger than or equal to 0.5 that were posted by female users. List the top-50 most common words in the selected text along with their frequencies, reusing the global and local stopwords defined in Question 1.

3. Using a regular expression, add a new column hashtags to df, such that each value in the column contains a list of the hashtags in the corresponding text column value. List the top-50 most common hashtags in the text column along with their frequencies, considering no stopwords. When counting hashtags, treat them as lowercase to avoid case variations. You will have to modify the body of the get_counter function so that it iterates over the hashtags column, not over the tagged_words column. For example, the first three tuples in the list look as follows.

('#happy', 60822), ('#love', 5057), ('#together', 2480)
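The hashtag extraction and counting in Question 3 could be sketched as follows. This is an illustration under assumptions: the pattern #\w+ is one plausible definition of a hashtag, the function names extract_hashtags and get_hashtag_counter are hypothetical, and the original get_counter body is not shown in the question, so the per-row set counting is borrowed from the get_stem_counter code in Question 5.

```python
import re
from collections import Counter

# Assumed hashtag pattern: '#' followed by one or more word characters.
HASHTAG_RE = re.compile(r"#\w+")


def extract_hashtags(text):
    """Return the list of hashtags found in one text value."""
    return HASHTAG_RE.findall(text)


def get_hashtag_counter(hashtag_lists):
    """Count lowercase hashtags, at most once per row."""
    counter = Counter()
    for tags in hashtag_lists:
        counter.update({tag.lower() for tag in tags})
    return counter


# Toy stand-in for the text column of df.
texts = ["So #Happy today #love", "#happy #HAPPY days", "no tags here"]
hashtags = [extract_hashtags(t) for t in texts]
hashtag_counter = get_hashtag_counter(hashtags)
top50 = hashtag_counter.most_common(50)
```

With a pandas DataFrame, the new column could be created with df["hashtags"] = df["text"].apply(extract_hashtags), and the column could then be passed to get_hashtag_counter in place of the toy list.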

4. Get the frequency of the hashtag '#happybirthday'
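Assuming the hashtag frequencies from Question 3 are held in a collections.Counter, a single frequency is a plain key lookup. The counter contents below are toy values, not results from the real dataset.

```python
from collections import Counter

# Toy stand-in for the hashtag counter built in Question 3.
hashtag_counter = Counter({"#happy": 3, "#happybirthday": 2})

# Keys are lowercase, so the lookup key must be lowercase as well.
freq = hashtag_counter["#happybirthday"]

# A Counter returns 0 for a missing key instead of raising KeyError.
missing = hashtag_counter["#notpresent"]
```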

5. Modify the current code in the get_stem_counter function below so that it returns the counter of all the word stems instead of the counter of the words themselves. You do not have to consider part-of-speech tags this time, which is why the function does not have the third argument target_tag. For example, the first three tuples in the top-50 most common word stem list look as follows.

('happi', 59027),

('love', 9515),

('day', 6219),

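# code

from collections import Counter

def get_stem_counter(dataframe, stopwords=[]):
    counter = Counter()
    # Each row of tagged_words is a list of (word, tag) tuples.
    for l in dataframe.tagged_words:
        # A set ensures each word is counted at most once per row.
        word_set = set()
        for t in l:
            word = t[0].lower()
            tag = t[1]  # not used in this function
            if word in stopwords:
                continue
            else:
                word_set.add(word)
        counter.update(word_set)
    return counter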
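One possible modification for Question 5 is sketched below. The only substantive change is that the stem of each word is added to the per-row set instead of the word itself. The stem parameter is a hypothetical addition to the original signature so that the example runs without extra packages, and toy_stem is a deliberately crude stand-in for a real stemmer. The stopword check is kept on the word rather than the stem, as in the original code.

```python
from collections import Counter
from types import SimpleNamespace


def get_stem_counter(dataframe, stopwords=(), stem=lambda w: w):
    """Count, per row, the distinct stems of the non-stopword words."""
    counter = Counter()
    for tagged in dataframe.tagged_words:
        stem_set = set()
        for word, _tag in tagged:  # the part-of-speech tag is ignored
            word = word.lower()
            if word in stopwords:
                continue
            stem_set.add(stem(word))  # store the stem, not the word
        counter.update(stem_set)
    return counter


def toy_stem(word):
    """Crude stand-in for a real stemmer: strips one trailing 's'."""
    return word[:-1] if word.endswith("s") else word


# Toy stand-in for df: any object with a tagged_words attribute works here.
frame = SimpleNamespace(tagged_words=[
    [("Days", "NNS"), ("day", "NN"), ("the", "DT")],
    [("loves", "VBZ"), ("day", "NN")],
])
stem_counter = get_stem_counter(frame, stopwords={"the"}, stem=toy_stem)
top50 = stem_counter.most_common(50)
```

If NLTK is installed, PorterStemmer().stem from nltk.stem could be passed as the stem argument. The expected stem 'happi' in the question is consistent with a Porter-style stemmer, although the question does not name the stemmer it expects.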
