Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Background You are a movie producer trying to make better decisions regarding the types of movies you make in order to improve the popularity and

Background

You are a movie producer trying to make better decisions regarding the types of movies you make in order to improve the popularity and profit of your company. You have a dataset of the top 1000 movies including various features in addition to various potential labels indicating the popularity of the films.

Task

Use the dataset provided to complete the tasks described in each question.PYTHON CODING FOR BELOW QUESTIONS

Q1.Import the imdb.csv data file into a Pandas DataFrame and print out the first five records.

Q2.The code in the code cell below generates a unique list of genres based on each of those that appear in the Genre column.

genres = []

[for row in df.Genre: # for each row in the Genre column for genre in row.split(','): # for each genre in the row if genre not in genres: # if that genre is not already in the unique genre list genres.append(genre) # add it to the list genres

['Action', 'Adventure', 'Sci-Fi', 'Mystery', 'Horror', 'Thriller', 'Animation', 'Comedy', 'Family', 'Fantasy', 'Drama', 'Music', 'Biography', 'Romance', 'History', 'Crime', 'Western', 'War', 'Musical', 'Sport']

Your task is to:

Create a new column in the imdb.csv DataFrame for each of those genres. Use False as the default value. HINT: if you use a loop to iterate through the genre list, you can accomplish this in only two lines of code

Iterate through the rows of the DataFrame and update the values from False to True for each of the appropriate genre columns if that genre appears in the original Genre column

For example, if a particular row has a value of 'Drama, Comedy' for the Genre column, the following columns should the values:

Action = False

Adventure = False

Sci-Fi = False

Mystery = False

Horror = False

Thriller = False

Animation = False

Comedy = True

Family = False

Fantasy = False

Drama = True

Music = False

Biography = False

Romance = False

History = False

Crime = False

Western = False

War = False

Musical = False

Sport = False

Print out the last five records of this updated DataFrame

 Question 3. Next, let's analyze the effect of genre on Rating. Create 5 lists of Ratings--one list for each of the following genres: Action, Drama, Comedy, Horror, Romance

Compare these lists of ratings in an ANOVA to see if there are any significant differences. Print out the F and p-value from this ANOVA

 
Question4 Now that we know whether there are significant differences among these Rating means, let's see what the means are. Print out the means of each of the five genres from the last problem

Question 5.

Let's see if the critics agree with the movie-goers. Perform the same analyses using Metascore (a score generated by critics)as the label. Compare the same five genres and print out both their mean Metascores

Question 6

Let's run one more analyses on the Year of the movie. Even though Year is numeric and has an order, we do not have a theory (i.e. a logical reason) to suspect that movies get better or worse over time. Therefore, we should analyze Year as a nominal/object data type using an ANOVA.

Therefore, convert the Year column to an object and then compare the Rating of the years 2012-2016 by printing out both the mean Rating for each of those years and the F and p-value of an ANOVA comparing them.

Answer the question below

1. Import the imdb.csv data file into a Pandas DataFrame and print out the first five records.

What is the number 1 movie in the list? Copy and paste the title exactly as it appears ______

2.

Now that we know whether there are significant differences among these Rating means, let's see what the means are. Print out the means of each of the five genres from the last problem.

Which genre had the worst average rating?

A.Romance

B.Comedy

C.Drama

D.Action

E.Horror

3.Let's see if the critics agree with the movie-goers. Perform the same analyses using Metascore as the label. Compare the same five genres and print out both their mean Metascores.

Which genre did critics like the least?

A.Drama

B.Romance

C.Horror

D.Comedy

E.Action

4. Let's run one more analyses on the Year of the movie. Even though Year is numeric and has an order, we do not have a theory (i.e. a logical reason) to suspect that movies get better or worse over time. Therefore, we should analyze Year as a nominal/object data type using an ANOVA.

Compare the Rating of the years 2012-2016 by printing out both the mean Rating for each of those years and the F and p-value of an ANOVA comparing them.

If there is a trend in this data over time, what is it?

A.There is a strong consistent drop over time

B.Although it's not a perfect trend, it appears that Ratings are dropping over time

C.There is a strong consistent improvement over time

D.There is no trend over time

E.Although it's not a perfect trend, it appears that Ratings are improving over time

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Multinational Finance

Authors: Kirt C. Butler

4th Edition

1405181184, 978-1405181181

More Books

Students also viewed these Finance questions