Question
Background
You are a movie producer trying to make better decisions regarding the types of movies you make in order to improve the popularity and profit of your company. You have a dataset of the top 1000 movies including various features in addition to various potential labels indicating the popularity of the films.
Task
Use the dataset provided to complete the tasks described in each question. Write Python code for the questions below.
Q1. Import the imdb.csv data file into a Pandas DataFrame and print out the first five records.
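A minimal sketch of Q1 follows. Because imdb.csv is not included here, the block reads a small in-memory stand-in instead; the column names (Rank, Title, Genre, Rating) and every row are hypothetical, so treat them as placeholders rather than the real data.

```python
import io

import pandas as pd

# Hypothetical stand-in for imdb.csv so the sketch runs anywhere.
# Column names and rows are illustrative assumptions, not the real data.
sample_csv = io.StringIO(
    'Rank,Title,Genre,Rating\n'
    '1,Movie A,"Action,Sci-Fi",8.1\n'
    '2,Movie B,"Drama,Comedy",7.0\n'
    '3,Movie C,Horror,6.2\n'
    '4,Movie D,"Comedy,Romance",6.8\n'
    '5,Movie E,Drama,7.9\n'
    '6,Movie F,"Action,Adventure",7.3\n'
)

# With the real file this line would be: df = pd.read_csv("imdb.csv")
df = pd.read_csv(sample_csv)

# head() returns the first five rows when called without an argument.
first_five = df.head()
print(first_five)
```

The title asked for in the quiz below would then be the Title value in the first row, assuming the file is already sorted by rank.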
Q2. The code in the code cell below builds a list of the unique genres that appear in the Genre column.
genres = []
for row in df.Genre:                  # for each row in the Genre column
    for genre in row.split(','):      # for each genre in the row
        if genre not in genres:       # if that genre is not already in the unique genre list
            genres.append(genre)      # add it to the list
genres
['Action', 'Adventure', 'Sci-Fi', 'Mystery', 'Horror', 'Thriller', 'Animation', 'Comedy', 'Family', 'Fantasy', 'Drama', 'Music', 'Biography', 'Romance', 'History', 'Crime', 'Western', 'War', 'Musical', 'Sport']
Your task is to:
Create a new column in the imdb.csv DataFrame for each of those genres. Use False as the default value. HINT: if you use a loop to iterate through the genre list, you can accomplish this in only two lines of code
Iterate through the rows of the DataFrame and update the values from False to True for each of the appropriate genre columns if that genre appears in the original Genre column
For example, if a particular row has a value of 'Drama, Comedy' for the Genre column, the genre columns should have the following values:
Action = False
Adventure = False
Sci-Fi = False
Mystery = False
Horror = False
Thriller = False
Animation = False
Comedy = True
Family = False
Fantasy = False
Drama = True
Music = False
Biography = False
Romance = False
History = False
Crime = False
Western = False
War = False
Musical = False
Sport = False
Print out the last five records of this updated DataFrame
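One way to approach Q2 is sketched below on a toy DataFrame (the rows are made up). It splits each Genre string rather than using a substring test, so 'Music' does not accidentally match 'Musical'. The strip() call is an assumption that some rows may have a space after the comma, as in the 'Drama, Comedy' example.

```python
import pandas as pd

# Toy data standing in for the real DataFrame (rows are hypothetical).
df = pd.DataFrame({
    "Genre": [
        "Action,Adventure,Sci-Fi",
        "Drama,Comedy",
        "Horror",
        "Comedy,Romance",
        "Drama",
        "Music,Drama",
    ]
})

# Build the unique genre list, as in the code cell given in the question.
genres = []
for row in df.Genre:
    for genre in row.split(","):
        genre = genre.strip()  # assumption: tolerate "Drama, Comedy" style spacing
        if genre not in genres:
            genres.append(genre)

# Step 1: one new column per genre, defaulting to False (the two-line loop).
for genre in genres:
    df[genre] = False

# Step 2: walk the rows and flip the matching genre columns to True.
for idx, row in df.iterrows():
    for genre in row["Genre"].split(","):
        df.at[idx, genre.strip()] = True

print(df.tail())  # tail() returns the last five rows by default
```

Writing through df.at rather than through the row yielded by iterrows() matters, because that row is a copy and changes to it would not reach the DataFrame.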
Q3. Next, let's analyze the effect of genre on Rating. Create 5 lists of Ratings, one list for each of the following genres: Action, Drama, Comedy, Horror, Romance.
Compare these lists of ratings in an ANOVA to see if there are any significant differences. Print out the F and p-value from this ANOVA.
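A sketch of the ANOVA step is shown below, assuming SciPy is available and using scipy.stats.f_oneway for the one-way ANOVA. The ratings are toy values, and the helper name ratings_for is made up for illustration. Note that a movie tagged with two of the five genres lands in two lists, so the groups are not strictly independent; the sketch follows the task as written and does not correct for that.

```python
import pandas as pd
from scipy import stats

# Toy data (hypothetical); the real DataFrame comes from imdb.csv.
df = pd.DataFrame({
    "Genre": ["Action,Adventure", "Action,Comedy", "Drama", "Drama,Romance",
              "Comedy", "Comedy,Romance", "Horror", "Horror,Drama",
              "Action,Horror", "Romance"],
    "Rating": [7.9, 6.8, 8.2, 7.4, 6.5, 6.9, 5.8, 6.1, 6.0, 7.0],
})


def ratings_for(frame, genre, column="Rating"):
    """Return the `column` values of every row whose Genre list contains `genre`."""
    mask = frame["Genre"].str.split(",").apply(lambda parts: genre in parts)
    return frame.loc[mask, column].tolist()


target_genres = ["Action", "Drama", "Comedy", "Horror", "Romance"]
rating_lists = {g: ratings_for(df, g) for g in target_genres}

# f_oneway takes one sequence per group and returns the F statistic and p-value.
f_stat, p_value = stats.f_oneway(*rating_lists.values())
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```

If the boolean genre columns from Q2 already exist, df.loc[df["Action"], "Rating"] would be a shorter way to build each list.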
Q4. Now that we know whether there are significant differences among these Rating means, let's see what the means are. Print out the means of each of the five genres from the last problem.
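The means can be computed from the same genre masks, as in the sketch below. The data is again a made-up toy set, so the ordering it prints says nothing about the real dataset.

```python
import pandas as pd

# Toy data (hypothetical); replace with the DataFrame loaded from imdb.csv.
df = pd.DataFrame({
    "Genre": ["Action,Adventure", "Action,Comedy", "Drama", "Drama,Romance",
              "Comedy", "Comedy,Romance", "Horror", "Horror,Drama",
              "Action,Horror", "Romance"],
    "Rating": [7.9, 6.8, 8.2, 7.4, 6.5, 6.9, 5.8, 6.1, 6.0, 7.0],
})

target_genres = ["Action", "Drama", "Comedy", "Horror", "Romance"]

mean_rating = {}
for genre in target_genres:
    # True for rows whose comma-separated Genre value includes this genre.
    in_genre = df["Genre"].str.split(",").apply(lambda parts: genre in parts)
    mean_rating[genre] = df.loc[in_genre, "Rating"].mean()

# Print from lowest to highest mean so the weakest genre is listed first.
for genre, mean in sorted(mean_rating.items(), key=lambda item: item[1]):
    print(f"{genre}: {mean:.2f}")
```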
Q5. Let's see if the critics agree with the movie-goers. Perform the same analyses using Metascore (a score generated by critics) as the label. Compare the same five genres and print out both their mean Metascores
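Since Q5 repeats the earlier analyses with a different label, one option is to wrap them in a small function, sketched below. The function name compare_genres is made up, and the dropna() call rests on an assumption that Metascore may be missing for some movies; with its default settings, scipy.stats.f_oneway returns NaN if any group contains a NaN.

```python
import pandas as pd
from scipy import stats


def compare_genres(frame, label, genres):
    """Return (means, F, p) for `label` across `genres`, skipping rows missing `label`."""
    groups = {}
    for genre in genres:
        in_genre = frame["Genre"].str.split(",").apply(lambda parts: genre in parts)
        groups[genre] = frame.loc[in_genre, label].dropna()
    means = {genre: values.mean() for genre, values in groups.items()}
    f_stat, p_value = stats.f_oneway(*groups.values())
    return means, f_stat, p_value


# Toy data (hypothetical), with one missing Metascore to exercise dropna().
df = pd.DataFrame({
    "Genre": ["Action,Adventure", "Action,Comedy", "Drama", "Drama,Romance",
              "Comedy", "Comedy,Romance", "Horror", "Horror,Drama",
              "Action,Horror", "Romance"],
    "Metascore": [70, 55, 85, None, 60, 58, 45, 62, 50, 66],
})

means, f_stat, p_value = compare_genres(
    df, "Metascore", ["Action", "Drama", "Comedy", "Horror", "Romance"]
)
for genre, mean in means.items():
    print(f"{genre}: {mean:.1f}")
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```

Calling the same function with "Rating" as the label would reproduce the movie-goer analysis, which makes the two sets of results easy to compare side by side.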
Q6. Let's run one more analysis on the Year of the movie. Even though Year is numeric and has an order, we do not have a theory (i.e. a logical reason) to suspect that movies get better or worse over time. Therefore, we should analyze Year as a nominal/object data type using an ANOVA.
Therefore, convert the Year column to an object and then compare the Rating of the years 2012-2016 by printing out both the mean Rating for each of those years and the F and p-value of an ANOVA comparing them.
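The Year comparison might look like the sketch below, which uses toy ratings and assumes SciPy is installed. Converting with astype(object) keeps the integer values but stops pandas from treating the column as numeric, so the ANOVA groups by year as a category.

```python
import pandas as pd
from scipy import stats

# Toy data (hypothetical); the 2010 row shows that other years are filtered out.
df = pd.DataFrame({
    "Year": [2012, 2012, 2013, 2013, 2014, 2014, 2015, 2015, 2016, 2016, 2010],
    "Rating": [7.0, 7.4, 6.9, 7.1, 6.8, 7.2, 6.5, 6.9, 6.4, 6.6, 8.0],
})

# Treat Year as a nominal variable rather than a number.
df["Year"] = df["Year"].astype(object)

years = [2012, 2013, 2014, 2015, 2016]
recent = df[df["Year"].isin(years)]

# Mean Rating for each year in the range.
mean_by_year = recent.groupby("Year")["Rating"].mean()
print(mean_by_year)

# One-way ANOVA with one group of ratings per year.
groups = [recent.loc[recent["Year"] == year, "Rating"] for year in years]
f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```

Reading the printed means in year order is what answers the trend question; the ANOVA only says whether the year means differ, not in which direction.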
Answer the questions below
1. Import the imdb.csv data file into a Pandas DataFrame and print out the first five records.
What is the number 1 movie in the list? Copy and paste the title exactly as it appears ______
2. Now that we know whether there are significant differences among these Rating means, let's see what the means are. Print out the means of each of the five genres from the last problem.
Which genre had the worst average rating?
A. Romance
B. Comedy
C. Drama
D. Action
E. Horror
3. Let's see if the critics agree with the movie-goers. Perform the same analyses using Metascore as the label. Compare the same five genres and print out both their mean Metascores.
Which genre did critics like the least?
A. Drama
B. Romance
C. Horror
D. Comedy
E. Action
4. Let's run one more analysis on the Year of the movie. Even though Year is numeric and has an order, we do not have a theory (i.e. a logical reason) to suspect that movies get better or worse over time. Therefore, we should analyze Year as a nominal/object data type using an ANOVA.
Compare the Rating of the years 2012-2016 by printing out both the mean Rating for each of those years and the F and p-value of an ANOVA comparing them.
If there is a trend in this data over time, what is it?
A. There is a strong consistent drop over time
B. Although it's not a perfect trend, it appears that Ratings are dropping over time
C. There is a strong consistent improvement over time
D. There is no trend over time
E. Although it's not a perfect trend, it appears that Ratings are improving over time