Question

Background and Introduction

The final project for the course will require you to complete some tasks based on a hypothetical request from an independent film company. They are trying to decide how to allocate their resources in order to get more views on Netflix. There is a website, FlixGem (https://flixgem.com), which collects data from different sources and produces what they call a "Hidden Gem" score, which users then use to choose new movies and television series to watch that they might not otherwise come across. The company has several questions to ask of this data, all of which will help them going forward.

The data for this project comes from a Kaggle project, the details of which can be found at https://www.kaggle.com/syedmubarak/netflix-dataset-latest-2021. I have attached the dataset to the project description on Crowdmark so that you do not have to register for an account on Kaggle. I have also reduced the number of variables, to help limit the scope of the project. The dataset you can download from Crowdmark contains the following variables from the original dataset:

| Variable | Description |
|---|---|
| Title | Movie or series title |
| Languages | Languages in the film |
| Series or Movie | Series or standalone movie |
| Hidden Gem Score | Hidden Gem Score from FlixGem |
| Runtime | Runtime category |
| Director | Director |
| IMDb Score | IMDb score |
| Rotten Tomatoes Score | Rotten Tomatoes score |
| Metacritic Score | Metacritic score |
| Release Date | Release date |
| Summary | Movie summary |

Note that IMDb, Rotten Tomatoes, and Metacritic are all different websites which specialize in rating movies based on reviews from the general public and from movie critics.

Objectives and evaluation

The project requires you to complete three tasks, detailed below. You should prepare a single report containing your answers to all tasks. Include the code for each task in your report, for reproducibility purposes.
You may include the code as code chunks where the analyses are taking place or, if you prefer, you may include it at the end (although the code should be clearly commented so that it is clear which task each block of code corresponds to). The completion of each task is worth 25 points. The quality of presentation will also be worth 25 points, i.e. clarity of explanation, plots, tables, and code.

Tasks to complete

Task 1: Data wrangling and exploratory data analyses

The first task is to do some data wrangling (i.e. cleaning and manipulation) and conduct some exploratory data analyses. First, the film company DOES NOT want results for Series, only for Movies, since they only produce movies. Second, they know that there is missingness in some of the variables, but they are content to allow you to drop any records containing any missing values for the purposes of this analysis (so you should). Include any plots and summary statistics that you think will aid in supporting your assessments. Based on the subsetted and cleaned data, please answer the following questions:

a. Does the Hidden Gem Score seem to be associated with the Runtime Category or the languages used in the film? Explain briefly the reasons behind your assessment. Hint: You may need to do some re-coding of one or both of these variables. Any reasonable re-coding is fine; just be sure to be clear about what you've done.

b. Do any of the three review site scores (IMDb, Rotten Tomatoes, Metacritic) seem to be strongly or weakly correlated with the Hidden Gem Score? Explain briefly the reasons behind your assessment and the nature of those associations.

c. The company has a theory that people are becoming more accepting of longer movies because they can watch them at home on Netflix and other content-collecting sites. Do you notice any trend over time in the Hidden Gem Scores by category of Runtime length? Explain briefly the reasons behind your assessment.
Task 2: Factors of the Hidden Gem Score

Recall that the goal of the company is to make decisions about what the most important factors are that contribute to the Hidden Gem Score. The company has suggested that a Regression Tree could be used to identify those factors. Regression trees would work particularly well for this problem due to the categorical nature of the data. A description of regression trees, with example code, can be found here: https://uc-r.github.io/regression_trees

Apply the rpart function to the data using the Hidden Gem Score as the outcome and Languages, Runtime, IMDb Score, Rotten Tomatoes Score and Metacritic Score as predictors. Summarize what you think are the most important features for predicting the Hidden Gem Score based on the fitted tree, and summarize how well your predictions perform. NOTE: You DO NOT have to implement any Bagging or Split Optimization from the article beyond what the rpart function already provides (but of course you can if you're excited to do so).
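rpart fits CART-style trees; with a numeric outcome it uses its "anova" method, choosing at each node the binary split that most reduces the sum of squared errors (SSE). The sketch below re-implements that core idea in plain Python to make the mechanics of Task 2 concrete; it is not a substitute for rpart. Simplifications (assumptions, not rpart behavior): a fixed maximum depth and minimum leaf size instead of cost-complexity (cp) pruning with cross-validation, and one-level-versus-rest splits for text predictors, whereas rpart can group several levels on each side of a split. Importance here is the total SSE reduction from a variable's splits; rpart's `variable.importance` additionally credits surrogate splits. Prediction quality is summarized as RMSE on a held-out set. All function names are hypothetical and the rows are invented.

```python
import math
import random
import statistics


def sse(values):
    """Sum of squared errors around the mean (the criterion for an 'anova' split)."""
    if not values:
        return 0.0
    mean = statistics.fmean(values)
    return sum((v - mean) ** 2 for v in values)


def candidate_splits(rows, feature):
    """Yield (description, test) pairs: midpoints for numbers, level-vs-rest for text."""
    values = sorted({r[feature] for r in rows})
    if all(isinstance(v, (int, float)) for v in values):
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            yield f"{feature} < {cut:g}", lambda r, f=feature, c=cut: r[f] < c
    else:
        for level in values:
            yield f"{feature} == {level!r}", lambda r, f=feature, lv=level: r[f] == lv


def build_tree(rows, features, target, depth=0, max_depth=3, min_leaf=5, importance=None):
    """Greedily grow a regression tree; each leaf predicts the mean of its rows."""
    y = [r[target] for r in rows]
    node = {"prediction": statistics.fmean(y), "n": len(rows)}
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return node
    parent_sse, best = sse(y), None
    for feature in features:
        for desc, test in candidate_splits(rows, feature):
            left = [r for r in rows if test(r)]
            right = [r for r in rows if not test(r)]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            gain = parent_sse - sse([r[target] for r in left]) - sse([r[target] for r in right])
            if best is None or gain > best[0]:
                best = (gain, feature, desc, test, left, right)
    if best is None or best[0] <= 0:
        return node  # no split reduces the SSE: stop here
    gain, feature, desc, test, left, right = best
    if importance is not None:
        importance[feature] = importance.get(feature, 0.0) + gain

    def grow(part):
        return build_tree(part, features, target, depth + 1, max_depth, min_leaf, importance)

    node.update(split=desc, test=test, left=grow(left), right=grow(right))
    return node


def predict(node, row):
    """Follow the splits from the root to a leaf and return that leaf's mean."""
    while "test" in node:
        node = node["left"] if node["test"](row) else node["right"]
    return node["prediction"]


def rmse(tree, rows, target):
    """Root-mean-squared error of the tree's predictions on the given rows."""
    return math.sqrt(statistics.fmean((predict(tree, r) - r[target]) ** 2 for r in rows))


def train_test_split(rows, test_fraction=0.3, seed=1):
    """Shuffle a copy of the rows and hold out a fraction for evaluation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Invented rows for illustration only (not FlixGem data): the outcome depends on IMDb Score alone.
data = [
    {"IMDb Score": 5.0 + 0.1 * i, "Runtime": "short" if i % 2 else "long",
     "Hidden Gem Score": 2.0 if i < 10 else 6.0}
    for i in range(20)
]
features = ["IMDb Score", "Runtime"]
importance = {}
tree = build_tree(data, features, "Hidden Gem Score", importance=importance)
print(tree["split"], importance)

train, test = train_test_split(data)
fitted = build_tree(train, features, "Hidden Gem Score", min_leaf=2)
print("held-out RMSE:", rmse(fitted, test, "Hidden Gem Score"))
```

On the real data, the features would be the five predictors named in the task, and comparing the held-out RMSE with that of always predicting the training-set mean gives a simple baseline for judging whether the tree helps.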
