Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

You will be using a Netflix Titles database. For each title, the file contains variables like type ( movie or show ) release year, popularity

You will be using a Netflix Titles database. For each title, the file contains variables like type (movie or show) release year, popularity score, rating score, runtime, and genre. The genre variable contains multiple genres all i one field. As is, this field is not usable for data analysis. So, if the data analysist wanted to use the genres as part of the analysis, they would have to find a way to abstract the important genres into dummy variables that are usable. The dummy variables are already created in the data file. The formula to accomplish this is a bit tricky. First, convert the database to a Table. This conversion makes a lot of things easier. Then if you want to see how the dummy variables were created, copy this Excel formula into cell P2 of the spreadsheet:
=IF(ISERROR(FIND(LOWER(Table1[[#Headers],[Action]]),[@genres]))=TRUE,0,1). The formula looks to see IF it can FIND the column label name of "Action" in values of the "genres" column. IF it can't find the label name, the result produces and error code (ISERROR) in Excel. The ISERROR function is used to detect and handle errors in combination with the IF function. When the genre name is not found and there is an error (ISERROR=TRUE), the formula puts a "O" in the "Action" column for that particular title. IF it does find the genre name, it puts a "1" in the "Action" column for that particular title. "Action" is then a dummy variable that indicates whether the title is in the Action genre (1=Yes;O=No). The same thing was done for other genre dummy columns. (Note: The LOWER function is used around the "Action" column header because the "genre" column has all lower case values.) Also note that there are some interaction variables. That is, there are dummy variables that indicate whether a title is in two or three genres. For example, the column "Action_Drama" is a dummy variable that indicates whether a title is in both the Action and Drama genres. Dummy variables are useful when the variable is used in analyses like correlation matrices, regression, and market basket analysis, because these analyses require numeric calculations.
image text in transcribed

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image_2

Step: 3

blur-text-image_3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

PostgreSQL 10 High Performance Expert Techniques For Query Optimization High Availability And Efficient Database Maintenance

Authors: Ibrar Ahmed ,Gregory Smith ,Enrico Pirozzi

3rd Edition

1788474481, 978-1788474481

More Books

Students also viewed these Databases questions

Question

Understand the activation theory of information exposure.

Answered: 1 week ago

Question

Create a workflow analysis.

Answered: 1 week ago