
Question


You will be using a Netflix Titles database. For each title, the file contains variables such as type (movie or show), release year, popularity score, rating score, runtime, and genre. The genres variable holds multiple genres in a single field, so as it stands it is not usable for data analysis. A data analyst who wanted to use genres in an analysis would have to extract the important genres into usable dummy variables. The dummy variables have already been created in the data file.
The formula that creates them is a bit tricky. First, convert the database to a Table, which makes many things easier. Then, to see how the dummy variables were created, copy this Excel formula into cell P2 of the spreadsheet: =IF(ISERROR(FIND(LOWER(Table1[[#Headers],[Action]]),[@genres]))=TRUE,0,1). The formula checks whether it can FIND the column label "Action" in the value of the "genres" column for that row. If the label is not found, FIND returns an error (#VALUE!), which the ISERROR function detects so that IF can handle it. When the genre name is not found (ISERROR=TRUE), the formula puts a "0" in the "Action" column for that title. If the genre name is found, it puts a "1" in the "Action" column for that title. "Action" is then a dummy variable that indicates whether the title is in the Action genre (1=Yes; 0=No). The same was done for the other genre dummy columns. (Note: FIND is case-sensitive, so the LOWER function is wrapped around the "Action" column header because the "genres" column contains only lowercase values.)
Also note that there are some interaction variables, that is, dummy variables that indicate whether a title is in two or three genres at once. For example, the column "Action_Drama" is a dummy variable that indicates whether a title is in both the Action and Drama genres.
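For readers who want to check the logic outside Excel, the dummy and interaction columns can be sketched in Python. This is only an illustration under stated assumptions: the sample rows, the exact shape of the genres strings, and the helper names `genre_dummy` and `interaction_dummy` are hypothetical and do not come from the data file.

```python
# Hypothetical sample rows; the real file's titles and genre strings may differ.
titles = [
    {"title": "Title A", "genres": "['action', 'drama']"},
    {"title": "Title B", "genres": "['comedy']"},
    {"title": "Title C", "genres": "['drama', 'romance']"},
]


def genre_dummy(genres, header):
    """Return 1 if the lowercased header occurs in the genres text, else 0."""
    # str.find returns -1 when the substring is missing; that plays the role
    # of FIND returning an error that ISERROR catches in the Excel formula.
    return 0 if genres.find(header.lower()) == -1 else 1


def interaction_dummy(genres, headers):
    """Return 1 only if every listed genre is present (e.g. Action_Drama)."""
    return int(all(genre_dummy(genres, h) == 1 for h in headers))


for row in titles:
    row["Action"] = genre_dummy(row["genres"], "Action")
    row["Drama"] = genre_dummy(row["genres"], "Drama")
    row["Action_Drama"] = interaction_dummy(row["genres"], ["Action", "Drama"])
    print(row["title"], row["Action"], row["Drama"], row["Action_Drama"])
```

Like the Excel formula, this is a plain substring test, so a header would also match any longer genre name that happens to contain it.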
Dummy variables are useful when the variable is used in analyses like correlation matrices, regression, and market basket analysis, because these analyses require numeric calculations.
Alpha-equivalent columns of the dummy variables were also created. For example, the column "Action_Alpha" (column AB) is the alpha equivalent of the "Action" dummy variable. In the "Action_Alpha" column, a "1" from the "Action" column was converted into the word "Action", and a "0" from the "Action" column was converted into the word "Non-Action". This was very easy to do. Try this formula in cell AB2: =IF([@Action]=1,"Action","Non-Action"). Alpha variables are useful when the variable is used in a Chart, PivotTable, or PivotChart because the values appear as easily understandable labels.
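The alpha conversion is a one-line conditional, and the same idea can be sketched in Python. The helper name `alpha_label` is hypothetical, and building the "Non-" prefix from the column name is an assumption generalized from the "Action" / "Non-Action" example.

```python
def alpha_label(dummy, name):
    """Map a 0/1 dummy value to a readable label, e.g. 'Action' or 'Non-Action'."""
    # Mirrors =IF([@Action]=1,"Action","Non-Action") for an arbitrary genre name.
    return name if dummy == 1 else "Non-" + name


# Hypothetical dummy values for three titles.
action_dummies = [1, 0, 1]
action_alpha = [alpha_label(d, "Action") for d in action_dummies]
print(action_alpha)
```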
Below is a quick demo of how to create the new variables.
