
Question


7.1 Calculating Distance with Categorical Predictors. This exercise with a tiny dataset illustrates the calculation of Euclidean distance and the creation of binary dummies (note that creating dummies is not required in JMP, but it is useful to understand how they are derived and used).

The online education company Statistics.com segments its customers and prospects into three main categories: IT professionals (IT), statisticians (Stat), and other (Other). It also tracks, for each customer, the number of years since first contact (years).

Consider the following customers; information about whether they have taken a course or not (the outcome to be predicted) is included:

Customer 1: Stat, 1 year, did not take course
Customer 2: Other, 1.1 year, took course

a. Consider now the following new prospect:

Prospect 1: IT, 1 year

Using the information given above on the two customers and one prospect, create one dataset for all three with the categorical predictor variable transformed into 2 dummy variables, and a similar dataset with the categorical predictor variable transformed into 3 dummy variables.

b. For each derived dataset, calculate the Euclidean distance between the prospect and each of the other two customers.

c. Using k-NN with k = 1, classify the prospect as taking or not taking a course using each of the two derived datasets. Does it make a difference whether you use 2 or 3 dummies?
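The calculation the exercise asks for can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's or JMP's procedure. The helper names (`encode`, `euclidean`, `classify_1nn`) are hypothetical, and two assumptions are made that the exercise does not state: the 2-dummy dataset drops the Other level, and years is used unscaled.

```python
import math

# (segment, years since first contact, outcome) for the two known customers.
customers = [
    ("Stat", 1.0, "did not take course"),   # Customer 1
    ("Other", 1.1, "took course"),          # Customer 2
]
prospect = ("IT", 1.0)                      # Prospect 1, outcome unknown


def encode(segment, years, dummy_levels):
    """Return one 0/1 dummy per level in dummy_levels, followed by years."""
    return [1.0 if segment == level else 0.0 for level in dummy_levels] + [years]


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_1nn(dummy_levels):
    """Return (distances to each customer, outcome of the nearest customer)."""
    p = encode(prospect[0], prospect[1], dummy_levels)
    distances = [
        euclidean(p, encode(segment, years, dummy_levels))
        for segment, years, _ in customers
    ]
    nearest = distances.index(min(distances))
    return distances, customers[nearest][2]


# Three dummies: one column per category.
three = classify_1nn(["IT", "Stat", "Other"])

# Two dummies. ASSUMPTION: "Other" is the dropped (reference) level.
two = classify_1nn(["IT", "Stat"])

print("3 dummies:", three)
print("2 dummies:", two)
```

Under these assumptions the three-dummy encoding puts Customer 1 nearest (about 1.414 versus 1.418), while the two-dummy encoding that drops Other puts Customer 2 nearest (about 1.005 versus 1.414). Dropping a different level changes the two-dummy distances, so the classification depends on which category is chosen as the reference.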

