Question
7.1 Calculating Distance with Categorical Predictors. This exercise with a tiny dataset illustrates the calculation of Euclidean distance and the creation of binary dummies (note that creating dummies is not required in JMP, but it is useful to understand how they are derived and used).

The online education company Statistics.com segments its customers and prospects into three main categories: IT professionals (IT), statisticians (Stat), and other (Other). It also tracks, for each customer, the number of years since first contact (years).

Consider the following customers; information about whether they have taken a course or not (the outcome to be predicted) is included:
Customer 1: Stat, 1 year, did not take course
Customer 2: Other, 1.1 years, took course

a. Consider now the following new prospect:
Prospect 1: IT, 1 year
Using the information given above on the two customers and one prospect, create one dataset for all three with the categorical predictor variable transformed into 2 dummy variables, and a similar dataset with the categorical predictor variable transformed into 3 dummy variables.
b. For each derived dataset, calculate the Euclidean distance between the prospect and each of the other two customers.
c. Using k-NN with k = 1, classify the prospect as taking or not taking a course using each of the two derived datasets. Does it make a difference whether you use 2 or 3 dummies?
Step by Step Solution
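The three parts of the exercise can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's or JMP's own procedure. It assumes that the 2-dummy dataset keeps indicators for IT and Stat and drops Other as the reference level, and that years is used unscaled; the exercise specifies neither. The helper names (encode, euclidean, classify_1nn) are hypothetical.

```python
import math

# Records from the exercise: (name, category, years, outcome).
customers = [
    ("Customer 1", "Stat", 1.0, "did not take course"),
    ("Customer 2", "Other", 1.1, "took course"),
]
prospect = ("IT", 1.0)  # (category, years)


def encode(category, years, levels):
    """Return one 0/1 dummy per entry in `levels`, followed by years."""
    return [1.0 if category == lvl else 0.0 for lvl in levels] + [years]


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_1nn(levels):
    """Distances from the prospect to each customer, plus the 1-NN outcome."""
    p = encode(*prospect, levels)
    distances = {
        name: euclidean(p, encode(cat, yrs, levels))
        for name, cat, yrs, _ in customers
    }
    nearest = min(distances, key=distances.get)
    outcome = next(o for n, _, _, o in customers if n == nearest)
    return distances, nearest, outcome


# Assumption: "Other" is the dropped reference level in the 2-dummy dataset.
two = classify_1nn(["IT", "Stat"])
three = classify_1nn(["IT", "Stat", "Other"])

for label, (distances, nearest, outcome) in (("2 dummies", two), ("3 dummies", three)):
    print(label, distances, "->", nearest, "->", outcome)
```

Under these assumptions the 2-dummy distances are about 1.414 to Customer 1 and 1.005 to Customer 2, so the prospect is classified as taking a course. With 3 dummies the distances are about 1.414 and 1.418, so the nearest neighbor is Customer 1 and the prospect is classified as not taking a course. The 2-dummy result depends on which category is dropped: dropping IT or Stat instead of Other makes Customer 1 the nearest neighbor.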