Question
CASE 2

Instructions: Please use the Stolenrecords and crimebystate datasets to complete the following analysis. Please answer each question fully and supply any supporting analysis or results (e.g., screenshots). Each question is worth 10 points unless otherwise noted.

Background: Your company recently had a security breach in which the private information of millions of customers was stolen. Your company's reputation is at risk, so you are interested in providing assistance and guidance to these customers about protecting themselves from identity theft (thieves using the information to open other accounts or commit other illegal acts). You would like to identify which customers are more likely to become victims. You have a file from a previous breach that has information on customers and which of them became victims of identity theft (Stolenrecords). You also have a file of crime statistics by state (crimebystate). Use these two files to answer the following questions:

1. Build a classification tree for Identity theft by determining which variables to include as predictors (fit what you think is the best model).
   a. Which variables, if any, did you choose not to include in the model? Why?
   b. How many splits are in your final tree? (5 points)
   c. What is the misclassification rate for this model? Is the model better at predicting victims or non-victims? Explain.
   d. What is the area under the ROC curve for Victims? Interpret this value. Does the model do a better job of classifying victims than a random model?
   e. What is the lift for the model at portion = 0.10 and at portion = 0.20? Interpret these values.

2. Use the Fit Model platform to create a logistic regression model for Victims? using the other variables as predictors (fit what you think is the best model).
   a. Which variables are statistically significant in predicting the probability of being a victim? (5 points)
   b. What is the misclassification rate for your final logistic model?
   c. Compare the misclassification rates for the logistic model and the decision tree created above. Which model is better? Why?
   d. Compare this model to the model produced using a classification tree. Which model would be easier to explain to a non-technical person? Why?
   e. Based on the models you created (decision tree and logistic), what are your major conclusions about the relationships between these variables and the probability of being an identity theft victim? (20 points)
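The questions above ask for a misclassification rate, an area under the ROC curve, and lift at a given portion. The assignment expects these to be read from the modeling software's output, but as a rough illustration of what each number measures, here is a minimal Python sketch. The data are made up for illustration and do not come from Stolenrecords or crimebystate; the function names and the 0.5 classification cutoff are assumptions, not part of the assignment.

```python
def misclassification_rate(actual, predicted):
    """Share of records whose predicted class differs from the actual class."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


def lift_at_portion(actual, scores, portion):
    """Victim rate in the top `portion` of records (ranked by score),
    divided by the overall victim rate."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(portion * len(ranked)))
    top_rate = sum(a for _, a in ranked[:top_n]) / top_n
    overall_rate = sum(actual) / len(actual)
    return top_rate / overall_rate


def roc_auc(actual, scores):
    """Probability that a randomly chosen victim receives a higher score
    than a randomly chosen non-victim (ties count as one half)."""
    pos = [s for s, a in zip(scores, actual) if a == 1]
    neg = [s for s, a in zip(scores, actual) if a == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = victim, 0 = non-victim.
actual = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
# Hypothetical predicted probabilities of being a victim.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.35, 0.6, 0.2, 0.1, 0.05]
# Assumed cutoff of 0.5 for turning a probability into a class.
predicted = [1 if s >= 0.5 else 0 for s in scores]

print("Misclassification rate:", misclassification_rate(actual, predicted))
print("Lift at portion 0.10:", lift_at_portion(actual, scores, 0.10))
print("Lift at portion 0.20:", lift_at_portion(actual, scores, 0.20))
print("AUC:", roc_auc(actual, scores))
```

A lift above 1 at a given portion means the highest-scored customers contain proportionally more victims than the customer base as a whole, and an AUC above 0.5 means the model ranks victims above non-victims more often than a random model would.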