Question:
The following data set in the Church_Data worksheet is used to classify individuals as likely or unlikely to attend church using five predictor variables: years of education (Educ), annual income (Income in $), age, sex (F = female, M = male), and marital status (Married, Y = yes, N = no). The outcome variable is Church (1 = attends, 0 otherwise). Create a classification tree model for predicting whether the individual is likely to attend church. Select the best-pruned tree for scoring and display the full-grown, best-pruned, and minimum error trees.
a. How many leaf nodes are in the best-pruned tree and minimum error tree? What are the rules that can be derived from the best-pruned tree?
b. What are the accuracy rate, sensitivity, specificity, and precision of the best-pruned tree on the test data?
c. Generate the ROC curve. What is the area under the ROC curve (or AUC value)?
d. Score the cases in the Church_Score worksheet using the best-pruned tree. What percentage of the individuals in the score data set are likely to go to church based on a cutoff probability value of 0.5?
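Part a distinguishes the minimum error tree from the best-pruned tree. A minimal sketch of one common convention follows, assuming the minimum error tree is the candidate with the lowest validation error and the best-pruned tree is the smallest candidate whose validation error is within one standard error of that minimum. The function name, the candidate list, and the validation-set size are hypothetical and are not taken from the Church_Data worksheet.

```python
import math


def select_trees(candidates, n_validation):
    """Pick the minimum error tree and the best-pruned tree.

    candidates: list of (leaf_count, validation_error_rate) pairs, one per
    pruned subtree. n_validation: number of validation cases.
    Assumes the one-standard-error convention described in the lead-in.
    """
    # Minimum error tree: lowest validation error (ties go to fewer leaves).
    min_tree = min(candidates, key=lambda c: (c[1], c[0]))
    min_err = min_tree[1]

    # Standard error of the minimum validation error rate.
    se = math.sqrt(min_err * (1 - min_err) / n_validation)

    # Best-pruned tree: fewest leaves among trees within one SE of the minimum.
    within = [c for c in candidates if c[1] <= min_err + se]
    best_pruned = min(within, key=lambda c: (c[0], c[1]))
    return min_tree, best_pruned


# Made-up candidates for illustration only (not results for this exercise).
candidates = [(1, 0.40), (3, 0.26), (5, 0.22), (8, 0.21), (12, 0.24)]
min_error_tree, best_pruned_tree = select_trees(candidates, n_validation=100)
print("minimum error tree:", min_error_tree)
print("best-pruned tree:", best_pruned_tree)
```

Under this convention the best-pruned tree never has more leaf nodes than the minimum error tree, which is why it is usually preferred for scoring.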
Step by Step Answer:
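The measures asked for in parts b through d can be computed directly from the actual classes and the predicted probabilities. The sketch below is illustrative only: the toy lists are invented, the function names are hypothetical, and it assumes that a case is classified as attending (Church = 1) when its predicted probability is at or above the cutoff. It is not the textbook's software workflow and does not reproduce the answer for the Church_Data worksheet.

```python
def classification_metrics(actual, probs, cutoff=0.5):
    """Accuracy, sensitivity, specificity, and precision at a given cutoff.

    Assumes both classes and both predictions occur, so no denominator is zero.
    """
    predicted = [1 if p >= cutoff else 0 for p in probs]
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of actual 1s predicted as 1
        "specificity": tn / (tn + fp),   # share of actual 0s predicted as 0
        "precision": tp / (tp + fp),     # share of predicted 1s that are 1
    }


def auc(actual, probs):
    """Area under the ROC curve via pairwise ranking.

    Equals the share of (class 1, class 0) pairs in which the class 1 case
    has the higher predicted probability; ties count as one half.
    """
    pos = [p for a, p in zip(actual, probs) if a == 1]
    neg = [p for a, p in zip(actual, probs) if a == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


def percent_likely(probs, cutoff=0.5):
    """Percentage of scored cases whose probability meets the cutoff."""
    return 100 * sum(1 for p in probs if p >= cutoff) / len(probs)


# Invented test-partition values for illustration only.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
probs = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.1]
metrics = classification_metrics(actual, probs)
auc_value = auc(actual, probs)

# Invented probabilities standing in for scored cases.
score_probs = [0.8, 0.3, 0.5, 0.1]
share = percent_likely(score_probs)

print(metrics)
print("AUC:", auc_value)
print("Percent likely to attend:", share)
```

With real data, `actual` and `probs` would come from the test partition scored by the best-pruned tree, and `score_probs` from the Church_Score cases.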
Business Analytics: Communicating with Numbers
ISBN: 9781260785005
1st Edition
Authors: Sanjiv Jaggia, Alison Kelly, Kevin Lertwachara, Leida Chen