Question:
Refer to Exercise 15 for a description of the data set.
a. Create a random forest ensemble classification tree model. Select two predictor variables randomly to construct each weak learner. What are the overall accuracy rate, sensitivity, and specificity of the model on the validation data? What is the AUC value of the model? Which is the most important predictor variable?
b. Compare the performance of the random forest ensemble model to that of the single-tree model created in Exercise 15 (for Analytic Solver) or Exercise 16 (for R). Which model shows more robust performance? Explain.
c. Score the new cases in the Church_Score worksheet using the random forest ensemble classification tree model. What percentage of the individuals in the score data set are likely to go to church based on a cutoff probability value of 0.5?
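The measures requested in parts a and c can be illustrated with a minimal Python sketch. This is not the textbook's Analytic Solver or R workflow, and it does not fit a random forest. It only shows how accuracy, sensitivity, specificity, AUC, and the share of cases above a cutoff are computed once a model has produced predicted probabilities. The function names and the small data set are hypothetical, made up for illustration. The sketch assumes class 1 (attends church) is the positive class, that both classes are present, and that a probability equal to the cutoff counts as class 1.

```python
def classification_metrics(actual, predicted):
    """Accuracy, sensitivity, and specificity for 0/1 labels (1 = positive class)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of actual 1s classified as 1
        "specificity": tn / (tn + fp),  # share of actual 0s classified as 0
    }


def auc(actual, probabilities):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for a, p in zip(actual, probabilities) if a == 1]
    neg = [p for a, p in zip(actual, probabilities) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def percent_at_or_above(probabilities, cutoff=0.5):
    """Percentage of cases whose predicted probability meets the cutoff."""
    return 100.0 * sum(1 for p in probabilities if p >= cutoff) / len(probabilities)


# Made-up validation labels and predicted probabilities (NOT the Church data).
actual = [1, 1, 1, 0, 0, 0, 1, 0]
probs = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.8, 0.3]
predicted = [1 if p >= 0.5 else 0 for p in probs]

metrics = classification_metrics(actual, predicted)
auc_value = auc(actual, probs)
pct_likely = percent_at_or_above(probs, cutoff=0.5)
print(metrics, auc_value, pct_likely)
```

For part c, the same `percent_at_or_above` calculation would be applied to the probabilities the fitted model assigns to the cases in the score worksheet.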
Data from Exercise 15
The data set in the Church_Data worksheet is used to classify individuals as likely or unlikely to attend church using five predictor variables: years of education (Educ), annual income (Income, in $), age, sex (F = female, M = male), and marital status (Married; Y = yes, N = no). The outcome variable is Church (1 = attends, 0 = otherwise). Create a classification tree model for predicting whether an individual is likely to attend church. Select the best-pruned tree for scoring and display the full-grown, best-pruned, and minimum error trees.
Step by Step Answer:
Business Analytics Communicating With Numbers
ISBN: 9781260785005
1st Edition
Authors: Sanjiv Jaggia, Alison Kelly, Kevin Lertwachara, Leida Chen