Question:
Mateo Derby works as a cybersecurity analyst at a private equity firm. His colleagues at the firm have been inundated with spam e-mails, and Mateo has been asked to implement a spam detection system on the company's e-mail server. He reviewed a sample of 500 spam and legitimate e-mails and recorded the following variables: spam (1 if spam, 0 otherwise), the number of recipients, the number of hyperlinks, and the number of characters in the message. A portion of the Spam_Data worksheet is shown in the accompanying table.
a. Create a bagging ensemble classification tree model to determine whether a future e-mail is spam. What are the overall accuracy rate, sensitivity, and specificity of the model on the validation data? What is the AUC value of the model?
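The three validation measures and the AUC asked for in part a can be computed from the actual spam values and the model's output on the validation data. Below is a minimal, dependency-free Python sketch. It assumes spam (1) is the positive class, that `predicted` holds 0/1 classifications, and that `scores` holds the ensemble's spam probabilities. The function names are hypothetical and are not part of any package used with the textbook. The AUC uses its rank interpretation: the probability that a randomly chosen spam e-mail scores higher than a randomly chosen legitimate one, with ties counted as one half. This equals the area under the ROC curve.

```python
def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN), treating spam = 1 as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(actual, predicted):
    """Overall accuracy, sensitivity, and specificity of 0/1 predictions."""
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    accuracy = (tp + tn) / len(actual)
    sensitivity = tp / (tp + fn)  # share of actual spam that was flagged
    specificity = tn / (tn + fp)  # share of legitimate e-mail that was passed
    return accuracy, sensitivity, specificity


def auc(actual, scores):
    """AUC as P(score of random spam > score of random legitimate e-mail)."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))
```

On the hypothetical toy vectors `actual = [1, 1, 0, 0, 1, 0]` and `scores = [0.9, 0.4, 0.3, 0.6, 0.8, 0.1]`, classified with a 0.5 cutoff, accuracy, sensitivity, and specificity all equal 2/3 and the AUC is 8/9. The pairwise loop costs one comparison per spam/legitimate pair, which is negligible for a 500-e-mail sample.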
b. Create a random forest ensemble classification tree model. Select two predictor variables randomly to construct each weak learner. What are the overall accuracy rate, sensitivity, and specificity of the model on the validation data? What is the AUC value of the model? Which is the most important predictor variable?
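Bagging and random forests differ in one step. Both fit each weak learner to a bootstrap sample, but a random forest also restricts each learner to a random subset of the predictors. The following is a minimal, standard-library Python sketch of both methods, and it rests on several assumptions. The weak learners are depth-one trees (decision stumps) to keep the code short. Real implementations grow deeper trees and redraw the predictor subset at every split; for a stump, "per tree" and "per split" coincide. The share of trees voting spam serves as the spam probability. Predictor importance is measured as the mean decrease in Gini impurity, which is one common definition. All function names are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def majority(labels):
    """Majority class of 0/1 labels; ties go to 0 (legitimate)."""
    return 1 if 2 * sum(labels) > len(labels) else 0


def fit_stump(X, y, features):
    """Best single split among the candidate `features` (a depth-one tree).

    Returns (feature, threshold, left_label, right_label, impurity_decrease).
    """
    n = len(y)
    parent = gini(y)
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send cases both ways
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or parent - child > best[4]:
                best = (f, t, majority(left), majority(right), parent - child)
    if best is None:  # no usable split: predict the majority class everywhere
        m = majority(y)
        best = (features[0], float("inf"), m, m, 0.0)
    return best


def fit_ensemble(X, y, n_trees=50, max_features=None, seed=1):
    """Bagging when max_features is None; random forest when max_features < p."""
    rng = random.Random(seed)
    p = len(X[0])
    k = p if max_features is None else max_features
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        features = rng.sample(range(p), k)  # random subset of predictors
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return trees


def predict_proba(trees, row):
    """Share of trees voting 'spam' for one e-mail."""
    votes = [lt if row[f] <= t else rt for f, t, lt, rt, _ in trees]
    return sum(votes) / len(votes)


def importance(trees, p):
    """Mean decrease in Gini impurity per predictor, averaged over trees."""
    totals = [0.0] * p
    for f, _, _, _, decrease in trees:
        totals[f] += decrease
    return [v / len(trees) for v in totals]
```

For the Spam_Data predictors, `X` would hold one `[recipients, hyperlinks, characters]` list per training e-mail and `y` the spam column. Under this sketch, `fit_ensemble(X, y, max_features=2)` plays the role of part b and `fit_ensemble(X, y)` the bagging model of part a. Validation predictions would come from `predict_proba` with a chosen cutoff, and the predictor with the largest `importance` value would rank as most important under the Gini-decrease definition.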
c. Score the new cases in the Spam_Score worksheet using the bagging ensemble classification tree model. What percentage of the e-mails are spam?
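Scoring new cases with an ensemble amounts to combining the members' outputs for each e-mail, applying a cutoff, and counting the spam classifications. Here is a minimal Python sketch. It assumes each fitted member is a callable returning a spam probability (a tree returns a 0.0/1.0 vote, so averaging equals majority voting), and it assumes a cutoff of 0.5 in which a tied vote counts as spam. The three-member ensemble and the new cases below are hypothetical and were made up for illustration. They were not fitted to Spam_Data or taken from Spam_Score.

```python
def score_cases(models, cases, cutoff=0.5):
    """Classify each new case by averaging the members' spam votes/probabilities.

    `models` is any list of callables mapping a predictor row to a spam
    probability; an average at or above `cutoff` is classified as spam (1).
    """
    labels = []
    for row in cases:
        prob = sum(model(row) for model in models) / len(models)
        labels.append(1 if prob >= cutoff else 0)
    return labels


def percent_spam(labels):
    """Percentage of scored e-mails classified as spam."""
    return 100.0 * sum(labels) / len(labels)


# Hypothetical members over [recipients, hyperlinks, characters];
# the thresholds are illustrative, not fitted to Spam_Data.
members = [
    lambda r: 1.0 if r[1] > 3 else 0.0,
    lambda r: 1.0 if r[0] > 5 else 0.0,
    lambda r: 1.0 if r[1] > 5 else 0.0,
]
new_cases = [[1, 8, 300], [10, 4, 200], [2, 0, 150], [1, 4, 100]]
labels = score_cases(members, new_cases)
print(labels, percent_spam(labels))  # → [1, 1, 0, 0] 50.0
```

With a real bagging model, `members` would be the fitted trees and `new_cases` the rows of the Spam_Score worksheet. The reported percentage depends on the cutoff, so it should match the one used when evaluating the model on the validation data.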
Business Analytics: Communicating with Numbers
ISBN: 9781260785005
1st Edition
Authors: Sanjiv Jaggia, Alison Kelly, Kevin Lertwachara, Leida Chen