Question

1 Approved Answer

Posted on Sep 21, 2024

PLEASE DO NOT COPY ANSWER FROM OTHER WEBSITE 2. It is mentioned in Section 8.2.3 that boosting using depthone trees (or stamps) leads to an

PLEASE DO NOT COPY ANSWER FROM OTHER WEBSITE

2. It is mentioned in Section 8.2.3 that boosting using depthone trees (or stamps) leads to an additive model: that is, a model of the form f(X) = ijlle- 3:1 Explain why this is the case. You can begin with (8.12) in Algorithm 8.2. 3. Consider the Gini index, classication error, and entropy in a simple classication setting with two classes. Create a single plot that displays each of these quantities as a function of ml. The :1:- axis should display pml, ranging from O to 1, and the yaxis should display the value of the Gini index, classication error, and entropy. Hint: In a setting with two classes, p'ml : 1 + pm. You could make this plot by hand, but it will be much easier to make in R. 8. 2. 3 Boosting We now discuss boosting, yet another approach for improving the predic- tions resulting from a decision tree. Like bagging, boosting is a general approach that can be applied to many statistical learning methods for re- gression or classication. Here we restrict our discussion of boosting to the context of decision trees. Recall that bagging involves creating multiple copies of the original train- ing data set using the bootstrap, tting a separate decision tree to each copy, and then combining all of the trees in order to create a single predic- tive model. Notably, each tree is built on a bootstrap data set, independent of the other trees. Boosting works in a similar way, except that the trees are grown sequentially: each tree is grown using information from previously grown trees. Boosting does not involve bootstrap sampling; instead each tree is t on a modied version of the original data set. Consider rst the regression setting. Like bagging, boosting involves com- bining a large number of decision trees, f1, . . . , f3. Boosting is described in Algorithm 8.2. What is the idea behind this procedure? Unlike tting a single large deci- sion tree to the data, which amounts to tting the data. herd and potentially overtting, the boosting approach instead teams slowly. Given the current model, we t a decision tree to the residuals from the model. That is, we t a tree using the current residuals, rather than the outcome Y, as the re- sponse. We then add this new decision tree into the tted function in order to update the residuals. Each of these trees can be rather small, with just a few terminal nodes, determined by the parameter d in the algorithm. By boosting The null rate results from simply classifying each observation to the dominant class overall, which is in this case the normal class. 322 8. Tree-Based Methods m=p m=p/2 0.5 m=/p 0.4 Test Classification Error 0.3 0 100 200 300 400 500 Number of TreesAlgorithm 8.2 Boosting for Regression flees 1. Set at) : 0 and in; : ya: for all i in the training set. 2. For I? : 1,2j . . . . B, repeat: (a) Fit a tree fb with d splits {d + 1 terminal nodes) to the training data (X, r). (b) Update f by adding in a shrunken version of the new tree: e\")