Question
AdaBoost [25 points]
For this problem, you need to download the Bupa Liver Disorder dataset that is available on the course website. The description of this dataset is at https://archive.ics.uci.edu/ml/datasets/liver+disorders. Here, you will predict whether an individual has a liver disorder (indicated by the selector feature) based on the results of a number of blood tests and levels of alcohol consumption. Implement the AdaBoost algorithm using a decision stump as the weak classifier. You should submit the code electronically to iCollege.
AdaBoost trains a sequence of classifiers. Each classifier is trained on the same set of training data (x_i, y_i), i = 1, ..., m, but with the significance D_t(i) of each example (x_i, y_i) weighted differently. At each iteration, a classifier h_t(x) ∈ {−1, +1} is trained to minimize the weighted classification error, ε_t = Σ_{i=1}^{m} D_t(i) I(h_t(x_i) ≠ y_i), where I is the indicator function (0 if the predicted and actual labels match, and 1 otherwise).
The overall prediction of the AdaBoost algorithm is a linear combination of these classifiers, H_T(x) = sign(Σ_{t=1}^{T} α_t h_t(x)).
A decision stump is a decision tree with a single node (a depth-1 decision tree). It corresponds to a single threshold c in one of the features, and predicts the class for examples falling above and below the threshold respectively: h_t(x) = C1·I(x_j ≥ c) + C2·I(x_j < c), where x_j is the j-th component of the feature vector x. Unlike in class, where we split on information gain, for this algorithm split the data based on the weighted classification accuracy described above: find the class assignments C1, C2 ∈ {−1, +1}, threshold c, and feature choice j that maximize this accuracy.
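As a concrete reference, the pieces above can be sketched in NumPy: an exhaustive weighted-accuracy stump search, plus the standard AdaBoost reweighting (α_t = ½ ln((1−ε_t)/ε_t)), which the problem statement does not spell out. The function names, synthetic demo data, and T = 50 are illustrative choices, not part of the assignment; the real run would load the Bupa features and labels instead.

```python
import numpy as np

def fit_stump(X, y, D):
    """Exhaustive search for the stump (feature j, threshold c, label C1)
    maximizing the weighted accuracy sum_i D(i) * [h(x_i) == y_i]."""
    best = (-1.0, 0, 0.0, 1)              # (weighted accuracy, j, c, C1)
    for j in range(X.shape[1]):
        for c in np.unique(X[:, j]):      # candidate thresholds: observed values
            pred = np.where(X[:, j] >= c, 1, -1)
            acc, C1 = D[pred == y].sum(), 1
            if acc < 1.0 - acc:           # the mirrored stump (C1 = -1) is better
                acc, C1 = 1.0 - acc, -1
            if acc > best[0]:
                best = (acc, j, c, C1)
    _, j, c, C1 = best
    return j, c, C1

def stump_predict(X, j, c, C1):
    # h(x) = C1 for x_j >= c and -C1 (= C2) otherwise
    return np.where(X[:, j] >= c, C1, -C1)

def adaboost(X, y, T):
    m = len(y)
    D = np.full(m, 1.0 / m)               # uniform initial weights D_1(i) = 1/m
    stumps, alphas = [], []
    for _ in range(T):
        j, c, C1 = fit_stump(X, y, D)
        h = stump_predict(X, j, c, C1)
        eps = D[h != y].sum()             # weighted error eps_t
        alpha = 0.5 * np.log((1.0 - eps) / max(eps, 1e-12))
        D *= np.exp(-alpha * y * h)       # up-weight mistakes, down-weight hits
        D /= D.sum()
        stumps.append((j, c, C1))
        alphas.append(alpha)
    return stumps, alphas

def predict(X, stumps, alphas):
    # H_T(x) = sign(sum_t alpha_t * h_t(x))
    F = sum(a * stump_predict(X, j, c, C1)
            for a, (j, c, C1) in zip(alphas, stumps))
    return np.sign(F)

# demo on synthetic data: the label is +1 iff x_0 lies in (0.3, 0.7)
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
y = np.where((X[:, 0] > 0.3) & (X[:, 0] < 0.7), 1, -1)

stumps, alphas = adaboost(X, y, T=50)
for t, (j, c, C1) in enumerate(stumps[:10], 1):   # part 1 style output
    print(f"t={t}: j={j}, c={c:.3f}, C1={C1:+d}")
train_err = float(np.mean(predict(X, stumps, alphas) != y))
print("training error:", train_err)
```

Because the search considers both label assignments for every threshold, the chosen stump always has weighted accuracy ≥ 0.5, so ε_t ≤ 0.5 and α_t ≥ 0.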
1. (10 points) Using all of the data for training, display the selected feature component j, threshold c, and class label C1 of the decision stump ht(x) used in each of the first 10 boosting iterations (t = 1, 2, ..., 10).
2. (15 points) Use 90% of the dataset for training and 10% for testing. Average your results over 50 random splits of the data into training and test sets. Limit the number of boosting iterations to 100. In a single plot, show:
- the average training error after each boosting iteration
- the average test error after each boosting iteration
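The evaluation protocol for part 2 can be sketched as below: accumulate the weighted vote F after each round so the per-iteration train/test errors come out of a single training pass, then average the error curves over random 90/10 splits. This is an illustrative harness on synthetic stand-in data with a reduced split count and iteration budget so it runs quickly; the actual assignment loads the Bupa data and uses 50 splits and T = 100.

```python
import numpy as np

def best_stump(X, y, D):
    # exhaustive search for (j, c, C1) maximizing weighted accuracy
    best_acc, best = -1.0, (0, 0.0, 1)
    for j in range(X.shape[1]):
        for c in np.unique(X[:, j]):
            pred = np.where(X[:, j] >= c, 1, -1)
            acc, C1 = D[pred == y].sum(), 1
            if acc < 1.0 - acc:
                acc, C1 = 1.0 - acc, -1
            if acc > best_acc:
                best_acc, best = acc, (j, c, C1)
    return best

def staged_errors(Xtr, ytr, Xte, yte, T):
    """Run T boosting rounds; return train/test error after every round."""
    m = len(ytr)
    D = np.full(m, 1.0 / m)
    Ftr, Fte = np.zeros(m), np.zeros(len(yte))
    tr_err, te_err = [], []
    for _ in range(T):
        j, c, C1 = best_stump(Xtr, ytr, D)
        htr = np.where(Xtr[:, j] >= c, C1, -C1)
        hte = np.where(Xte[:, j] >= c, C1, -C1)
        eps = D[htr != ytr].sum()
        alpha = 0.5 * np.log((1.0 - eps) / max(eps, 1e-12))
        D *= np.exp(-alpha * ytr * htr)
        D /= D.sum()
        Ftr += alpha * htr                # running vote on the training set
        Fte += alpha * hte                # running vote on the test set
        tr_err.append(np.mean(np.sign(Ftr) != ytr))
        te_err.append(np.mean(np.sign(Fte) != yte))
    return np.array(tr_err), np.array(te_err)

# synthetic stand-in for the Bupa data (the real run would load the CSV and
# map the selector column to labels in {-1, +1})
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] ** 2 > 0.5, 1, -1)

n_splits, T = 10, 30                      # assignment asks for 50 splits, T = 100
tr_sum, te_sum = np.zeros(T), np.zeros(T)
for _ in range(n_splits):
    idx = rng.permutation(len(y))
    cut = int(0.9 * len(y))               # 90% train / 10% test
    tr, te = idx[:cut], idx[cut:]
    a, b = staged_errors(X[tr], y[tr], X[te], y[te], T)
    tr_sum += a
    te_sum += b
avg_train, avg_test = tr_sum / n_splits, te_sum / n_splits
# the required single figure would be, with matplotlib:
#   plt.plot(avg_train, label="train"); plt.plot(avg_test, label="test")
print("avg train error at final round:", avg_train[-1])
print("avg test  error at final round:", avg_test[-1])
```

The usual picture for this experiment is a training-error curve that keeps falling with more rounds while the test-error curve flattens out above it.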