
Question


AdaBoost [25 points]

For this problem, you need to download the Bupa Liver Disorder dataset that is available on the course website. A description of this dataset is at https://archive.ics.uci.edu/ml/datasets/liver+disorders. Here, you will predict whether an individual has a liver disorder (indicated by the selector feature) based on the results of a number of blood tests and levels of alcohol consumption. Implement the AdaBoost algorithm using a decision stump as the weak classifier. Submit the code electronically to iCollege.

AdaBoost trains a sequence of classifiers. Each classifier is trained on the same set of training data (x_i, y_i), i = 1, ..., m, but with each example (x_i, y_i) carrying a different weight D_t(i). At each iteration, a classifier h_t(x) ∈ {−1, +1} is trained to minimize the weighted classification error, Σ_{i=1}^m D_t(i) · I(h_t(x_i) ≠ y_i), where I is the indicator function (0 if the predicted and actual labels match, and 1 otherwise).
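As a concrete illustration of the weighted error above (the array values here are made up, not from the Bupa data):

```python
import numpy as np

# Toy weights and labels (made-up values, not from the Bupa data).
D = np.array([0.5, 0.25, 0.25])   # example weights D_t(i), summing to 1
y = np.array([1, 1, -1])          # true labels y_i in {-1, +1}
pred = np.array([1, -1, -1])      # predictions h_t(x_i)

# I(h_t(x_i) != y_i) is 1 where the labels disagree and 0 where they match.
weighted_error = np.sum(D * (pred != y))
print(weighted_error)  # 0.25 (only the second example is misclassified)
```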

The overall prediction of the AdaBoost algorithm is a linear combination of these classifiers, H_T(x) = sign(Σ_{t=1}^T α_t h_t(x)).

A decision stump is a decision tree with a single node (a depth-1 decision tree). It corresponds to a single threshold c in one of the features, and predicts the class for examples falling above and below the threshold respectively: h_t(x) = C1 · I(x_j ≥ c) + C2 · I(x_j < c), where x_j is the j-th component of the feature vector x. Unlike in class, where we split on information gain, for this algorithm split the data based on the weighted classification accuracy described above: find the class assignments C1, C2 ∈ {−1, +1}, threshold c, and feature choice j that maximize this accuracy.
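One way to train such a stump is an exhaustive search over features, candidate thresholds, and the two class labels. A minimal NumPy sketch (function and variable names are illustrative choices, not prescribed by the assignment):

```python
import numpy as np

def train_stump(X, y, D):
    """Exhaustively search feature j, threshold c, and labels C1, C2 in
    {-1, +1} for the stump h(x) = C1*I(x_j >= c) + C2*I(x_j < c) that
    minimizes the weighted error sum_i D[i] * I(h(x_i) != y_i)."""
    best_err, best = np.inf, None
    for j in range(X.shape[1]):
        for c in np.unique(X[:, j]):          # observed values as thresholds
            for C1 in (-1, 1):
                for C2 in (-1, 1):
                    pred = np.where(X[:, j] >= c, C1, C2)
                    err = np.sum(D * (pred != y))
                    if err < best_err:
                        best_err, best = err, (j, c, C1, C2)
    return best, best_err

def stump_predict(X, stump):
    j, c, C1, C2 = stump
    return np.where(X[:, j] >= c, C1, C2)
```

With uniform weights D = 1/m this reduces to an ordinary accuracy-maximizing stump; AdaBoost then re-weights D at each round so later stumps focus on the examples earlier ones got wrong.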

1. (10 points) Using all of the data for training, display the selected feature component j, threshold c, and class label C1 of the decision stump h_t(x) used in each of the first 10 boosting iterations (t = 1, 2, ..., 10).

2. (15 points) Use 90% of the dataset for training and 10% for testing. Average your results over 50 random splits of the data into training sets and test sets. Limit the number of boosting iterations to 100. In a single plot show:

- the average training error after each boosting iteration

- the average test error after each boosting iteration

Please answer correctly, I will upvote!
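The whole experiment can be sketched end to end as below. This is a sketch under stated assumptions: it uses synthetic data as a stand-in for the Bupa CSV, fixes C2 = −C1 in the stump search (flipping C1 covers both polarities), and runs fewer iterations and splits than the assignment asks for; the plotting step is left as a comment.

```python
import numpy as np

def fit_adaboost(X, y, T):
    """AdaBoost with decision stumps; returns [(alpha_t, j_t, c_t, C1_t), ...]."""
    m = len(y)
    D = np.full(m, 1.0 / m)                       # start with uniform weights
    model = []
    for _ in range(T):
        # Weak learner: stump minimizing the weighted error under D.
        best = (np.inf, 0, 0.0, 1)
        for j in range(X.shape[1]):
            for c in np.unique(X[:, j]):
                for C1 in (-1, 1):                # C2 = -C1
                    pred = np.where(X[:, j] >= c, C1, -C1)
                    e = float(np.sum(D * (pred != y)))
                    if e < best[0]:
                        best = (e, j, c, C1)
        e, j, c, C1 = best
        alpha = 0.5 * np.log((1 - e) / max(e, 1e-12))   # classifier weight
        pred = np.where(X[:, j] >= c, C1, -C1)
        D = D * np.exp(-alpha * y * pred)               # re-weight examples
        D = D / D.sum()
        model.append((alpha, j, c, C1))
    return model

def staged_errors(model, X, y):
    """Error rate of H_t(x) = sign(sum_{s<=t} alpha_s h_s(x)) for each t."""
    agg = np.zeros(len(y))
    errs = []
    for alpha, j, c, C1 in model:
        agg = agg + alpha * np.where(X[:, j] >= c, C1, -C1)
        errs.append(float(np.mean(np.sign(agg) != y)))
    return np.array(errs)

rng = np.random.default_rng(0)
# Synthetic stand-in for the Bupa data; in practice, load the UCI CSV and
# map the selector column to labels in {-1, +1}.
Xall = rng.normal(size=(60, 3))
yall = np.where(Xall[:, 0] + 0.5 * Xall[:, 1] > 0, 1, -1)

T, n_splits = 20, 10        # the assignment asks for T = 100 over 50 splits
train_curve = np.zeros(T)
test_curve = np.zeros(T)
for _ in range(n_splits):
    idx = rng.permutation(len(yall))
    cut = int(0.9 * len(yall))                  # 90% train / 10% test
    tr, te = idx[:cut], idx[cut:]
    model = fit_adaboost(Xall[tr], yall[tr], T)
    train_curve += staged_errors(model, Xall[tr], yall[tr])
    test_curve += staged_errors(model, Xall[te], yall[te])
train_curve /= n_splits
test_curve /= n_splits
# Plot train_curve and test_curve against iteration (e.g. with matplotlib).
```

Note the per-round stump search always achieves e ≤ 1/2 (flipping C1 flips every prediction), so α_t ≥ 0 and the training error of H_t is driven down as rounds accumulate.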

