
Question



Assignment 3

Due Date: Sunday, 6/28/2020

The total number of points for this assignment is 60 points. Please submit your assignment in a Word file. Use this assignment file as a template to enter and copy-paste your answers for your assignment submission. Keep the problem descriptions and insert your answers after each question. Please name your assignment with this format: Lastname.Firstname.Assignment3.

1. (20 points) Download the BostonHousing2.xls file (which was used in Assignment 2). The target attribute in this dataset is CATMEDV (a binary attribute converted from MEDV in the BostonHousing.xls file).

a. Within Excel, save the FullData sheet as a .CSV file, as you did for Assignment 2. Run Weka's support vector machine algorithm (SMO) on this data file with 10-fold cross-validation. First, use the default parameter C = 1. Then change the C value to 10 and then to 100. Show the output screens that display the 10-fold cross-validation error rates in these three cases. How does the error rate change as the C value increases?

b. Based on the results with C = 100, which two attributes are the most important predictors? Explain the impact of these two predictors on classification, that is, how the classification result changes when the value of a predictor increases or decreases.

c. Run the SVM algorithm in Rattle on the same data, using a 70/30 partition, the Linear (vanilladot) kernel, and C = 100. Show two output screens: one from the Model section and the other from the Evaluate section, with the testing error rate and error matrix.
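For reference, the 10-fold cross-validation error rate asked for in part (a) can be sketched in plain Python as follows. This is only a minimal illustration of the procedure, not Weka's implementation: the function names, the `seed` parameter, and the toy majority-class learner are assumptions made for the example, and the folds here are not stratified by class, whereas Weka reports stratified cross-validation for a nominal target.

```python
import random

def k_fold_indices(n, k=10, seed=1):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_error_rate(X, y, train_fn, k=10, seed=1):
    """Train on k-1 folds, test on the held-out fold, and pool the errors."""
    errors = 0
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train_X = [X[i] for i in range(len(y)) if i not in held]
        train_y = [y[i] for i in range(len(y)) if i not in held]
        predict = train_fn(train_X, train_y)  # returns a function x -> label
        errors += sum(predict(X[i]) != y[i] for i in held_out)
    return errors / len(y)

def majority_class_trainer(train_X, train_y):
    """Toy stand-in for a real learner: always predicts the most frequent label."""
    most_common = max(set(train_y), key=train_y.count)
    return lambda x: most_common
```

Every instance is tested exactly once, so the pooled error count divided by the number of instances gives a single error rate per setting of C. Any learner that follows the same `train_fn(train_X, train_y)` convention could replace the toy trainer.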

2. (20 points) Apply (i) decision trees (J48), (ii) Naïve Bayes, (iii) k-NN (k = 1), and (iv) SVM (SMO) in Weka to classify the BostonHousing2 data used in Problem 1. Evaluate the performance of these four classification models based on (1) the overall classification accuracy, and (2) the ROC curve and AUC value, treating high-value homes as the positive class. The specific steps and questions for this problem are:

a. Run the four classification models in Weka on the data using the default settings (10-fold cross-validation, etc.). For each model, show two output screens: the first displays the 10-fold cross-validation error rates and the confusion matrix; the second displays the ROC curve (for your reference, see the output screens shown in the "Plotting ROC Curve in Weka" section of the lecture notes titled "Model and Performance Evaluation"). In total, there are eight output screens, two for each classification model.

b. Based on the overall classification accuracy, rank the four models from best to worst.

c. Suppose you are only interested in accurately predicting/identifying high-value homes (so that the 'high' class is the positive class). In this case, how do you rank the four models from best to worst? Justify your answers with the relevant results from the Weka output.
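The measures compared in parts (b) and (c) can be sketched in plain Python as shown here. This is a hedged illustration, not Weka's code: the function names are made up for the example, the label 'low' for the negative class is an assumption (only 'high' is named above), and the AUC is computed through its pairwise-ranking interpretation rather than by integrating a plotted curve.

```python
def confusion_counts(actual, predicted, positive="high"):
    """Count true/false positives and negatives for the chosen positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a != positive and p == positive:
            fp += 1
        elif a == positive and p != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    """Overall fraction of correctly classified instances."""
    return (tp + tn) / (tp + fp + fn + tn)

def true_positive_rate(tp, fn):
    """Fraction of actual positives that the model identifies (recall)."""
    return tp / (tp + fn)

def auc(actual, scores, positive="high"):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for a, s in zip(actual, scores) if a == positive]
    neg = [s for a, s in zip(actual, scores) if a != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Accuracy treats both classes alike, while the true positive rate and AUC for the 'high' class focus on how well high-value homes are picked out, which is why the two rankings can differ.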

3. (20 points) Download the BostonHousing.xls file (which was used in Assignment 1). The target attribute in this dataset is MEDV (numeric). Delete the CAT.MEDV attribute (a binary attribute converted from MEDV) and save the data to a CSV file, as you did for Assignment 1.

a. Run Weka's LinearRegression algorithm with the default parameters and 10-fold cross-validation. Show the output screen with the Linear Regression Model and the Cross-Validation Summary section with error results.

b. Run Weka's SVR algorithm (SMOreg) on this data. Set the error margin parameter (epsilonParameter) to 0.1. Keep the other default parameters unchanged. Show the output screen with the SVR model and the Cross-Validation Summary section with error results. Compare and comment on the performance of SVR and that of linear regression in part (a), based on the 'Mean absolute error' and 'Root mean squared error' results.

c. Which two attributes are the most important predictors based on the SVR model? Are they consistent with those identified in Problem 1b based on the SVM model?
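As a rough sketch of the quantities involved in parts (b) and (c), the two error measures and a simple weight-based ranking of attributes could be written in Python as follows. The function names are hypothetical and chosen only for illustration. Ranking by absolute weight assumes a linear model whose attributes are on a common scale, for example when normalization is left on in SMO and SMOreg; the sign of a weight shows the direction of the effect, and for the classifier in Problem 1b the class that a positive weight favors depends on how the class values are ordered.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference; penalizes large errors more."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def top_attributes(weights, n=2):
    """Return the n attribute names with the largest absolute weight."""
    return sorted(weights, key=lambda name: abs(weights[name]), reverse=True)[:n]

# Hypothetical weights for illustration only; these are not Weka output.
example_weights = {"attr_a": 0.2, "attr_b": -1.5, "attr_c": 0.9}
print(top_attributes(example_weights))
```

A model with a lower mean absolute error but a higher root mean squared error than another makes smaller errors on typical instances but a few larger ones, so the two measures are worth reading together.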
