Question
Assignment 3
Due Date: Sunday, October 2, 2022

The total number of points for this assignment is 60 points. Please submit your assignment in a Word file. Use this assignment file as a template to enter and copy-paste your answers for your assignment submission. Keep the problem descriptions and insert your answers after each question. Please name your assignment with this format: Lastname.Firstname.Assignment3.

1. (15 points) Download the Boston Housing2.xls file (which was used in Assignment 2). The target attribute in this dataset is CATMEDV (a binary attribute converted from MEDV in the Boston Housing.xls file).

a. Within Excel, save the FullData sheet (with 506 records) as a CSV file, as you did for Assignment 2. Run Weka's support vector machines algorithm (SMO) on this data file, with 10-fold cross-validation. First, use the default parameter C = 1. Then, change the C value to 10 and then to 100. Show the output screens that display the 10-fold cross-validation error rates in these three cases. How does the error rate change as the C value increases?

b. Based on the results with C = 100, which two attributes are the most important predictors? Explain the impact of these two predictors on classification in terms of how the classification result will change when the value of a predictor increases or decreases.

2. (25 points) Apply (i) decision trees (J48), (ii) Naïve Bayes, (iii) k-NN (k = 1), and (iv) SVM (SMO) in Weka for classifying the Boston Housing2 data used in Problem 1. Evaluate the performance of these four classification models based on (1) the overall classification accuracy, and (2) the ROC curve and AUC value, considering homes with 'high' value as the positive class. The specific steps and questions for this problem are:

a. Run the four classification models in Weka on the data using the default settings (10-fold cross-validation, etc.). For each model, show two output screens: the first displays the 10-fold cross-validation error rates and the confusion matrix; the second displays the ROC curve (for your reference, see the output screens shown in the "Plotting ROC Curve in Weka" section of the lecture notes titled "Model and Performance Evaluation"). In sum, there are eight output screens, two for each classification model.

b. Based on the overall classification accuracy, rank the four models from the best to the worst.

c. Suppose you are only interested in accurately predicting/identifying high-value homes (so that the 'high' class is the positive class). In this case, how do you rank the four models from the best to the worst? Justify your answers with the relevant results from the Weka output.
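Problems 2b and 2c ask for two different rankings: one by overall accuracy and one by how well the 'high' (positive) class is identified. The Python sketch below illustrates how those quantities can be derived from a 2x2 confusion matrix and why the two rankings may disagree. It is only an illustration: the function names, the model labels model_a and model_b, and all counts are made up for this example and are not Weka output for the Boston Housing2 data.

```python
# Minimal sketch: accuracy vs. positive-class metrics from a 2x2 confusion matrix.
# All names and counts are hypothetical; they are NOT results from Weka.

def binary_metrics(tp, fn, fp, tn):
    """Return common metrics, treating 'high' as the positive class.

    tp: actual high, predicted high     fn: actual high, predicted low
    fp: actual low,  predicted high     tn: actual low,  predicted low
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,   # overall classification accuracy
        "tp_rate": tp / (tp + fn),       # recall on the positive class
        "fp_rate": fp / (fp + tn),       # share of negatives flagged as positive
        "precision": tp / (tp + fp),     # share of positive predictions that are right
    }


def rank_models(confusions, key):
    """Rank model names from best to worst by the chosen metric (higher is better)."""
    scored = {name: binary_metrics(*counts)[key] for name, counts in confusions.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical confusion counts as (tp, fn, fp, tn) for two made-up models.
example = {
    "model_a": (40, 10, 5, 45),
    "model_b": (45, 5, 15, 35),
}

if __name__ == "__main__":
    print(rank_models(example, "accuracy"))  # ranking by overall accuracy
    print(rank_models(example, "tp_rate"))   # ranking by positive-class recall
```

With these made-up counts, model_a has the higher accuracy while model_b has the higher true positive rate, so the two rankings differ. The same kind of comparison applies to the per-class TP rate, FP rate, precision, and ROC area values that Weka reports for the 'high' class.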