Question

How do I fix the attached error?
Using the Auto data set and the scikit-learn library:
2. Create and add a binary variable column called mpg_high_low to the dataset that is set to High if mpg is above 30 and Low if mpg is less than or equal to 30. Make sure the mpg_high_low column is of type category.
3. Check if the auto data is imbalanced with respect to mpg_high_low. Report the percentage of the data that belong to the two classes (High and Low).
4. Split the dataset into 75% training and 25% test, and use 10-fold cross-validation for the models below.
5. Fit a logistic regression model to the training set to predict mpg_high_low using all the other features/variables except mpg, year, origin, and name. Predict the mpg_high_low using the test dataset and report the Accuracy, Precision, Recall, Specificity, and F1 measure.
6. Alter the threshold for classifying a Low to 0.6 and report the changes in the test performance metrics from those reported in Qn 5.
7. Find the optimal threshold by drawing the ROC curve. Change the threshold to the optimal value you found from the ROC curve and report the changes in the test performance metrics from those reported in Qn 5.
8. Fit a Naive Bayes model to the training data to predict mpg_high_low using all the other features/variables except mpg, year, origin, and name. Predict the mpg_high_low using the test dataset. Plot the ROC curve and report the best threshold on the ROC curve plot. Report the AUC on the curve plot as well. Report the accuracy, precision, recall, specificity and F1 score.
9. Fit a KNN model to the training data to predict mpg_high_low using all the other features/variables except mpg, year, origin, and name. Use a grid search between 3 and 10 to find the best value of k. Report the accuracy, precision, recall, specificity, F1 score and AUC.
10. Fit an LDA model to the training data to predict mpg_high_low using all the other features/variables except mpg, year, origin, and name. Report the accuracy, precision, recall, specificity and F1 score.
11. Summarize the performance of all the above models by creating a dataframe with 6 columns: Model_Name, Accuracy, Precision, Recall, Specificity, F1 Score. The dataframe should contain one row for each model you built above, with each of the columns filled in with the appropriate metric. Print out the dataframe. Which model performed best from an accuracy point of view, and which model performed best from a recall point of view, without adjusting the threshold?
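Tasks 2 to 5 can be sketched roughly as follows. This is only an illustration under stated assumptions: the real Auto file is not shown in the question, so the data here is synthetic, the feature columns (horsepower, weight) are stand-ins, and High is treated as the positive class when computing the metrics.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Auto data (assumption: not the real dataset).
rng = np.random.default_rng(0)
n = 200
weight = rng.uniform(1600, 5000, n)
horsepower = 40 + 0.03 * weight + rng.normal(0, 10, n)
mpg = 52 - 0.008 * weight + rng.normal(0, 3, n)
auto = pd.DataFrame({"mpg": mpg, "horsepower": horsepower, "weight": weight})

# Task 2: binary column stored with the pandas "category" dtype.
auto["mpg_high_low"] = pd.Categorical(
    np.where(auto["mpg"] > 30, "High", "Low"), categories=["Low", "High"]
)

# Task 3: percentage of rows in each class.
class_pct = auto["mpg_high_low"].value_counts(normalize=True) * 100

# Task 4: 75% / 25% split; mpg itself is excluded from the features.
X = auto[["horsepower", "weight"]]
y = auto["mpg_high_low"].astype(str)  # plain string labels for scikit-learn
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Task 5: logistic regression, with 10-fold cross-validation on the training set.
log_reg = make_pipeline(StandardScaler(), LogisticRegression())
cv_scores = cross_val_score(log_reg, X_train, y_train, cv=10)
log_reg.fit(X_train, y_train)
y_pred = log_reg.predict(X_test)

# Confusion matrix with Low as the negative and High as the positive class.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=["Low", "High"]).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = precision_score(y_test, y_pred, pos_label="High", zero_division=0)
recall = recall_score(y_test, y_pred, pos_label="High", zero_division=0)
specificity = tn / (tn + fp)  # scikit-learn has no dedicated specificity function
f1 = f1_score(y_test, y_pred, pos_label="High", zero_division=0)

print(class_pct.round(1))
print("Mean 10-fold CV accuracy:", cv_scores.mean())
print("Accuracy:", accuracy, "Precision:", precision, "Recall:", recall)
print("Specificity:", specificity, "F1:", f1)
```

On the real data, X would instead be every column except mpg, year, origin, name, and mpg_high_low itself.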
# Alter the threshold for classifying a Low to 0.6
threshold = 0.6
print("Logistic Regression with Threshold of 0.6")
print("Accuracy:", acc_log_reg_thresh)
print("Precision:", prec_log_reg_thresh)
print("Recall:", rec_log_reg_thresh)
TypeError                                 Traceback (most recent call last)
# Alter the threshold for classifying a Low to 0.6
TypeError: data type 'category' not understood
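The error in the screenshot looks like NumPy's "TypeError: data type 'category' not understood". One common cause, assuming that is what happened here, is calling .astype("category") on a NumPy array (for example the output of np.where) rather than on a pandas Series, because "category" is a pandas dtype that NumPy does not know. The sketch below reproduces that situation on synthetic data and shows one possible fix. The variable names follow the screenshot; the data and the single weight feature are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the real Auto dataset).
rng = np.random.default_rng(1)
n = 200
weight = rng.uniform(1600, 5000, n)
mpg = 52 - 0.008 * weight + rng.normal(0, 3, n)
X = pd.DataFrame({"weight": weight})
y = pd.Series(np.where(mpg > 30, "High", "Low"))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)
log_reg = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)

# Predict Low only when P(Low) is at least the threshold, otherwise High.
threshold = 0.6
low_col = list(log_reg.classes_).index("Low")
prob_low = log_reg.predict_proba(X_test)[:, low_col]
pred_array = np.where(prob_low >= threshold, "Low", "High")  # NumPy array of strings

# Reproduce the failure: NumPy arrays cannot be cast to the pandas "category" dtype.
try:
    pred_array.astype("category")
    numpy_error = None
except TypeError as exc:
    numpy_error = str(exc)

# Possible fix: wrap the array in a pandas Series before casting.
y_pred_thresh = pd.Series(pred_array, index=y_test.index).astype("category")

# The metric functions accept the plain string labels directly.
acc_log_reg_thresh = accuracy_score(y_test, pred_array)
prec_log_reg_thresh = precision_score(y_test, pred_array, pos_label="Low")
rec_log_reg_thresh = recall_score(y_test, pred_array, pos_label="Low")

print("NumPy error:", numpy_error)
print("Logistic Regression with Threshold of 0.6")
print("Accuracy:", acc_log_reg_thresh)
print("Precision:", prec_log_reg_thresh)
print("Recall:", rec_log_reg_thresh)
```

If the predictions only feed into the metric functions, the cast to category can be dropped entirely; it is only needed when the thresholded labels are stored back in a DataFrame column that must have the category dtype.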