Question

Posted on Sep 26, 2024

Part 2: Auto dataset revisited

We also used the Auto dataset two weeks ago in lab 6, where we used it with LDA and QDA. Both functions in R provide a CV argument that computes a LOOCV estimate for us. If we want a k-fold cross-validation estimate where k is not equal to the number of instances, we have to either write our own code or find another library to use. Here we will write our own code!

Write a function that accepts a data frame, a model-building function (either lda or qda), and a value for k, and returns an error estimate and its variance for k-fold cross-validation. Use this function to generate values for the same kind of table you made in part 1. Also compare these values to the error rates estimated from the training set and from a validation set. Finally, include a paragraph summarizing and explaining the results, just as you did in part 1.
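One way the requested function could look is sketched below. This is a minimal sketch, not the course's reference solution: the name `kfold_cv`, the `response` argument, and the choice to report the sample variance of the per-fold error rates as "its variance" are all assumptions. It also assumes the data frame holds only the response and numeric predictors, so the `mpg` and `name` columns of Auto are dropped before the call.

```r
library(MASS)  # provides lda() and qda()

# Hypothetical helper (name and arguments are assumptions, not from the lab).
# df:       data frame holding the response and the predictors only
# model_fn: lda or qda
# k:        number of folds (k >= 2 so that a variance can be computed)
# response: name of the class column; "mpg01" is assumed here
kfold_cv <- function(df, model_fn, k, response = "mpg01") {
  n <- nrow(df)
  # Randomly assign every row to one of k folds of nearly equal size
  folds <- sample(rep(seq_len(k), length.out = n))
  form <- as.formula(paste(response, "~ ."))
  fold_err <- numeric(k)
  for (i in seq_len(k)) {
    held_out <- folds == i
    # Fit on the other k - 1 folds, then classify the held-out fold
    fit <- model_fn(form, data = df[!held_out, , drop = FALSE])
    pred <- predict(fit, newdata = df[held_out, , drop = FALSE])$class
    truth <- df[[response]][held_out]
    fold_err[i] <- mean(as.character(pred) != as.character(truth))
  }
  # Mean of the fold error rates and their sample variance across folds
  list(error = mean(fold_err), variance = var(fold_err))
}

# Example on Auto, run only if the ISLR package is installed
if (requireNamespace("ISLR", quietly = TRUE)) {
  auto <- ISLR::Auto
  auto$mpg01 <- as.numeric(auto$mpg > median(auto$mpg))
  auto <- auto[, !(names(auto) %in% c("mpg", "name"))]
  set.seed(1)  # fold assignment is random, so fix the seed for repeatability
  print(kfold_cv(auto, lda, k = 10))
  print(kfold_cv(auto, qda, k = 10))
}
```

Calling the helper for several values of k (for example 5, 10, and the number of rows) would fill in one row of the table per value. With k equal to the number of rows, each fold holds a single observation, so every per-fold error is 0 or 1 and the reported variance is correspondingly large.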

Below is the R Markdown code that loads the Auto data, creates the training/test split, and fits Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA).

Section 2: Auto dataset

This one is straight out of the textbook. It is problem 11 from Chapter 4. It is copy/pasted below:

In this problem, you will develop a model to predict whether a given car gets high or low gas mileage based on the Auto data set.

    library(GGally)
    ## Registered S3 method overwritten by 'GGally':
    ##   method from
    ##   +.gg   ggplot2
    ##
    ## Attaching package: 'GGally'
    ## The following object is masked from 'package:dplyr':
    ##
    ##     nasa
    library(ISLR)
    data(Auto)

(a) Create a binary variable, mpg01, that contains a 1 if mpg contains a value above its median, and a 0 if mpg contains a value below its median. You can compute the median using the median() function. Note you may find it helpful to use the data.frame() function to create a single data set containing both mpg01 and the other Auto variables.

Kimmer's comment: or just use dplyr!

    mpg.med <- median(Auto$mpg)
    Auto.new <- Auto %>% mutate(mpg01 = ifelse(mpg > mpg.med, 1, 0)) %>% select(-mpg)
    # Auto$mpg01 <- as.numeric(Auto$mpg > mpg.med) # also works!

(c) Split the data into a training set and a test set.

    auto.dfs <- list()
    Auto.new
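For the training-set and validation-set error rates that Part 2 asks to compare against, a single random split is enough. The sketch below is one possible version under stated assumptions: the helper name `holdout_errors`, the 50/50 split fraction, and the `response` argument are hypothetical and do not come from the lab code, whose own split is cut off in the transcription above.

```r
library(MASS)  # provides lda() and qda()

# Hypothetical helper (name, arguments, and the 50/50 default are assumptions).
# Fits model_fn on a random training subset and returns the misclassification
# rate on the training rows and on the remaining validation rows.
holdout_errors <- function(df, model_fn, train_frac = 0.5, response = "mpg01") {
  n <- nrow(df)
  train_idx <- sample(n, size = floor(train_frac * n))
  valid_idx <- setdiff(seq_len(n), train_idx)
  form <- as.formula(paste(response, "~ ."))
  fit <- model_fn(form, data = df[train_idx, , drop = FALSE])
  # Misclassification rate of the fitted model on a given set of rows
  err <- function(rows) {
    pred <- predict(fit, newdata = df[rows, , drop = FALSE])$class
    mean(as.character(pred) != as.character(df[[response]][rows]))
  }
  c(train = err(train_idx), validation = err(valid_idx))
}

# Example on Auto, run only if the ISLR package is installed
if (requireNamespace("ISLR", quietly = TRUE)) {
  auto <- ISLR::Auto
  auto$mpg01 <- as.numeric(auto$mpg > median(auto$mpg))
  auto <- auto[, !(names(auto) %in% c("mpg", "name"))]  # keep numeric predictors
  set.seed(1)
  print(holdout_errors(auto, lda))
  print(holdout_errors(auto, qda))
}
```

The training error is computed on the same rows used to fit the model, so it tends to be optimistic. The validation error depends on which rows land in each half, which is the variability that k-fold cross-validation is meant to average out.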