
Question


### USE R STUDIO

##Load the libraries you need for data loading, machine learning, plotting, and outputting plots

##Use the (.packages()) command to output your list of packages loaded. Copy and paste this output into your submission file.
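A minimal sketch of the library-loading step. The package set below is an assumption (caret for training, pROC for ROC curves, rpart and nnet as model back ends); plotting and PNG output rely on the graphics and grDevices packages that R attaches by default.

```r
# Assumed package set; adjust to whatever your course environment provides.
library(caret)  # data splitting and model training
library(pROC)   # ROC curves and AUC
library(rpart)  # decision trees (back end for caret method = "rpart")
library(nnet)   # multinomial regression and neural networks

# .packages() returns the attached packages invisibly;
# wrapping the call in parentheses makes R print the result.
loaded <- (.packages())
print(loaded)
```

The printed character vector is what the assignment asks you to paste into the submission file.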

##Load the iris dataset and save it as a variable

##The goal of this dataset is to use measured parameters of iris flowers to predict their species

##In this practice you will divide the data into training and testing sets, train models, make predictions

##and identify the best model for predicting iris species.

##Divide the data into training and testing sets with 85% of the data in training and 15% in testing

##Use the dim() and table([dataset]$[label column]) functions to validate your separation. Paste the output of these commands

##on the training and testing data in your submission file
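One way to do the 85/15 split is caret's `createDataPartition`, which samples within each species so the class balance is preserved. The variable names (`iris_data`, `iris_train`, `iris_test`) and the seed are illustrative choices, not part of the assignment.

```r
library(caret)

# Load the built-in iris data and keep it under our own name.
data(iris)
iris_data <- iris

set.seed(42)  # hypothetical seed, only for reproducibility

# Stratified sample of row indices: roughly 85% of each species.
train_idx  <- createDataPartition(iris_data$Species, p = 0.85, list = FALSE)
iris_train <- iris_data[train_idx, ]
iris_test  <- iris_data[-train_idx, ]

# Validate the split: row/column counts and class balance.
print(dim(iris_train))
print(dim(iris_test))
print(table(iris_train$Species))
print(table(iris_test$Species))
```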

##Train a decision tree, a k nearest neighbor, and a logistic regression algorithm using caret

##Use the following trainControl function:

cvControl = trainControl(method="cv", number=10, summaryFunction=multiClassSummary,classProbs=TRUE)

##In your caret function use metric = "Accuracy" and ignore the following warning message:

# Warning message:

# In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :

# There were missing values in resampled performance measures.

##Copy and paste the output of [model name]$finalModel for each model into your submission file
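The training step might look like the sketch below. Two assumptions are worth flagging: the model names (`dtModel`, `knnModel`, `lrModel`) are hypothetical, and because `Species` has three classes the "logistic regression" is fitted here as a multinomial logistic regression via caret's `"multinom"` method. `multiClassSummary` with `classProbs = TRUE` may also need the MLmetrics package to be installed.

```r
library(caret)

set.seed(42)  # hypothetical seed
idx        <- createDataPartition(iris$Species, p = 0.85, list = FALSE)
iris_train <- iris[idx, ]

# trainControl as given in the assignment.
cvControl <- trainControl(method = "cv", number = 10,
                          summaryFunction = multiClassSummary,
                          classProbs = TRUE)

# Decision tree (rpart back end).
dtModel <- train(Species ~ ., data = iris_train, method = "rpart",
                 trControl = cvControl, metric = "Accuracy")

# k nearest neighbors.
knnModel <- train(Species ~ ., data = iris_train, method = "knn",
                  trControl = cvControl, metric = "Accuracy")

# Assumption: multinomial logistic regression for the 3-class outcome.
# trace = FALSE is passed through to nnet::multinom to silence fitting logs.
lrModel <- train(Species ~ ., data = iris_train, method = "multinom",
                 trControl = cvControl, metric = "Accuracy", trace = FALSE)

# Final fitted model for each algorithm.
print(dtModel$finalModel)
print(knnModel$finalModel)
print(lrModel$finalModel)
```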

##Make predictions of the test data for each algorithm and calculate the Accuracy for each algorithm using the

##pROC package.

#Hint: you can find the accuracy as follows.

#sum(knnPreds==iris_test$Species)/nrow(iris_test)

##Copy and paste the Accuracy of each algorithm into your submission file.
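The hint above can be applied as in this sketch, shown for k nearest neighbors only; the same two lines would be repeated for the other models. The split, seed, and object names are illustrative assumptions.

```r
library(caret)

set.seed(42)  # hypothetical seed
idx        <- createDataPartition(iris$Species, p = 0.85, list = FALSE)
iris_train <- iris[idx, ]
iris_test  <- iris[-idx, ]

cvControl <- trainControl(method = "cv", number = 10,
                          summaryFunction = multiClassSummary,
                          classProbs = TRUE)

knnModel <- train(Species ~ ., data = iris_train, method = "knn",
                  trControl = cvControl, metric = "Accuracy")

# Predicted class labels for the held-out rows.
knnPreds <- predict(knnModel, newdata = iris_test)

# Accuracy = fraction of test rows whose predicted species matches the truth.
knnAcc <- sum(knnPreds == iris_test$Species) / nrow(iris_test)
print(knnAcc)
```

`mean(knnPreds == iris_test$Species)` gives the same number, and `confusionMatrix(knnPreds, iris_test$Species)` from caret reports accuracy together with the per-class breakdown.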

##In one sentence describe which algorithm you would choose for this task and why you would choose it.

##Include this sentence in your submission file

##In this next section we will use a dataset with cell measurements to predict breast cancer

##Install the mlbench package

##Load the mlbench package

##Load the BreastCancer dataset

##This dataset contains a cell ID (not used for classification), cell measurements, and a

##Class of tumor (benign or malignant). The class is our label or outcome variable (i.e. our y)

##All other measures can be used in classification.

##Separate the data into training and testing sets with 90% of the data in the training set
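A possible preparation and 90/10 split for `BreastCancer`. Dropping the `Id` column follows the description above; removing rows with missing values and converting the factor-coded measurements to numbers are assumptions made here so that every model type can be fitted, and the object names and seed are hypothetical.

```r
# install.packages("mlbench")  # run once if the package is not installed
library(mlbench)
library(caret)

data(BreastCancer)

# Drop the cell ID, which is not a predictor.
bc <- BreastCancer[, names(BreastCancer) != "Id"]

# Assumption: discard the few rows with missing measurements.
bc <- na.omit(bc)

# Assumption: the measurements are stored as factors with levels "1".."10";
# convert them to numeric so all algorithms can use them.
pred_cols     <- setdiff(names(bc), "Class")
bc[pred_cols] <- lapply(bc[pred_cols], function(x) as.numeric(as.character(x)))

set.seed(42)  # hypothetical seed
idx      <- createDataPartition(bc$Class, p = 0.90, list = FALSE)
bc_train <- bc[idx, ]
bc_test  <- bc[-idx, ]

print(dim(bc_train))
print(dim(bc_test))
print(table(bc_train$Class))
print(table(bc_test$Class))
```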

##Train a random forest, neural network, Linear discriminant, and a decision tree algorithm

##Use the following trainControl:

cvControlBin = trainControl(method="cv", number=10, summaryFunction=twoClassSummary,classProbs=TRUE)
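With `twoClassSummary`, caret reports ROC, sensitivity, and specificity, so `metric = "ROC"` is a natural choice for the four binary models. This is a sketch under stated assumptions: the data preparation, the seed, and the model names (`rfModel`, `nnModel`, `ldaModel`, `dtModelBin`) are illustrative, and the randomForest, nnet, MASS, and rpart packages must be installed for the respective methods.

```r
library(mlbench)
library(caret)

data(BreastCancer)
bc <- na.omit(BreastCancer[, names(BreastCancer) != "Id"])  # assumption: drop ID and NA rows
pred_cols     <- setdiff(names(bc), "Class")
bc[pred_cols] <- lapply(bc[pred_cols], function(x) as.numeric(as.character(x)))

set.seed(42)  # hypothetical seed
idx      <- createDataPartition(bc$Class, p = 0.90, list = FALSE)
bc_train <- bc[idx, ]

# trainControl as given in the assignment.
cvControlBin <- trainControl(method = "cv", number = 10,
                             summaryFunction = twoClassSummary,
                             classProbs = TRUE)

# Random forest.
rfModel <- train(Class ~ ., data = bc_train, method = "rf",
                 trControl = cvControlBin, metric = "ROC")

# Single-hidden-layer neural network; trace = FALSE silences nnet's fitting log.
nnModel <- train(Class ~ ., data = bc_train, method = "nnet",
                 trControl = cvControlBin, metric = "ROC", trace = FALSE)

# Linear discriminant analysis.
ldaModel <- train(Class ~ ., data = bc_train, method = "lda",
                  trControl = cvControlBin, metric = "ROC")

# Decision tree.
dtModelBin <- train(Class ~ ., data = bc_train, method = "rpart",
                    trControl = cvControlBin, metric = "ROC")
```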

##Make predictions on the test data with each algorithm and calculate the AUC.
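The AUC for one model could be computed as sketched here, using the decision tree as the example; the other models would follow the same pattern. The data preparation and names are assumptions. Passing `ci = TRUE` to `roc()` stores a confidence interval in `DTroc$ci`, whose middle element is the AUC estimate that the legend code in the plotting template reads.

```r
library(mlbench)
library(caret)
library(pROC)

data(BreastCancer)
bc <- na.omit(BreastCancer[, names(BreastCancer) != "Id"])  # assumption: drop ID and NA rows
pred_cols     <- setdiff(names(bc), "Class")
bc[pred_cols] <- lapply(bc[pred_cols], function(x) as.numeric(as.character(x)))

set.seed(42)  # hypothetical seed
idx      <- createDataPartition(bc$Class, p = 0.90, list = FALSE)
bc_train <- bc[idx, ]
bc_test  <- bc[-idx, ]

cvControlBin <- trainControl(method = "cv", number = 10,
                             summaryFunction = twoClassSummary,
                             classProbs = TRUE)
dtModelBin <- train(Class ~ ., data = bc_train, method = "rpart",
                    trControl = cvControlBin, metric = "ROC")

# Predicted probability of the "malignant" class for each test row.
dtProbs <- predict(dtModelBin, newdata = bc_test, type = "prob")[, "malignant"]

# ROC object: "benign" is the control level, "malignant" the case level,
# and higher predicted probability should indicate a case.
DTroc <- roc(response = bc_test$Class, predictor = dtProbs,
             levels = c("benign", "malignant"), direction = "<", ci = TRUE)

dtAuc <- as.numeric(auc(DTroc))
print(dtAuc)
```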

##Plot an ROC curve for each algorithm by modifying the following code:

##sets the colors of the lines

cols=c('#e41a1c','#377eb8','#4daf4a','#984ea3','#ff7f00')

##outputs the roc curves as a plot

plot(DTroc, print.auc=FALSE, main="", xaxt="n", ylab="True Positive Rate", xlab="False Positive Rate", col=cols[1])

plot(btroc, print.auc=FALSE, main="", xaxt="n", ylab="True Positive Rate", xlab="False Positive Rate", col=cols[2], add=TRUE)

plot(rfroc, print.auc=FALSE, main="", xaxt="n", ylab="True Positive Rate", xlab="False Positive Rate", col=cols[3], add=TRUE)

plot(ldaroc, print.auc=FALSE, main="", xaxt="n", ylab="True Positive Rate", xlab="False Positive Rate", col=cols[4], add=TRUE)

##adds a legend showing the AUC of each curve (requires roc objects created with ci=TRUE)

legend(0.475, 0.6,
       legend=c(paste("DT: ", round(as.numeric(DTroc$ci)[2], 2), sep=""),
                paste("BT: ", round(as.numeric(btroc$ci)[2], 2), sep=""),
                paste("RF: ", round(as.numeric(rfroc$ci)[2], 2), sep=""),
                paste("LDA: ", round(as.numeric(ldaroc$ci)[2], 2), sep="")),
       bty="n", col=cols[1:4], lwd=2)

axis(1, at=seq(1,0,by=-0.2), labels=c("0.0","0.2","0.4","0.6","0.8","1.0"), pos=-0.04)

##Output your ROC curves plot as a png image.
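Writing a plot to a PNG file means opening the `png()` device, drawing, and closing the device with `dev.off()`. The sketch below uses a stand-in ROC curve built from a single raw measurement (`Cl.thickness`) so that it runs without any trained model; in the assignment, the plotting code for the model ROC objects would go between `png()` and `dev.off()`. The file name and image dimensions are hypothetical.

```r
library(mlbench)
library(pROC)

data(BreastCancer)

# Stand-in score: one raw measurement converted to numeric (illustration only).
score   <- as.numeric(as.character(BreastCancer$Cl.thickness))
demoRoc <- roc(response = BreastCancer$Class, predictor = score,
               levels = c("benign", "malignant"), direction = "<")

out_file <- "roc_curves.png"  # hypothetical file name, written to the working directory

# Open the PNG device, draw the curve, then close the device to flush the file.
png(out_file, width = 800, height = 800, res = 120)
plot(demoRoc, col = "#e41a1c")
dev.off()

print(file.exists(out_file))
```

Everything drawn between `png()` and `dev.off()` goes to the file rather than the RStudio plot pane, so the `plot(..., add = TRUE)`, `legend()`, and `axis()` calls all belong inside that pair.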

##In your submission file add a table showing the AUC for each algorithm. Add the plot with a proper figure label.

##In one sentence describe which algorithm you would prefer and why.

##In another sentence, if you were given the option to choose between the random forest algorithm and the decision tree

##what are some qualitative reasons you may choose one over the other?

##Add the answers to both of these questions to your submission document and submit it to eCampus
