Question
### USE R STUDIO
##Load the libraries you need for data loading, machine learning, plotting, and outputting plots
##Use the (.packages()) command to output your list of packages loaded. Copy and paste this output into your submission file.
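A minimal sketch of the library-loading step. The assignment does not name specific packages, so the choice of caret, pROC, and ggplot2 below is an assumption.

```r
# Assumed package choices; the assignment leaves the exact list open.
library(caret)    # createDataPartition(), train(), trainControl()
library(pROC)     # roc() and auc() for ROC analysis
library(ggplot2)  # general plotting

# Print the names of the currently attached packages for the submission file.
loaded_pkgs <- (.packages())
print(loaded_pkgs)
```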
##Load the iris dataset and save it as a variable
##The goal of this dataset is to use measured parameters of iris flowers to predict their species
##In this practice you will divide the data into training and testing sets, train models, make predictions
##and identify the best model for predicting iris species.
##Divide the data into training and testing sets with 85% of the data in training and 15% in testing
##Use the dim() and table([dataset]$[label column]) functions to validate your separation. Paste the output of these commands
##on the training and testing data in your submission file
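One way to do the 85/15 split is a stratified sample with caret's `createDataPartition()`. This is a sketch: the seed value and the variable names `iris_data`, `iris_train`, and `train_idx` are illustrative, while `iris_test` matches the name used in the hint further down.

```r
library(caret)

# Load the built-in iris data and keep it in its own variable.
data(iris)
iris_data <- iris

# Arbitrary seed so the split is reproducible.
set.seed(42)

# Stratified sample: about 85% of each species goes to the training set.
train_idx  <- createDataPartition(iris_data$Species, p = 0.85, list = FALSE)
iris_train <- iris_data[train_idx, ]
iris_test  <- iris_data[-train_idx, ]

# Validate the separation: row/column counts and class balance.
dim(iris_train)
table(iris_train$Species)
dim(iris_test)
table(iris_test$Species)
```

Because the sample is stratified on `Species`, the three classes should stay balanced in both sets.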
##Train a decision tree, a k nearest neighbor, and a logistic regression algorithm using caret
##Use the following trainControl function:
cvControl = trainControl(method="cv", number=10, summaryFunction=multiClassSummary,classProbs=TRUE)
##In your caret train() call, use metric = "Accuracy" and ignore the following warning message:
# Warning message:
# In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
# There were missing values in resampled performance measures.
##Copy and paste the output of [model name]$finalModel for each model into your submission file
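The training step could look roughly like the sketch below. The method strings are assumptions: `"rpart"` for the decision tree, `"knn"` for k nearest neighbors, and `"multinom"` (multinomial logistic regression from the nnet package) standing in for logistic regression, since iris has three classes. The rpart and nnet packages need to be installed, and `multiClassSummary` may additionally require the MLmetrics package.

```r
library(caret)

data(iris)
set.seed(42)  # arbitrary seed
train_idx  <- createDataPartition(iris$Species, p = 0.85, list = FALSE)
iris_train <- iris[train_idx, ]

# The trainControl object given in the assignment.
cvControl <- trainControl(method = "cv", number = 10,
                          summaryFunction = multiClassSummary, classProbs = TRUE)

# Decision tree (assumed method: rpart).
dtModel <- train(Species ~ ., data = iris_train, method = "rpart",
                 trControl = cvControl, metric = "Accuracy")

# k nearest neighbors.
knnModel <- train(Species ~ ., data = iris_train, method = "knn",
                  trControl = cvControl, metric = "Accuracy")

# Multinomial logistic regression; trace = FALSE silences nnet's fitting log.
lrModel <- train(Species ~ ., data = iris_train, method = "multinom",
                 trControl = cvControl, metric = "Accuracy", trace = FALSE)

# Output to paste into the submission file.
dtModel$finalModel
knnModel$finalModel
lrModel$finalModel
```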
##Make predictions of the test data for each algorithm and calculate the Accuracy for each algorithm using the
##pROC package.
#Hint: you can find the accuracy as follows.
#sum(knnPreds==iris_test$Species)/nrow(iris_test)
##Copy and paste the Accuracy of each algorithm into your submission file.
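The accuracy calculation from the hint can be sketched for a single model as follows. Only the k nearest neighbor model is shown, and the same two lines would apply to the other models. The seed and the name `knnAcc` are illustrative.

```r
library(caret)

data(iris)
set.seed(42)  # arbitrary seed
train_idx  <- createDataPartition(iris$Species, p = 0.85, list = FALSE)
iris_train <- iris[train_idx, ]
iris_test  <- iris[-train_idx, ]

cvControl <- trainControl(method = "cv", number = 10,
                          summaryFunction = multiClassSummary, classProbs = TRUE)
knnModel <- train(Species ~ ., data = iris_train, method = "knn",
                  trControl = cvControl, metric = "Accuracy")

# Predicted species for the held-out test rows.
knnPreds <- predict(knnModel, newdata = iris_test)

# Accuracy = fraction of test rows whose predicted label matches the truth.
knnAcc <- sum(knnPreds == iris_test$Species) / nrow(iris_test)
knnAcc

# Cross-check with caret's confusion matrix summary.
confusionMatrix(knnPreds, iris_test$Species)$overall["Accuracy"]
```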
##In one sentence describe which algorithm you would choose for this task and why you would choose it.
##Include this sentence in your submission file
##In this next section we will use a dataset with cell measurements to predict breast cancer
##Install the mlbench package
##Load the mlbench package
##Load the BreastCancer dataset
##This dataset contains a cell ID (not used for classification), cell measurements, and a
##class of tumor (benign or malignant). The class is our label or outcome variable (i.e. our y).
##All other measures can be used in classification.
##Separate the data into training and testing sets with 90% of the data in the training set
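A possible preparation and 90/10 split for `BreastCancer` is sketched below. Dropping rows with missing values and converting the factor-coded measurements to numbers are assumptions, not requirements stated in the assignment. The names `bc`, `bc_train`, and `bc_test` are illustrative.

```r
library(caret)
library(mlbench)  # run install.packages("mlbench") first if it is not installed

data(BreastCancer)

# Drop the Id column, which is not used for classification.
bc <- BreastCancer[, names(BreastCancer) != "Id"]

# Assumption: remove rows that contain missing values.
bc <- na.omit(bc)

# Assumption: treat the factor-coded measurements as numeric scores.
pred_cols <- setdiff(names(bc), "Class")
bc[pred_cols] <- lapply(bc[pred_cols], function(x) as.numeric(as.character(x)))

set.seed(42)  # arbitrary seed
idx      <- createDataPartition(bc$Class, p = 0.90, list = FALSE)
bc_train <- bc[idx, ]
bc_test  <- bc[-idx, ]

# Same validation as for iris.
dim(bc_train)
table(bc_train$Class)
dim(bc_test)
table(bc_test$Class)
```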
##Train a random forest, a neural network, a linear discriminant analysis, and a decision tree algorithm
##Use the following trainControl:
cvControlBin = trainControl(method="cv", number=10, summaryFunction=twoClassSummary,classProbs=TRUE)
##Make predictions on the test data with each algorithm and calculate the AUC.
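As a sketch of the AUC step, the block below trains only the decision tree (assumed method `"rpart"`) and scores it with pROC. The other algorithms would follow the same pattern with a different `method` string. The preprocessing choices are assumptions, and `metric = "ROC"` is used because `twoClassSummary` reports ROC, sensitivity, and specificity.

```r
library(caret)
library(mlbench)
library(pROC)

data(BreastCancer)
bc <- na.omit(BreastCancer[, names(BreastCancer) != "Id"])  # assumed cleanup
pred_cols <- setdiff(names(bc), "Class")
bc[pred_cols] <- lapply(bc[pred_cols], function(x) as.numeric(as.character(x)))

set.seed(42)  # arbitrary seed
idx      <- createDataPartition(bc$Class, p = 0.90, list = FALSE)
bc_train <- bc[idx, ]
bc_test  <- bc[-idx, ]

# The trainControl object given in the assignment.
cvControlBin <- trainControl(method = "cv", number = 10,
                             summaryFunction = twoClassSummary, classProbs = TRUE)

dtModel <- train(Class ~ ., data = bc_train, method = "rpart",
                 trControl = cvControlBin, metric = "ROC")

# Class probabilities on the test set; the "malignant" column is the score.
dtProbs <- predict(dtModel, newdata = bc_test, type = "prob")

# ci = TRUE stores the confidence interval that the legend code reads via $ci.
DTroc <- roc(response = bc_test$Class, predictor = dtProbs[, "malignant"],
             levels = c("benign", "malignant"), direction = "<", ci = TRUE)
auc(DTroc)
```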
##Plot an ROC curve for each algorithm by modifying the following code:
##sets the colors of the lines
cols=c('#e41a1c','#377eb8','#4daf4a','#984ea3','#ff7f00')
##outputs the roc curves as a plot
plot(DTroc, print.auc=FALSE, main="", xaxt="n", ylab="True Positive Rate", xlab="False Positive Rate", col=cols[1])
plot(btroc, print.auc=FALSE, main="", xaxt="n", ylab="True Positive Rate", xlab="False Positive Rate", col=cols[2], add=TRUE)
plot(rfroc, print.auc=FALSE, main="", xaxt="n", ylab="True Positive Rate", xlab="False Positive Rate", col=cols[3], add=TRUE)
plot(ldaroc, print.auc=FALSE, main="", xaxt="n", ylab="True Positive Rate", xlab="False Positive Rate", col=cols[4], add=TRUE)
##adds a legend showing the AUC of each curve (the $ci element exists only if roc() was called with ci=TRUE)
legend(0.475, 0.6,
       legend=c(paste("DT: ", round(as.numeric(DTroc$ci)[2], 2), sep=""),
                paste("BT: ", round(as.numeric(btroc$ci)[2], 2), sep=""),
                paste("RF: ", round(as.numeric(rfroc$ci)[2], 2), sep=""),
                paste("LDA: ", round(as.numeric(ldaroc$ci)[2], 2), sep="")),
       bty="n", col=cols[1:4], lwd=2)
##relabels the x axis so that it reads as false positive rate (1 - specificity)
axis(1, at=seq(1,0,by=-0.2), labels=c("0.0","0.2","0.4","0.6","0.8","1.0"), pos=-0.04)
##Output your ROC curves plot as a png image.
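Writing a plot to a PNG file follows the open device, plot, close device pattern sketched here. The ROC curve is built from simulated scores purely for illustration, and the file name `roc_curves.png` and the image dimensions are hypothetical choices.

```r
library(pROC)

# Simulated labels and scores, used only to have something to plot.
set.seed(1)
labels  <- factor(rep(c("benign", "malignant"), each = 50))
scores  <- c(rnorm(50, mean = 0), rnorm(50, mean = 1.5))
demoRoc <- roc(response = labels, predictor = scores,
               levels = c("benign", "malignant"), direction = "<")

# Open the PNG device, draw the plot, then close the device to write the file.
png("roc_curves.png", width = 1600, height = 1600, res = 300)
plot(demoRoc, col = "#e41a1c", legacy.axes = TRUE,
     ylab = "True Positive Rate")
dev.off()
```

Everything drawn between `png()` and `dev.off()` goes into the file, so the full set of `plot()`, `legend()`, and `axis()` calls for the assignment would sit between those two lines.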
##In your submission file add a table showing the AUC for each algorithm. Add the plot with a proper figure label.
##In one sentence, describe which algorithm you would prefer and why.
##In another sentence: if you were given the option to choose between the random forest algorithm and the decision tree,
##what are some qualitative reasons you might choose one over the other?
##Add the answers to both of these questions to your submission document and submit it to eCampus