Question
# Pima Dataset
## Preparation and EDA
### Question 1
**Obtain the dataset**
In the `MASS` library, combine the two datasets `Pima.te` and `Pima.tr` back into one complete dataset and call it `pima`. (Try the function `rbind()`.) How many observations are there?
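A minimal sketch of the combination step, assuming the `MASS` package is installed (it ships with standard R distributions). `rbind()` stacks the rows of the two data frames, which works here because both share the same columns.

```{r}
library(MASS)  # provides Pima.tr and Pima.te

# Stack the training and test subsets back into one data frame
pima <- rbind(Pima.tr, Pima.te)

# Number of observations (rows) and variables (columns)
nrow(pima)
ncol(pima)
```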
### Question 2
**Summary**
Obtain some basic summary statistics for `pima`.
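One possible way to get a first overview, sketched with base R functions only. The choice of `str()`, `summary()`, and a class table is an assumption about what "basic summary" should cover, not a requirement of the question.

```{r}
library(MASS)
pima <- rbind(Pima.tr, Pima.te)

str(pima)      # variable names and types
summary(pima)  # five-number summaries for numeric columns, counts for factors

# Share of each class in the label column `type`
prop.table(table(pima$type))
```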
### Question 3
**Pairs**
Another quick EDA step is a pairs plot, using `pairs()`. The plot function can handle both numerical and categorical variable types. After trying the function in the R base library, also try the modified version, `pairs.panels()` from the `psych` package.
```{r}
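# Example on the built-in iris data (species column dropped)
library(psych)  # provides pairs.panels()
pairs.panels(iris[, -5],
  method = "pearson",   # correlation method
  hist.col = "#00AFBB", # histogram color; can also use "#22AFBB", "red", etc.
  density = TRUE,       # show density plots
  ellipses = TRUE       # show correlation ellipses
)
detach("package:psych", unload = TRUE)  # unload psych when done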
```
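The same idea applied to `pima` might look like the sketch below. Coloring the points by `type` is an illustrative choice, and the `psych` call is guarded because that package may not be installed.

```{r}
library(MASS)
pima <- rbind(Pima.tr, Pima.te)

# Base R pairs plot; the factor column `type` is converted to numeric codes
pairs(pima, col = as.integer(pima$type))

# Enhanced version, only if the psych package is available
if (requireNamespace("psych", quietly = TRUE)) {
  psych::pairs.panels(pima, method = "pearson", density = TRUE, ellipses = TRUE)
}
```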
## KNN
### Question 4
**Train-Test split 3:1**
To perform KNN analysis, we need to separate the X-variables from the y-labels. (Which variable should be our y-variable?) Before separating them, create a vector/array of 1s and 2s that defines a train-test split in the ratio of 3:1. (Set a constant seed value so that the results can be reproduced.) Eventually, you will have four groups: the training Xs as a dataframe, the training y-labels as a vector, and the corresponding test Xs and test y-labels. Make sure train-X and train-y keep the same row ordering throughout the process, and likewise for test-X and test-y.
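The split described above can be sketched as follows. The seed value `1` and the object names (`split_id`, `train_X`, and so on) are arbitrary assumptions, and `type` is assumed to be the y-variable. Because the labels are drawn at random with probabilities 0.75 and 0.25, the resulting ratio is only approximately 3:1.

```{r}
library(MASS)
pima <- rbind(Pima.tr, Pima.te)

set.seed(1)  # arbitrary constant seed for reproducibility

# One label per row: 1 = training (about 75%), 2 = test (about 25%)
split_id <- sample(2, nrow(pima), replace = TRUE, prob = c(0.75, 0.25))

# Using the same logical index for X and y keeps the rows aligned
is_label <- names(pima) == "type"
train_X <- pima[split_id == 1, !is_label]
test_X  <- pima[split_id == 2, !is_label]
train_y <- pima$type[split_id == 1]
test_y  <- pima$type[split_id == 2]

table(split_id)
```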
### Question 5
**KNN results**
Perform the KNN analysis with different k values. You do not need to show the results for every k, but please include the one with the best (total) accuracy in your submission. How does the accuracy compare to the percentages of T/F in the dataset?
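A hedged sketch of the k search using `knn()` from the `class` package. The candidate grid of odd k values from 1 to 15, the seed, and the object names are assumptions made for illustration; the predictors are used unscaled here.

```{r}
library(MASS)
library(class)  # provides knn()

pima <- rbind(Pima.tr, Pima.te)
set.seed(1)  # arbitrary seed; also fixes knn()'s random tie-breaking
split_id <- sample(2, nrow(pima), replace = TRUE, prob = c(0.75, 0.25))
is_label <- names(pima) == "type"
train_X <- pima[split_id == 1, !is_label]
test_X  <- pima[split_id == 2, !is_label]
train_y <- pima$type[split_id == 1]
test_y  <- pima$type[split_id == 2]

# Total accuracy on the test set for each candidate k
k_values <- seq(1, 15, by = 2)
accuracy <- sapply(k_values, function(k) {
  pred <- knn(train = train_X, test = test_X, cl = train_y, k = k)
  mean(pred == test_y)
})

data.frame(k = k_values, accuracy = accuracy)
best_k <- k_values[which.max(accuracy)]
best_k

# Baseline for comparison: share of each class in the full dataset
prop.table(table(pima$type))
```

Comparing the best accuracy with the larger class share shows how much the model improves on always predicting the majority class.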
## Logistic Regression and comparison
### Question 6
**Logistic Regression results**
Compare with the best logistic regression you can get. (Use the full model with all variables, since that is what we have for KNN.) How does the accuracy (assuming the standard cutoff of 0.5) compare to KNN?
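One way to set up the comparison, sketched under the assumption that the logit model is fit on the same training rows and scored on the same test rows as KNN (the question does not state this explicitly). The seed and object names are illustrative.

```{r}
library(MASS)
pima <- rbind(Pima.tr, Pima.te)

set.seed(1)  # same arbitrary seed as the KNN split, so the rows match
split_id <- sample(2, nrow(pima), replace = TRUE, prob = c(0.75, 0.25))
train <- pima[split_id == 1, ]
test  <- pima[split_id == 2, ]

# Full model: all seven predictors
logit_fit <- glm(type ~ ., data = train, family = binomial)
summary(logit_fit)

# glm() models the probability of the second factor level of `type`
prob <- predict(logit_fit, newdata = test, type = "response")
lv <- levels(test$type)
pred <- factor(ifelse(prob > 0.5, lv[2], lv[1]), levels = lv)

table(predicted = pred, actual = test$type)  # confusion matrix
logit_acc <- mean(pred == test$type)         # total accuracy at cutoff 0.5
logit_acc
```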
### Question 7
**ROC-AUC**
What is the ROC-AUC score for the logit model? We should be able to compute the ROC-AUC value for the KNN model in the same way. How do the two compare?
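A rough sketch of one way to put both models on the same ROC-AUC footing without extra packages. `auc_rank()` is a hypothetical helper written for this example; it computes AUC from the rank-sum (Mann-Whitney) statistic. The value `k = 9` is a placeholder, to be replaced with whichever k performed best, and the KNN score is the vote share for the positive class recovered from `knn(..., prob = TRUE)`.

```{r}
library(MASS)
library(class)

pima <- rbind(Pima.tr, Pima.te)
set.seed(1)  # arbitrary seed (assumption)
split_id <- sample(2, nrow(pima), replace = TRUE, prob = c(0.75, 0.25))
train <- pima[split_id == 1, ]
test  <- pima[split_id == 2, ]
is_label <- names(pima) == "type"
positive <- levels(pima$type)[2]  # second factor level taken as the positive class

# Hypothetical helper: AUC via the rank-sum statistic (ties get average ranks)
auc_rank <- function(score, is_positive) {
  r <- rank(score)
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Logit: predicted probability of the positive class
logit_fit <- glm(type ~ ., data = train, family = binomial)
logit_score <- predict(logit_fit, newdata = test, type = "response")
logit_auc <- auc_rank(logit_score, test$type == positive)

# KNN: attr(, "prob") is the vote share of the winning class,
# so flip it when the winning class is the negative one
knn_pred <- knn(train[, !is_label], test[, !is_label],
                cl = train$type, k = 9, prob = TRUE)  # k = 9 is a placeholder
vote_share <- attr(knn_pred, "prob")
knn_score <- ifelse(knn_pred == positive, vote_share, 1 - vote_share)
knn_auc <- auc_rank(knn_score, test$type == positive)

c(logit = logit_auc, knn = knn_auc)
```

Note that KNN scores take only a few distinct values (multiples of roughly 1/k), so its ROC curve has far fewer steps than the logit curve.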