
# Pima Dataset

## Preparation and EDA

### Question 1

**Obtain the dataset**

From the `MASS` library, combine the two datasets `Pima.te` and `Pima.tr` back into one complete dataset called `pima`. (Try the function `rbind()`.) How many observations are there?
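A minimal sketch of the combination step, assuming `MASS` is installed (it ships with standard R distributions as a recommended package). Both data frames have the same eight columns, so `rbind()` can stack them directly:

```{r}
library(MASS)  # provides Pima.tr and Pima.te

# Stack the test and training sets row-wise; column names and types match
pima <- rbind(Pima.te, Pima.tr)

nrow(pima)  # number of observations (332 + 200)
```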

### Question 2

**Summary**

Obtain some basic summary statistics for `pima`.
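One way to get a quick overview, sketched here with base R only: `str()` shows the column types, `summary()` gives quartiles and means for the numeric columns and counts for the factor, and a proportion table shows the class balance of `type`:

```{r}
library(MASS)
pima <- rbind(Pima.te, Pima.tr)

str(pima)       # seven numeric (integer or double) predictors plus the factor `type`
summary(pima)   # min, quartiles, mean, max; level counts for `type`

# Share of non-diabetic ("No") vs diabetic ("Yes") cases
type_share <- prop.table(table(pima$type))
type_share
```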

### Question 3

**Pairs**

Another quick EDA step is to plot a scatterplot matrix with `pairs()`. The function can handle both numerical and categorical variable types. After trying the function in base R, also try the enhanced version, `pairs.panels()` from the `psych` package, shown here on the built-in `iris` data.

```{r}

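library(psych)  # provides pairs.panels(); install.packages("psych") if needed

pairs.panels(iris[, -5],             # drop the Species factor column
             method = "pearson",     # correlation method
             hist.col = "#00AFBB",   # histogram color; "#22AFBB" or "red" also work
             density = TRUE,         # show density plots
             ellipses = TRUE         # show correlation ellipses
)

detach("package:psych")  # remove psych from the search path again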

```
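Applied to `pima` instead of `iris`, the base-R version might look like the sketch below. Coloring the points by `type` and adding a correlation matrix are choices made here for illustration, not part of the assignment:

```{r}
library(MASS)
pima <- rbind(Pima.te, Pima.tr)

# Base-R scatterplot matrix; pairs() converts the factor `type` to numeric codes,
# and we also use it to color points (black = "No", red = "Yes")
pairs(pima, col = c("black", "red")[as.integer(pima$type)], pch = 20)

# Pearson correlations among the seven numeric predictors
pima_cor <- round(cor(pima[, 1:7]), 2)
pima_cor
```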

## KNN

### Question 4

**Train-Test split 3:1**

In order to perform KNN analysis, we need to separate the X-variables and the y-labels. (Which should be our y-variable?) Before separating them, create a vector of 1s and 2s to define a train-test split in the ratio 3:1. (Set a constant seed value so that the results can be reproduced.) You should end up with four groups: the training Xs (a data frame), the training y-labels (a vector), and the corresponding test Xs and test y-labels. Make sure the train-X and train-y rows stay in the same order throughout the process; the same goes for test-X and test-y.
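A minimal sketch of the split, assuming `type` is the y-variable (column 8) and the other seven columns are the Xs. The seed value 1000 is arbitrary; because each row is assigned at random, the split is only approximately 3:1. Indexing X and y with the same logical vector keeps their rows aligned:

```{r}
library(MASS)
pima <- rbind(Pima.te, Pima.tr)

set.seed(1000)  # arbitrary fixed seed so the split is reproducible

# Label each row 1 (train) or 2 (test) with probabilities 3:1
split_id <- sample(2, nrow(pima), replace = TRUE, prob = c(0.75, 0.25))

# X = seven predictors (columns 1-7); y = the diabetes label `type` (column 8)
train_X <- pima[split_id == 1, 1:7]
test_X  <- pima[split_id == 2, 1:7]
train_y <- pima[split_id == 1, 8]   # single column drops to a factor vector
test_y  <- pima[split_id == 2, 8]

c(train = nrow(train_X), test = nrow(test_X))
```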

### Question 5

**KNN results**

Perform the KNN analysis with different values of k. You do not need to show the results for every k, but please include the one with the best (total) accuracy in your submission. How does the accuracy compare with the proportion of each outcome (Yes/No in `type`) in the dataset?
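A sketch using `knn()` from the `class` package (another recommended package bundled with R). Two choices here are assumptions rather than requirements of the assignment: the predictors are standardized with the training means and standard deviations, and only odd k from 1 to 25 are tried. Predictions are stored per k so the confusion matrix reuses the exact predictions that were scored, since `knn()` breaks distance ties at random:

```{r}
library(MASS)
library(class)  # knn()
pima <- rbind(Pima.te, Pima.tr)

set.seed(1000)  # arbitrary seed; also fixes knn()'s random tie-breaking
split_id <- sample(2, nrow(pima), replace = TRUE, prob = c(0.75, 0.25))
train_X <- pima[split_id == 1, 1:7]
test_X  <- pima[split_id == 2, 1:7]
train_y <- pima[split_id == 1, 8]
test_y  <- pima[split_id == 2, 8]

# Standardize with training statistics so no predictor dominates the distance
train_s <- scale(train_X)
test_s  <- scale(test_X,
                 center = attr(train_s, "scaled:center"),
                 scale  = attr(train_s, "scaled:scale"))

# With two classes, an odd k avoids tied votes
ks <- seq(1, 25, by = 2)
preds <- lapply(ks, function(k) knn(train = train_s, test = test_s, cl = train_y, k = k))
acc <- sapply(preds, function(p) mean(p == test_y))

best_k <- ks[which.max(acc)]
best_pred <- preds[[which.max(acc)]]
table(predicted = best_pred, actual = test_y)  # confusion matrix for the best k
c(best_k = best_k, accuracy = max(acc))

# Baseline: always predicting the majority class scores the larger of these shares
prop.table(table(pima$type))
```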

## Logistic Regression and comparison

### Question 6

**Logistic Regression results**

Compare with the best logistic regression you can get. (Use the full model with all variables, since that is what we have for KNN.) How does its accuracy (assuming the standard cutoff of 0.5) compare with KNN?
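A sketch of the full logistic model, fitted on the training rows and scored on the test rows so the accuracy is comparable with the KNN test accuracy (evaluating on held-out data is a choice made here). The split is recreated with the same `set.seed()` and `sample()` call, so if you used the same seed for KNN the test rows match. `glm()` with a factor response models the probability of the second level, here `"Yes"`:

```{r}
library(MASS)
pima <- rbind(Pima.te, Pima.tr)

set.seed(1000)  # arbitrary seed, same sampling call as the train-test split
split_id <- sample(2, nrow(pima), replace = TRUE, prob = c(0.75, 0.25))
train <- pima[split_id == 1, ]
test  <- pima[split_id == 2, ]

# Full model with all seven predictors
logit_full <- glm(type ~ ., data = train, family = binomial)
summary(logit_full)

# Predicted P(type = "Yes") on the test set, classified at the 0.5 cutoff
p_test <- predict(logit_full, newdata = test, type = "response")
pred <- factor(ifelse(p_test > 0.5, "Yes", "No"), levels = levels(test$type))

table(predicted = pred, actual = test$type)
acc_logit <- mean(pred == test$type)
acc_logit
```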

### Question 7

**ROC-AUC**

What is the ROC-AUC score for the logit model? We should be able to compute the ROC-AUC value for the KNN model the same way. How do they compare?
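ROC-AUC needs a continuous score per test case, not just a class label. The sketch below computes AUC with the rank-based (Mann-Whitney) formula, i.e. the probability that a randomly chosen `"Yes"` case scores higher than a randomly chosen `"No"` case, which avoids an extra dependency; the `pROC` package's `roc()` and `auc()` are a common alternative. For KNN, `knn(..., prob = TRUE)` reports the vote share of the *winning* class, so it is converted to the share of `"Yes"` votes. The value `k = 7` is hypothetical; substitute the best k you found:

```{r}
library(MASS)
library(class)
pima <- rbind(Pima.te, Pima.tr)

set.seed(1000)  # arbitrary seed
split_id <- sample(2, nrow(pima), replace = TRUE, prob = c(0.75, 0.25))
train <- pima[split_id == 1, ]
test  <- pima[split_id == 2, ]

# Rank-based AUC; average ranks count tied scores as 1/2
auc_rank <- function(score, positive) {
  r <- rank(score)
  n1 <- sum(positive)
  n0 <- sum(!positive)
  (sum(r[positive]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Logit score: predicted P(Yes) on the test set
logit_full <- glm(type ~ ., data = train, family = binomial)
p_logit <- predict(logit_full, newdata = test, type = "response")

# KNN score: convert the winning-class vote share into the "Yes" vote share
train_s <- scale(train[, 1:7])
test_s  <- scale(test[, 1:7],
                 center = attr(train_s, "scaled:center"),
                 scale  = attr(train_s, "scaled:scale"))
k <- 7  # hypothetical choice of k
knn_pred <- knn(train_s, test_s, cl = train$type, k = k, prob = TRUE)
win_share <- attr(knn_pred, "prob")
p_knn <- ifelse(knn_pred == "Yes", win_share, 1 - win_share)

is_yes <- test$type == "Yes"
auc_vals <- c(logit = auc_rank(p_logit, is_yes), knn = auc_rank(p_knn, is_yes))
auc_vals
```

Because KNN scores take only k + 1 distinct values (0, 1/k, ..., 1), its ROC curve is coarser than the logit curve, which is worth keeping in mind when comparing the two numbers.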
