Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Sep 21, 2024

# Pima Dataset ## Preparation and EDA ### Question 1 Obtain the dataset In the `MASS` library, combine the two datasets `Pima.te` and `Pima.tr` back

# Pima Dataset

## Preparation and EDA

### Question 1

**Obtain the dataset**

In the `MASS` library, combine the two datasets `Pima.te` and `Pima.tr` back into one complete dataset, call it `pima`. (Try function `rbind()`.) How many observations are there?

### Question 2

**Summary**

Obtain some basic summary data for pima.

### Question 3

**Pairs**

Another quick EDA to perform, you can plot the `pairs()`. The plot function can handle both numerical and categorical variable type. After trying the function in the R base library, also try the modified version with `pairs.panels()`.

```{r}

loadPkg(psych)

pairs.panels(iris[,-5],

method = "pearson", # correlation method

hist.col = "#00AFBB", # set histogram color, can use "#22AFBB", "red",

density = TRUE,# show density plots

ellipses = TRUE # show correlation ellipses

)

unloadPkg(psych)

```

## KNN

### Question 4

**Train-Test split 3:1**

In order to perform KNN analysis, we need to separate the X-variables and the y-labels. (Which should be our y-variable?) Before we separate them out, create vector/array of 1 and 2 to create train-test split in the ratio of 3:1. (Set a constant seed value so that we can duplicate the results.) So eventually, you will get the training Xs as a dataframe, training y-label (a vector), as well as the test sets together in four groups. Make sure the train-X and train-y are not mixed up in the ordering during the process. Same for test-X and test-y.

### Question 5

**KNN results**

Perform the KNN analysis, with different k values. You do not need to show all the results from different k, but please include the one with the best (total) accuracy in your submission. How does the accuracy compared to the percentages of being T/F in the dataset?

## Logistic Regression and comparison

### Question 6

**Logistic Regression results**

Compare to the best logistic regression you can get. (Use the full model with all variables, since that is what we have for KNN.) How is the accuracy (assumes the standard cutoff of 0.5) compared to KNN?

### Question 7

**ROC-AUC**

What is the score for the logit model using ROC-AUC? We should be able to compute the ROC-AUC value for the KNN model the same way. Can you compare them?

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Beginning Algebra A Text/Workbook

Beginning Algebra A Text/Workbook

Authors: Charles P McKeague

2nd Edition

1483271242, 9781483271248

More Books

Students also viewed these Mathematics questions

Question

Explain the difference between the stem and the leaf in a stem-and-leaf display.

Answered: 1 week ago

Question

★★★★★

In business circles, one frequently hears references to the old boy network. Many women in professional firms complain that their gender precludes them from becoming a member of the old boy network...

Answered: 1 week ago

Question

★★★★★

# Pima Dataset ## Preparation and EDA ### Question 1 **Obtain the dataset** In the `MASS` library, combine the two datasets `Pima.te` and `Pima.tr` back into one complete dataset, call it `pima`....

Answered: 1 week ago

Question

★★★★★

According to the case, the minimum efficient scale of production is 3% market share (75m Pounds). How many new product introductions would be necessary (on average) for a new branded entrant (say...

Answered: 1 week ago

Question

★★★★★

A hiring team is considering a candidate for a cloud security analyst position. Which part of the interview process assesses how well the candidate will fit in with the team's culture? One-on-one...

Answered: 1 week ago

Question

★★★★★

4. Consider a system composed of a manufacturing factory and a waste treatment plant owned by the manufacturer. The manufacturing plant produces finished goods that sell for a unit price of 10. Howeve

Answered: 1 week ago

Question

★★★★★

How to craft high impact digital advertising campaigns using semrush exam?

Answered: 1 week ago

Question

★★★★★

Loans to non-profit organizations represent a major category of the loans banks originate. Question 1 options: True False

Answered: 1 week ago

Question

★★★★★

Pharoah Company obtains $44,800 in cash by signing a 7%, 6-month, $44,800 note payable to First Bank on July 1. Pharoahs fiscal year ends on September 30. What information should be reported for the...

Answered: 1 week ago

Question

★★★★★

Teamwork. Divide the class into two-person teams. One person will be the interviewer at the campus radio station; the interviewee is the president of the local SIFE (Students in Free Enterprise)...

Answered: 1 week ago

Question

★★★★★

What is the difference between a scannable and an online rsum? (Objective 4)

Answered: 1 week ago

Question

★★★★★

Describe the chronological, functional, and combination rsum formats and when each type would be most appropriately used. (Objective 3)

Answered: 1 week ago

Previous Question Next Question