
Question


This is for R.

Start with a new RStudio Rmd file and add headings for Homework 3, your name, and a brief description of the purpose of the script. Clearly label each step. Each step should have one or more R code chunks. For step 1, load the library mlbench, installing it first if needed (at the console). Load the data frame into memory with data(BreastCancer). Now run str() and head() on BreastCancer, and summary() on just the Class column. Use R instructions to calculate the percent in each class, and print them with an appropriate heading using paste().
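One possible way to approach step 1 is sketched below. It assumes mlbench has already been installed; the variable name class_pct is only illustrative.

```r
# Assumes mlbench was installed once at the console:
#   install.packages("mlbench")
library(mlbench)
data(BreastCancer)

str(BreastCancer)
head(BreastCancer)
summary(BreastCancer$Class)

# Percent of observations in each class (class_pct is an illustrative name)
class_pct <- round(100 * prop.table(table(BreastCancer$Class)), 1)
print(paste("Percent", names(class_pct), ":", as.numeric(class_pct)))
```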

Cell.size and Cell.shape each take one of 10 levels. Build a logistic regression model called glm0, where Class is predicted by Cell.size and Cell.shape. Do you get any error or warning messages? Run summary() on glm0 to confirm that it did build a model. Write a comment about why you think you got this warning message and what you could possibly do about it.
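A minimal sketch of the glm0 fit, assuming the full BreastCancer data frame is used and that family = binomial is the intended way to request logistic regression:

```r
library(mlbench)
data(BreastCancer)

# Logistic regression: Class predicted by the two 10-level factors
glm0 <- glm(Class ~ Cell.size + Cell.shape,
            data = BreastCancer, family = binomial)

# Confirms that a model was built, even if a warning was printed
summary(glm0)
```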

Notice in the summary() of glm0 that most of the levels of Cell.size and Cell.shape became predictors and that they had very high p-values. We won't be able to build a good logistic regression model this way. It might be better to have just 2 levels for each variable. In this step, add two new columns to BreastCancer as listed below. Run summary() on Cell.size and Cell.shape as well as the new columns. Comment on the distribution of the new columns. Do you think what we did is a good idea? Why or why not?

New columns: Cell.small, a binary factor that is 1 if Cell.size==1 and 0 otherwise, and Cell.regular, a binary factor that is 1 if Cell.shape==1 and 0 otherwise.
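The two columns could be created along these lines. This is a sketch using ifelse() wrapped in factor(); other constructions would work equally well.

```r
library(mlbench)
data(BreastCancer)

# Binary factors: 1 when the original variable equals level "1", else 0
BreastCancer$Cell.small   <- factor(ifelse(BreastCancer$Cell.size == 1, 1, 0))
BreastCancer$Cell.regular <- factor(ifelse(BreastCancer$Cell.shape == 1, 1, 0))

# Distributions of the original and the new columns
summary(BreastCancer$Cell.size)
summary(BreastCancer$Cell.shape)
summary(BreastCancer$Cell.small)
summary(BreastCancer$Cell.regular)
```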

Create conditional density plots using the original Cell.size and Cell.shape. First attach() the data to reduce typing. Then use par(mfrow=c(1,2)) to set up a 1x2 grid for two cdplot() graphs with Class~Cell.size and Class~Cell.shape. Observing the plots, write a sentence or two comparing size and malignant, and shape and malignant. Do you think our cutoff points for size==1 and shape==1 were justified now that you see this graph? Why or why not?

Create plots (not cdplots) with our new columns. Again, use par(mfrow=c(1,2)) to set up a 1x2 grid for two plot() graphs with Class~Cell.small and Class~Cell.regular. Now create two cdplot() graphs for the new columns. Now compute the following and provide a summary in the text portion of this answer. Also indicate based on these results if you think small and regular will be good predictors. Calculate the percentage of small observations that are malignant. Calculate the percentage of not-small observations that are malignant. Calculate the percentage of regular observations that are malignant. Calculate the percentage of non-regular observations that are malignant.
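The four percentages could be computed as in the sketch below. The helper pct_malignant() is a hypothetical name introduced here for illustration, and the plotting calls are left out.

```r
library(mlbench)
data(BreastCancer)

# Logical flags matching the definitions of Cell.small and Cell.regular
small   <- BreastCancer$Cell.size == 1
regular <- BreastCancer$Cell.shape == 1

# Hypothetical helper: percent of the flagged rows that are malignant
pct_malignant <- function(flag) {
  round(100 * mean(BreastCancer$Class[flag] == "malignant"), 2)
}

pct_small       <- pct_malignant(small)
pct_not_small   <- pct_malignant(!small)
pct_regular     <- pct_malignant(regular)
pct_not_regular <- pct_malignant(!regular)

print(paste("Malignant % among small cells:", pct_small))
print(paste("Malignant % among not-small cells:", pct_not_small))
print(paste("Malignant % among regular cells:", pct_regular))
print(paste("Malignant % among non-regular cells:", pct_not_regular))
```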

Randomly divide BreastCancer into two data sets: train (80% of the data) and test (20%). Make sure you first set the seed to 1234 so you get the same results as others.
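A common pattern for this kind of split is sketched here. Using floor() to get a whole number of training rows is an assumption, since 80% of the row count is not an integer.

```r
library(mlbench)
data(BreastCancer)

set.seed(1234)
n <- nrow(BreastCancer)

# Sample 80% of the row indices without replacement
i <- sample(seq_len(n), floor(0.8 * n), replace = FALSE)

train <- BreastCancer[i, ]
test  <- BreastCancer[-i, ]

dim(train)
dim(test)
```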

Build a logistic regression classifier to estimate the probability of Class given Cell.small and Cell.regular. Run summary() on your model.

Test the model on the test data and compute accuracy. What percent accuracy did you get?
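Building the classifier and scoring it on the test set might look like the following sketch. The 0.5 probability cutoff and the names probs, pred, and acc are assumptions, not part of the assignment text.

```r
library(mlbench)
data(BreastCancer)

BreastCancer$Cell.small   <- factor(ifelse(BreastCancer$Cell.size == 1, 1, 0))
BreastCancer$Cell.regular <- factor(ifelse(BreastCancer$Cell.shape == 1, 1, 0))

set.seed(1234)
n <- nrow(BreastCancer)
i <- sample(seq_len(n), floor(0.8 * n), replace = FALSE)
train <- BreastCancer[i, ]
test  <- BreastCancer[-i, ]

# Logistic regression on the two binary predictors
glm1 <- glm(Class ~ Cell.small + Cell.regular, data = train, family = binomial)
summary(glm1)

# Predicted probability of the second factor level ("malignant")
probs <- predict(glm1, newdata = test, type = "response")
pred  <- ifelse(probs > 0.5, "malignant", "benign")  # assumed 0.5 cutoff

acc <- mean(pred == test$Class)
print(paste("Accuracy:", round(100 * acc, 2), "%"))
```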

Your coefficients from the model are in units of logits. Extract the coefficient of small with glm1$coefficients[]. What is the coefficient? How do you interpret this value? Find the estimated probability of malignancy if Cell.small is true using exp(). Find the probability of malignancy if Cell.small is true over the whole BreastCancer data set and compare results. Are they close? Why or why not?
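A logit z converts to a probability through exp(z) / (1 + exp(z)). The sketch below applies that to glm1. The coefficient name "Cell.small1" follows from coding the factor with levels 0 and 1, and adding the intercept (which corresponds to Cell.small = 1 with Cell.regular = 0) is an interpretation choice rather than something the assignment states.

```r
library(mlbench)
data(BreastCancer)

BreastCancer$Cell.small   <- factor(ifelse(BreastCancer$Cell.size == 1, 1, 0))
BreastCancer$Cell.regular <- factor(ifelse(BreastCancer$Cell.shape == 1, 1, 0))

set.seed(1234)
n <- nrow(BreastCancer)
i <- sample(seq_len(n), floor(0.8 * n), replace = FALSE)
train <- BreastCancer[i, ]

glm1 <- glm(Class ~ Cell.small + Cell.regular, data = train, family = binomial)

# Coefficient of small, in log-odds (logit) units
b_small <- glm1$coefficients["Cell.small1"]
b_small
exp(b_small)  # odds ratio for small vs. not-small cells

# Assumed case: Cell.small = 1 and Cell.regular = 0
z <- glm1$coefficients["(Intercept)"] + b_small
p_model <- exp(z) / (1 + exp(z))

# Observed proportion of malignant cases among small cells, whole data set
p_data <- mean(BreastCancer$Class[BreastCancer$Cell.small == 1] == "malignant")

print(paste("Model estimate:", round(p_model, 4)))
print(paste("Observed proportion:", round(p_data, 4)))
```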

Build two more models, one using just Cell.small and one using just Cell.regular, and use anova(glm_small, glm_regular, glm1) to compare all 3 models, substituting whatever names you used for your models. Analyze the results of the anova(). Also, compare the AIC scores of the 3 models. Feel free to use the internet to help you interpret AIC scores.
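The comparison could be set up as in this sketch, which assumes all three models are fit on the same training split. The name aic_table is illustrative.

```r
library(mlbench)
data(BreastCancer)

BreastCancer$Cell.small   <- factor(ifelse(BreastCancer$Cell.size == 1, 1, 0))
BreastCancer$Cell.regular <- factor(ifelse(BreastCancer$Cell.shape == 1, 1, 0))

set.seed(1234)
n <- nrow(BreastCancer)
i <- sample(seq_len(n), floor(0.8 * n), replace = FALSE)
train <- BreastCancer[i, ]

glm_small   <- glm(Class ~ Cell.small, data = train, family = binomial)
glm_regular <- glm(Class ~ Cell.regular, data = train, family = binomial)
glm1        <- glm(Class ~ Cell.small + Cell.regular, data = train,
                   family = binomial)

# Analysis-of-deviance table across the three models
anova(glm_small, glm_regular, glm1)

# AIC for each model; lower values indicate a better fit/complexity trade-off
aic_table <- AIC(glm_small, glm_regular, glm1)
aic_table
```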
