Question

Use RStudio to add the data in grey then questions below that.

We will be modeling whether or not respondents purchased a brand of cereal. The variables are Bought (0 = no, 1 = yes), Income (in tens of thousands of dollars), Children (under 18 in the home, Yes or No), ViewAd (did the respondent see the ad for the brand, Yes or No), and Age (years). The data file (CerealPurchase.xlsx) is located on the Module 9: Class Session Agenda and Prep page.

Within the code below are comments in all capital letters; these are the questions you are to answer. This is the .R code:

#Logistic regression assignment

#Install packages if needed
install.packages("readxl")
install.packages("ggplotgui")
install.packages("ggplot2")
install.packages("shiny")
install.packages("lmtest")
install.packages("aod")
install.packages("pROC")
install.packages("psych")

#Load packages
library(readxl)
library(ggplotgui)
library(ggplot2)
library(shiny)
library(lmtest)
library(aod)
library(pROC)
library(psych) #needed for cohen.kappa() below

#Get data
CerealPurchase <- read_excel(file.choose())
names(CerealPurchase)

#Binary logistic regression: predict Bought from all the other variables
LogMod <- glm(Bought~Income+Children+ViewAd+Age, data=CerealPurchase, family="binomial")
summary(LogMod)

#Test the whole model
lrtest(LogMod)
#WHAT DO THE RESULTS TELL YOU?

#Now run a mixed step-wise analysis to develop a more parsimonious model
#Mixed step-wise regression
#Define intercept-only model
intercept_only <- glm(Bought~1, data=CerealPurchase, family="binomial")

#Define model with all predictors
all <- glm(Bought~Income + Children + ViewAd + Age, data=CerealPurchase, family="binomial")
summary(all)

#Use them to perform a mixed step-wise regression
both <- step(intercept_only, direction='both', scope=formula(all), trace=0)

#View results of mixed step-wise regression
both$anova
summary(both)
#WHAT IVs ARE SIGNIFICANT FROM THE MIXED STEP-WISE MODEL?
#HOW DO YOU EXPLAIN ANY DIFFERENCES BETWEEN THE "ALL-IN" AND MIXED STEP-WISE MODELS?

##Odds of buying cereal----
#Logits
coef(both)

#Unit odds ratios of buying the cereal compared to not buying
exp(coef(both))
#WHAT DO THE UNIT ODDS RATIOS MEAN?
#WRITE THE CODE TO CONVERT THOSE UNIT ODDS RATIOS TO PROBABILITIES

#Complete the next line of code to estimate for a respondent who is 33 years old, no children, and saw the ad.
#Remember that character values need to be enclosed in quotation marks, but that numbers are not.
I1 <- data.frame(Children = "", ViewAd = "", Age = )

#Estimate
#Fill in between the parentheses to have R calculate the prediction for this person
#(you can see what needs to go here by looking at the code from the recorded class session,
#or from the modules on simple and multiple regression)
g <- predict( )
#CONVERT THE ESTIMATE TO ODDS
#CONVERT THE ESTIMATE TO PROBABILITY
#DO YOU PREDICT THAT THIS PERSON WOULD BUY THE CEREAL? YES/NO

#How well does the model perform?
#Estimated probability of buying
CerealPurchase$Buy <- predict(both, type="response")
CerealPurchase$PredBuy <- 0
CerealPurchase$PredBuy[CerealPurchase$Buy >= .5] <- 1

#Confusion matrix
xtab <- table(CerealPurchase$Bought, CerealPurchase$PredBuy)
xtab

#Percent improvement
#Number wrong with no model
wrongNM <- sum(xtab)-max(rowSums(xtab))
#Number wrong with model
wrongWM <- sum(xtab)-sum(diag(xtab))
#Percent improvement in error rate
(wrongNM-wrongWM)/wrongNM
#WHAT IS THE PERCENT IMPROVEMENT FOR THIS MODEL? IS THAT GOOD OR BAD?

#Kappa
cereal_kappa <- cohen.kappa(xtab)
cereal_kappa$kappa
#IS THIS KAPPA GOOD OR BAD?

#Range odds
min(CerealPurchase$Age)
max(CerealPurchase$Age)

IMinAge <- data.frame(Age=29, Children = "No", ViewAd = "No")
IMaxAge <- data.frame(Age=53, Children = "No", ViewAd = "No")
OddsMinAge <- exp(predict(both, IMinAge))
OddsMinAge
OddsMaxAge <- exp(predict(both, IMaxAge))
OddsMaxAge
RA_Age <- OddsMaxAge/OddsMinAge #the range odds ratio for Age

IMinChild <- data.frame(Age=37, Children = "No", ViewAd = "No")
IMaxChild <- data.frame(Age=37, Children = "Yes", ViewAd = "No")
OddsMinChild <- exp(predict(both, IMinChild))
OddsMinChild
OddsMaxChild <- exp(predict(both, IMaxChild))
OddsMaxChild
RA_Child <- OddsMaxChild/OddsMinChild #the range odds ratio for Children

IMinAd <- data.frame(Age=37, Children = "No", ViewAd = "No")
IMaxAd <- data.frame(Age=37, Children = "No", ViewAd = "Yes")
OddsMinAd <- exp(predict(both, IMinAd))
OddsMinAd
OddsMaxAd <- exp(predict(both, IMaxAd))
OddsMaxAd
RA_Ad <- OddsMaxAd/OddsMinAd #the range odds ratio for ViewAd

RA <- cbind(RA_Age, RA_Child, RA_Ad)
coef(both)
RA
#WHAT DOES THE RANGE ODDS REPRESENT FOR EACH VARIABLE?

This is the Data File to use:

Bought Income Children ViewAd Age
0 37 No No 42
0 47 Yes No 43
0 49 No No 49
0 13 No Yes 39
0 51 Yes No 49
0 38 Yes No 41
0 60 Yes Yes 44
0 17 Yes No 44
0 60 No No 36
0 38 Yes Yes 44
0 24 Yes No 47
0 15 Yes No 44
0 28 Yes No 43
0 36 Yes No 42
0 10 No Yes 39
0 46 No No 38
0 37 Yes Yes 47
0 55 Yes No 40
0 24 Yes No 37
0 19 Yes No 45
0 55 Yes Yes 46
0 53 Yes Yes 42
0 30 No Yes 37
0 53 Yes No 53
0 60 Yes Yes 42
0 53 Yes Yes 37
0 43 Yes No 40
0 35 Yes Yes 36
0 33 No Yes 40
0 30 Yes No 46
0 34 Yes No 50
0 43 No No 41
0 36 Yes No 46
0 60 Yes Yes 47
0 56 Yes No 44
0 60 Yes Yes 47
0 33 Yes No 45
0 20 No No 41
0 15 No Yes 47
0 10 No No 37
0 25 No No 46
0 15 Yes Yes 40
0 10 No No 42
0 15 No Yes 42
0 24 Yes No 51
0 15 No Yes 43
0 24 Yes Yes 45
0 21 No No 47
0 21 No No 39
1 47 Yes Yes 36
1 59 Yes Yes 43
1 48 Yes Yes 35
1 59 Yes No 40
1 55 Yes Yes 41
1 13 Yes Yes 46
1 34 Yes No 34
1 57 Yes Yes 43
1 51 Yes Yes 42
1 23 Yes Yes 37
1 44 Yes Yes 34
1 11 Yes No 35
1 22 No No 45
1 45 Yes Yes 30
1 55 Yes No 35
1 60 Yes Yes 41
1 44 Yes Yes 43
1 38 Yes Yes 39
1 42 Yes Yes 46
1 33 Yes No 43
1 52 No No 29
1 30 Yes Yes 39
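
If the Excel file is not at hand, rows like the ones above can be read straight from text in base R. The following is a minimal sketch that copies only the first three and last three rows of the table, so it is not the full data set and should not be used to fit the model; the names `cereal_txt` and `cereal_sample` are made up for illustration.

```r
# Six rows copied from the table above (NOT the full data set).
cereal_txt <- "Bought Income Children ViewAd Age
0 37 No No 42
0 47 Yes No 43
0 49 No No 49
1 33 Yes No 43
1 52 No No 29
1 30 Yes Yes 39"

# Parse the whitespace-separated text; Yes/No columns become factors.
cereal_sample <- read.table(text = cereal_txt, header = TRUE,
                            stringsAsFactors = TRUE)

str(cereal_sample)           # check column types
table(cereal_sample$Bought)  # count of non-buyers (0) and buyers (1)
```

Pasting every row of the table into the text block in the same way would give a data frame with the column names the assignment code expects.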

These are the 8 questions for the Assignment below:

Question 1

What do the results tell you?

In the "all-in" model (with all the IVs entered together), is the model significant overall?

Which IVs are significant?

Question 2

#what IVs are significant from the mixed step-wise model?

#how do you explain any differences between the "all-in" and mixed step-wise models?

Question 3

#what do the unit odds ratios mean? In other words, how do you interpret them?

Question 4

#write the code to convert those unit odds ratios to probabilities
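
One common way to read this question is to apply the odds-to-probability formula, p = odds / (1 + odds), to each value. The sketch below uses made-up values rather than model output, and the helper name `odds_to_prob` is hypothetical; with the fitted model, the same function could be applied to `exp(coef(both))`.

```r
# Convert odds to a probability: p = odds / (1 + odds).
odds_to_prob <- function(odds) odds / (1 + odds)

# Hypothetical values for illustration only (not output from the cereal model).
example_odds <- c(0.25, 1, 3)

odds_to_prob(example_odds)
```

Values below 1 map to probabilities below 0.5, a value of exactly 1 maps to 0.5, and values above 1 map to probabilities above 0.5.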

Question 5

#convert the estimate to odds

#convert the estimate to probability

#do you predict that this person would buy the cereal? Yes/no

What are this person's odds?

What is their probability of buying the cereal?

Do you predict they will buy the cereal?
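
The conversions in this question follow the usual chain logit → odds → probability. A minimal sketch, assuming a made-up logit value (`g_example` is hypothetical and is not the prediction for the 33-year-old respondent):

```r
# Hypothetical value on the logit (log-odds) scale; not from the fitted model.
g_example <- -0.4

# Logit -> odds.
odds_example <- exp(g_example)

# Logit -> probability; plogis() is the inverse logit, equal to odds / (1 + odds).
prob_example <- plogis(g_example)

# Classify with the same 0.5 cutoff the assignment code uses for PredBuy.
predicted_buy <- prob_example >= 0.5

odds_example
prob_example
predicted_buy
```

A negative logit always gives odds below 1 and a probability below 0.5, so under this cutoff the example respondent would be classified as not buying.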

Question 6

#what is the percent improvement for this model?

Is that good or bad?
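
To see what the percent-improvement calculation in the assignment code measures, here is a small sketch on a made-up confusion matrix. The counts in `xtab_example` are hypothetical and do not come from the cereal data.

```r
# Hypothetical confusion matrix: rows = actual, columns = predicted.
xtab_example <- matrix(c(40, 10,
                          5, 15),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(actual = c("0", "1"),
                                       predicted = c("0", "1")))

# Errors with no model: always guess the most common actual class.
wrong_no_model <- sum(xtab_example) - max(rowSums(xtab_example))

# Errors with the model: every case off the diagonal.
wrong_with_model <- sum(xtab_example) - sum(diag(xtab_example))

# Proportional reduction in errors relative to the no-model baseline.
improvement <- (wrong_no_model - wrong_with_model) / wrong_no_model
improvement
```

In this made-up case the baseline makes 20 errors and the model makes 15, so the error rate drops by 25 percent.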

Question 7

#is this kappa good or bad?
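
Cohen's kappa compares observed agreement with the agreement expected by chance from the row and column totals. The sketch below computes the unweighted kappa by hand in base R on a made-up confusion matrix (hypothetical counts, not the cereal data); it is meant to show what the statistic measures, not to replace `cohen.kappa()` from the psych package.

```r
# Hypothetical confusion matrix: rows = actual, columns = predicted.
xtab_example <- matrix(c(40, 10,
                          5, 15),
                       nrow = 2, byrow = TRUE)

n <- sum(xtab_example)

# Observed agreement: share of cases on the diagonal.
p_observed <- sum(diag(xtab_example)) / n

# Chance agreement: product of matching row and column totals, summed.
p_expected <- sum(rowSums(xtab_example) * colSums(xtab_example)) / n^2

# Unweighted Cohen's kappa.
kappa_example <- (p_observed - p_expected) / (1 - p_expected)
kappa_example
```

A kappa of 0 means agreement no better than chance, and 1 means perfect agreement.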

Question 8

#what does the range odds represent for each variable?

In other words, what do the range odds ratios mean?
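
For a numeric predictor in a logistic model without interaction terms, the ratio of the odds at the maximum to the odds at the minimum (other predictors held fixed) equals the unit odds ratio raised to the power of the range. A minimal sketch, assuming a made-up Age coefficient (`b_age_example` is hypothetical); the ages 29 and 53 are the minimum and maximum used in the assignment code.

```r
# Hypothetical Age coefficient on the logit scale (not from the fitted model).
b_age_example <- -0.1

# Minimum and maximum Age as used in the assignment code.
age_min <- 29
age_max <- 53

# Unit odds ratio: change in odds for one additional year of age.
unit_or <- exp(b_age_example)

# Range odds ratio: change in odds across the whole observed Age range.
range_or <- exp(b_age_example * (age_max - age_min))

unit_or
range_or
```

For a Yes/No predictor such as Children or ViewAd the range is a single step, so the range odds ratio and the unit odds ratio coincide.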
