Question
Use RStudio to add the data shown in grey, then answer the questions below it.
We will be modeling whether or not respondents purchased a brand of cereal. The variables are Bought (0=no, 1=yes), Income (in tens of thousands of dollars), Children (under 18 in the home, Yes or No), ViewAd (did the respondent see the ad for the brand, Yes or No), and Age (years). The data file is located on the Module 9: Class Session Agenda and Prep page (CerealPurchase.xlsx).
Within the code below are comments in all capital letters; these are the questions you are to answer. This is the .R code:
#Logistic regression assignment

#Install packages if needed
install.packages("readxl")
install.packages("ggplotgui")
install.packages("ggplot2")
install.packages("shiny")
install.packages("lmtest")
install.packages("aod")
install.packages("pROC")
install.packages("psych")

#Load packages if needed
library(readxl)
library(ggplotgui)
library(ggplot2)
library(shiny)
library(lmtest)
library(aod)
library(pROC)
library(psych) #provides cohen.kappa(), used below

#Get data
CerealPurchase <- read_excel(file.choose())
names(CerealPurchase)

#Binary logistic regression: predict Bought from all the other variables
LogMod <- glm(Bought~Income+Children+ViewAd+Age, data=CerealPurchase, family="binomial")
summary(LogMod)

#Test the whole model
lrtest(LogMod)
#WHAT DO THE RESULTS TELL YOU?

#Now run a mixed step-wise analysis to develop a more parsimonious model
#Mixed step-wise regression
#Define intercept-only model
intercept_only <- glm(Bought~1, data=CerealPurchase, family="binomial")
#Define model with all predictors
all <- glm(Bought~Income + Children + ViewAd + Age, data=CerealPurchase, family="binomial")
summary(all)
#Use them to perform a mixed step-wise regression
both <- step(intercept_only, direction='both', scope=formula(all), trace=0)
#View results of mixed step-wise regression
both$anova
summary(both)
#WHAT IVs ARE SIGNIFICANT FROM THE MIXED STEP-WISE MODEL?
#HOW DO YOU EXPLAIN ANY DIFFERENCES BETWEEN THE "ALL-IN" AND MIXED STEP-WISE MODELS?

##Odds of buying cereal----
#Logits
coef(both)
#Unit odds ratios of buying the cereal compared to not buying
exp(coef(both))
#WHAT DO THE UNIT ODDS RATIOS MEAN?
#WRITE THE CODE TO CONVERT THOSE UNIT ODDS RATIOS TO PROBABILITIES

#Complete the next line of code to estimate for a respondent who is 33 years old, has no children, and saw the ad.
#Remember that character values need to be enclosed in quotation marks, but numbers do not.
I1 <- data.frame(Children = "", ViewAd = "", Age = )
#Estimate
#Fill in between the parentheses to have R calculate the prediction for this person
#(you can see what needs to go here by looking at the code from the recorded class session,
#or from the modules on simple and multiple regression)
g <- predict( )
#CONVERT THE ESTIMATE TO ODDS
#CONVERT THE ESTIMATE TO PROBABILITY
#DO YOU PREDICT THAT THIS PERSON WOULD BUY THE CEREAL? YES/NO

#How well does the model perform?
#Estimated probability of buying
CerealPurchase$Buy <- predict(both, type="response")
CerealPurchase$PredBuy <- 0
CerealPurchase$PredBuy[CerealPurchase$Buy >= .5] <- 1
#Confusion matrix (rows = actual Bought, columns = predicted)
xtab <- table(CerealPurchase$Bought,CerealPurchase$PredBuy)
xtab
#Percent improvement
#Number wrong with no model
wrongNM <- sum(xtab)-max(rowSums(xtab))
#Number wrong with model
wrongWM <- sum(xtab)-sum(diag(xtab))
#Percent improvement in error rate
(wrongNM-wrongWM)/wrongNM
#WHAT IS THE PERCENT IMPROVEMENT FOR THIS MODEL? IS THAT GOOD OR BAD?
#Kappa
cereal_kappa <- cohen.kappa(xtab)
cereal_kappa$kappa
#IS THIS KAPPA GOOD OR BAD?

#Range odds
min(CerealPurchase$Age)
max(CerealPurchase$Age)
IMinAge <- data.frame(Age=29, Children = "No", ViewAd = "No")
IMaxAge <- data.frame(Age=53, Children = "No", ViewAd = "No")
OddsMinAge <- exp(predict(both, IMinAge))
OddsMinAge
OddsMaxAge <- exp(predict(both, IMaxAge))
OddsMaxAge
RA_Age <- OddsMaxAge/OddsMinAge #the range odds ratio for Age
IMinChild <- data.frame(Age=37, Children = "No", ViewAd = "No")
IMaxChild <- data.frame(Age=37, Children = "Yes", ViewAd = "No")
OddsMinChild <- exp(predict(both, IMinChild))
OddsMinChild
OddsMaxChild <- exp(predict(both, IMaxChild))
OddsMaxChild
RA_Child <- OddsMaxChild/OddsMinChild #the range odds ratio for Children
IMinAd <- data.frame(Age=37, Children = "No", ViewAd = "No")
IMaxAd <- data.frame(Age=37, Children = "No", ViewAd = "Yes")
OddsMinAd <- exp(predict(both, IMinAd))
OddsMinAd
OddsMaxAd <- exp(predict(both, IMaxAd))
OddsMaxAd
RA_Ad <- OddsMaxAd/OddsMinAd #the range odds ratio for ViewAd
RA <- cbind(RA_Age, RA_Child, RA_Ad)
coef(both)
RA
#WHAT DOES THE RANGE ODDS REPRESENT FOR EACH VARIABLE?
This is the Data File to use:
Bought | Income | Children | ViewAd | Age |
0 | 37 | No | No | 42 |
0 | 47 | Yes | No | 43 |
0 | 49 | No | No | 49 |
0 | 13 | No | Yes | 39 |
0 | 51 | Yes | No | 49 |
0 | 38 | Yes | No | 41 |
0 | 60 | Yes | Yes | 44 |
0 | 17 | Yes | No | 44 |
0 | 60 | No | No | 36 |
0 | 38 | Yes | Yes | 44 |
0 | 24 | Yes | No | 47 |
0 | 15 | Yes | No | 44 |
0 | 28 | Yes | No | 43 |
0 | 36 | Yes | No | 42 |
0 | 10 | No | Yes | 39 |
0 | 46 | No | No | 38 |
0 | 37 | Yes | Yes | 47 |
0 | 55 | Yes | No | 40 |
0 | 24 | Yes | No | 37 |
0 | 19 | Yes | No | 45 |
0 | 55 | Yes | Yes | 46 |
0 | 53 | Yes | Yes | 42 |
0 | 30 | No | Yes | 37 |
0 | 53 | Yes | No | 53 |
0 | 60 | Yes | Yes | 42 |
0 | 53 | Yes | Yes | 37 |
0 | 43 | Yes | No | 40 |
0 | 35 | Yes | Yes | 36 |
0 | 33 | No | Yes | 40 |
0 | 30 | Yes | No | 46 |
0 | 34 | Yes | No | 50 |
0 | 43 | No | No | 41 |
0 | 36 | Yes | No | 46 |
0 | 60 | Yes | Yes | 47 |
0 | 56 | Yes | No | 44 |
0 | 60 | Yes | Yes | 47 |
0 | 33 | Yes | No | 45 |
0 | 20 | No | No | 41 |
0 | 15 | No | Yes | 47 |
0 | 10 | No | No | 37 |
0 | 25 | No | No | 46 |
0 | 15 | Yes | Yes | 40 |
0 | 10 | No | No | 42 |
0 | 15 | No | Yes | 42 |
0 | 24 | Yes | No | 51 |
0 | 15 | No | Yes | 43 |
0 | 24 | Yes | Yes | 45 |
0 | 21 | No | No | 47 |
0 | 21 | No | No | 39 |
1 | 47 | Yes | Yes | 36 |
1 | 59 | Yes | Yes | 43 |
1 | 48 | Yes | Yes | 35 |
1 | 59 | Yes | No | 40 |
1 | 55 | Yes | Yes | 41 |
1 | 13 | Yes | Yes | 46 |
1 | 34 | Yes | No | 34 |
1 | 57 | Yes | Yes | 43 |
1 | 51 | Yes | Yes | 42 |
1 | 23 | Yes | Yes | 37 |
1 | 44 | Yes | Yes | 34 |
1 | 11 | Yes | No | 35 |
1 | 22 | No | No | 45 |
1 | 45 | Yes | Yes | 30 |
1 | 55 | Yes | No | 35 |
1 | 60 | Yes | Yes | 41 |
1 | 44 | Yes | Yes | 43 |
1 | 38 | Yes | Yes | 39 |
1 | 42 | Yes | Yes | 46 |
1 | 33 | Yes | No | 43 |
1 | 52 | No | No | 29 |
1 | 30 | Yes | Yes | 39 |
These are the 8 questions for the Assignment below:
Question 1
What do the results tell you?
In the "all-in" model (with all the IVs entered together), is the model significant overall?
Which IVs are significant?
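As a rough illustration of what `lrtest(LogMod)` checks, the sketch below runs the same kind of likelihood-ratio test by hand in base R. It uses R's built-in `mtcars` data as a stand-in (an assumption made so the example runs without CerealPurchase.xlsx), so the numbers it prints say nothing about the cereal model.

```r
# Stand-in model: mtcars ships with R; it is NOT the cereal data.
fit_full <- glm(am ~ wt, data = mtcars, family = "binomial")
fit_null <- glm(am ~ 1,  data = mtcars, family = "binomial")

# Likelihood-ratio statistic: drop in deviance from the intercept-only model
lr_stat <- fit_null$deviance - fit_full$deviance
lr_df   <- fit_null$df.residual - fit_full$df.residual

# p-value from the chi-square distribution with lr_df degrees of freedom
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(statistic = lr_stat, df = lr_df, p = p_value)
```

A small p-value here means the predictors, taken together, fit better than the intercept-only model; which individual IVs matter is read from the coefficient table in `summary()`.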
Question 2
Which IVs are significant in the mixed step-wise model?
How do you explain any differences between the "all-in" and mixed step-wise models?
Question 3
What do the unit odds ratios mean? In other words, how do you interpret them?
Question 4
Write the code to convert those unit odds ratios to probabilities.
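One possible shape for this conversion, sketched under the assumption that the intended formula is p = odds / (1 + odds). The helper name `odds_to_prob` and the example values are made up for illustration. Strictly speaking the formula turns odds into a probability; an odds ratio for a predictor is a multiplier on the odds, so the result of applying it to `exp(coef(both))` should be read with that in mind.

```r
# Convert odds to probability: p = odds / (1 + odds)
# (odds_to_prob is a hypothetical helper name, not part of the assignment)
odds_to_prob <- function(odds) odds / (1 + odds)

# Hypothetical odds values, for illustration only
example_odds <- c(0.25, 1, 3)
odds_to_prob(example_odds)

# With the assignment's step-wise model, the same helper could be applied as:
# odds_to_prob(exp(coef(both)))
```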
Question 5
Convert the estimate to odds.
Convert the estimate to probability.
Do you predict that this person would buy the cereal? Yes/No
What are this person's odds?
What is their probability of buying the cereal?
Do you predict they will buy the cereal?
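The chain from a predicted logit to odds, probability, and a yes/no call can be sketched as follows. The model and the new case are stand-ins built from R's built-in `mtcars` data (an assumption, so the example runs on its own); for the assignment, the same steps would use `both` and the completed `I1` data frame.

```r
# Stand-in model on built-in data; NOT the cereal model
fit <- glm(am ~ wt, data = mtcars, family = "binomial")

# Hypothetical new case (for the assignment this would be I1)
new_case <- data.frame(wt = 3)

# predict() on a glm returns the logit (log-odds) by default
logit <- predict(fit, newdata = new_case)

# Logit -> odds -> probability
odds <- exp(logit)
prob <- odds / (1 + odds)   # equivalent to plogis(logit)

# Classify with the same 0.5 cut-off the assignment code uses
predicted_class <- as.integer(prob >= 0.5)

c(logit = unname(logit), odds = unname(odds), prob = unname(prob))
predicted_class
```

Calling `predict(fit, newdata = new_case, type = "response")` should return the same probability directly, which is a handy cross-check.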
Question 6
What is the percent improvement for this model?
Is that good or bad?
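To see how the percent-improvement formula from the assignment code behaves, here is a small sketch on a confusion matrix with hypothetical cell counts. Only the row totals (49 non-buyers, 22 buyers) come from the data file; the split between correct and incorrect predictions is invented for illustration and is not the model's actual result.

```r
# Hypothetical confusion matrix: rows = actual Bought, columns = predicted
xtab <- matrix(c(44, 10,    # predicted 0: actual 0, actual 1
                 5, 12),    # predicted 1: actual 0, actual 1
               nrow = 2,
               dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

# Errors with no model: always guess the larger class (here, "did not buy")
wrongNM <- sum(xtab) - max(rowSums(xtab))

# Errors with the model: everything off the diagonal
wrongWM <- sum(xtab) - sum(diag(xtab))

# Proportional reduction in errors relative to the no-model baseline
improvement <- (wrongNM - wrongWM) / wrongNM
improvement
```

A value near 0 means the model barely beats always guessing the majority class; a negative value means it does worse than that baseline.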
Question 7
Is this kappa good or bad?
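For intuition about what `cohen.kappa()` reports, unweighted kappa can be computed by hand in base R as observed agreement corrected for chance agreement. The confusion matrix below uses hypothetical cell counts (only the row totals of 49 and 22 match the data file), so the resulting value is illustrative rather than the assignment's answer.

```r
# Hypothetical confusion matrix: rows = actual Bought, columns = predicted
xtab <- matrix(c(44, 10, 5, 12), nrow = 2,
               dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

n <- sum(xtab)

# Observed agreement: share of cases on the diagonal
p_observed <- sum(diag(xtab)) / n

# Chance agreement: expected diagonal share if rows and columns were independent
p_chance <- sum(rowSums(xtab) * colSums(xtab)) / n^2

# Cohen's kappa: agreement beyond chance, scaled by the maximum possible
kappa <- (p_observed - p_chance) / (1 - p_chance)
kappa
```

Kappa of 0 means agreement no better than chance and 1 means perfect agreement; verbal labels such as "moderate" depend on which published benchmark the course uses.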
Question 8
What does the range odds represent for each variable?
In other words, what do the range odds ratios mean?
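The relationship between a unit odds ratio and a range odds ratio can be checked numerically. The sketch below uses a stand-in model on R's built-in `mtcars` data (an assumption so that it runs without the cereal file) and compares the ratio of predicted odds at the maximum and minimum of a predictor with the unit odds ratio raised to the width of that range.

```r
# Stand-in model on built-in data; NOT the cereal model
fit <- glm(am ~ wt, data = mtcars, family = "binomial")

wt_min <- min(mtcars$wt)
wt_max <- max(mtcars$wt)

# Odds at the two ends of the predictor's observed range
odds_min <- exp(predict(fit, data.frame(wt = wt_min)))
odds_max <- exp(predict(fit, data.frame(wt = wt_max)))

# Range odds ratio, computed the same way as RA_Age in the assignment code
range_or <- odds_max / odds_min

# Shortcut: unit odds ratio raised to the number of units spanned
unit_or  <- exp(coef(fit)["wt"])
shortcut <- unit_or^(wt_max - wt_min)

all.equal(unname(range_or), unname(shortcut))
```

For a Yes/No predictor the range is a single step, so its range odds ratio coincides with its unit odds ratio; for a numeric predictor such as Age it compounds the unit odds ratio across the whole observed span.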