Question


Posted on May 17, 2024


Use RStudio to add the data shown in grey, then answer the questions below it.

We will be modeling whether or not respondents purchased a brand of cereal. The variables are Bought (0=no, 1=yes), Income (in tens of thousands of dollars), Children (children under 18 in the home, Yes or No), ViewAd (did the respondent see the ad for the brand, Yes or No), and Age (years). The data file (CerealPurchase.xlsx) is located on the Module 9: Class Session Agenda and Prep page.
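R turns the Yes/No variables described above into 0/1 indicator (dummy) columns before fitting. The alphabetically first level, "No", becomes the baseline, which is why summary() reports coefficients named ChildrenYes and ViewAdYes. The base-R sketch below shows this with model.matrix(), the function that builds the design matrix glm() fits. To keep it short, it uses only the first five rows of the data table below; the object names rows and mm are illustrative.

```r
# First five rows of the posted data table (Bought, Income, Children, ViewAd, Age)
rows <- data.frame(
  Bought   = c(0, 0, 0, 0, 0),
  Income   = c(37, 47, 49, 13, 51),   # tens of thousands of dollars
  Children = c("No", "Yes", "No", "No", "Yes"),
  ViewAd   = c("No", "No", "No", "Yes", "No"),
  Age      = c(42, 43, 49, 39, 49)
)

# model.matrix() converts character columns to factors (levels sorted
# alphabetically, so "No" is the baseline) and adds one 0/1 column per
# non-baseline level
mm <- model.matrix(~ Income + Children + ViewAd + Age, data = rows)
print(mm)
```

With this coding, exp() of the ChildrenYes coefficient compares the odds of buying for respondents with children to those without, holding the other predictors fixed.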

The comments in all capital letters within the code below are the questions you are to answer. This is the .R code:

#logistic Regression assignment
#Install packages if needed
install.packages("readxl")
install.packages("ggplotgui")
install.packages("ggplot2")
install.packages("shiny")
install.packages("lmtest")
install.packages("aod")
install.packages("pROC")
install.packages("psych")
#load packages if needed
library(readxl)
library(ggplotgui)
library(ggplot2)
library(shiny)
library(lmtest)
library(aod)
library(pROC)
library(psych) #needed for cohen.kappa() below
#get data
CerealPurchase <- read_excel(file.choose())
names(CerealPurchase)
#the data table header shows "View Ad"; rename it (if present) so the formulas below can find ViewAd
names(CerealPurchase)[names(CerealPurchase) == "View Ad"] <- "ViewAd"
#Binary Log--predict Bought from all the other variables
LogMod <- glm(Bought~Income+Children+ViewAd+Age, data=CerealPurchase, family="binomial")
summary(LogMod)
#test the whole model
lrtest(LogMod)
#WHAT DO THE RESULTS TELL YOU?
#now run a mixed step-wise analysis to develop a more parsimonious model
#mixed step-wise regression
#define intercept-only model
intercept_only <- glm(Bought~1, data=CerealPurchase, family="binomial")
#define model with all predictors
all <- glm(Bought~Income + Children + ViewAd + Age, data=CerealPurchase, family="binomial")
summary(all)
#use them to perform a mixed step-wise regression
both <- step(intercept_only, direction='both', scope=formula(all), trace=0)
#view results of mixed stepwise regression
both$anova
summary(both)
#WHAT IVs ARE SIGNIFICANT FROM THE MIXED STEP-WISE MODEL?
#HOW DO YOU EXPLAIN ANY DIFFERENCES BETWEEN THE "ALL-IN" AND MIXED STEP-WISE MODELS?
##Odds of buying cereal----
#logits
coef(both)
#Unit Odds Ratios of buying the cereal compared to not buying
exp(coef(both))
#WHAT DO THE UNIT ODDS RATIOS MEAN?
#WRITE THE CODE TO CONVERT THOSE UNIT ODDS RATIOS TO PROBABILITIES
#complete the next line of code to estimate for a respondent who is 33 years old, has no children, and saw the ad. Remember that character values need to be enclosed in quotation marks, but numbers do not.
I1 <- data.frame(Children = "", ViewAd = "", Age = )
#estimate
#fill in between the parentheses to have R calculate the prediction for this person (you can see what needs to go here by looking at the code from the recorded class session, or from the modules on simple and multiple regression)
g <- predict( )
#CONVERT THE ESTIMATE TO ODDS
#CONVERT THE ESTIMATE TO PROBABILITY
#DO YOU PREDICT THAT THIS PERSON WOULD BUY THE CEREAL? YES/NO
#how well does the model perform?
#estimated probability of buying the cereal
CerealPurchase$Buy <- predict(both, type="response")
#classify as a predicted buyer (1) when that probability is at least .5
CerealPurchase$PredBuy <- 0
CerealPurchase$PredBuy[CerealPurchase$Buy >= .5] <- 1
#confusion matrix (rows = observed Bought, columns = predicted)
xtab <- table(CerealPurchase$Bought,CerealPurchase$PredBuy)
xtab
# percent improvement
#number wrong with no model (always guessing the more common outcome)
wrongNM <- sum(xtab)-max(rowSums(xtab))
#number wrong with model
wrongWM <- sum(xtab)-sum(diag(xtab))
#percent improvement in error rate (a proportion; multiply by 100 for a percent)
(wrongNM-wrongWM)/wrongNM
#WHAT IS THE PERCENT IMPROVEMENT FOR THIS MODEL? IS THAT GOOD OR BAD?
#kappa (cohen.kappa() is from the psych package)
cereal_kappa <- cohen.kappa(xtab)
cereal_kappa$kappa
#IS THIS KAPPA GOOD OR BAD?
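For a single new respondent, predict() on a glm returns the estimate on the log-odds (logit) scale by default. exp() turns that into odds, and odds / (1 + odds), which is the same as plogis() of the logit, turns it into a probability. The sketch below makes a few choices of its own:

- It types the posted table in directly. The helper yn() and the one-letter Yes/No strings exist only to keep the listing short.
- It assumes the step-wise model keeps Age, Children and ViewAd, the three predictors the script's range-odds section plugs in. It fits that model directly as mod, so it does not rely on step().
- It applies the same 0.5 cutoff the script uses for its confusion matrix.

```r
# Hypothetical helper: expand a string such as "NYN" into c("No", "Yes", "No")
yn <- function(s) ifelse(strsplit(s, "")[[1]] == "Y", "Yes", "No")

# The posted table typed in row by row: 49 non-buyers, then 22 buyers
CerealPurchase <- data.frame(
  Bought = c(rep(0, 49), rep(1, 22)),
  Income = c(37, 47, 49, 13, 51, 38, 60, 17, 60, 38, 24, 15, 28, 36, 10, 46, 37, 55,
             24, 19, 55, 53, 30, 53, 60, 53, 43, 35, 33, 30, 34, 43, 36, 60, 56, 60,
             33, 20, 15, 10, 25, 15, 10, 15, 24, 15, 24, 21, 21,
             47, 59, 48, 59, 55, 13, 34, 57, 51, 23, 44, 11, 22, 45, 55, 60, 44, 38,
             42, 33, 52, 30),
  Children = yn(paste0("NYNNYYYYNY", "YYYYNNYYYY", "YYNYYYYYNY", "YNYYYYYNNN", "NYNNYNYNN",
                       "YYYYYYYYYY", "YYNYYYYYYY", "NY")),
  ViewAd = yn(paste0("NNNYNNYNNY", "NNNNYNYNNN", "YYYNYYNYYN", "NNNYNYNNYN", "NYNYNYYNN",
                     "YYYNYYNYYY", "YNNYNYYYYN", "NY")),
  Age = c(42, 43, 49, 39, 49, 41, 44, 44, 36, 44, 47, 44, 43, 42, 39, 38, 47, 40,
          37, 45, 46, 42, 37, 53, 42, 37, 40, 36, 40, 46, 50, 41, 46, 47, 44, 47,
          45, 41, 47, 37, 46, 40, 42, 42, 51, 43, 45, 47, 39,
          36, 43, 35, 40, 41, 46, 34, 43, 42, 37, 34, 35, 45, 30, 35, 41, 43, 39,
          46, 43, 29, 39)
)

# Assumed step-wise model: the three predictors the range-odds code plugs in
mod <- glm(Bought ~ Age + Children + ViewAd, data = CerealPurchase, family = binomial)

# Respondent described in the script: 33 years old, no children, saw the ad
I1 <- data.frame(Age = 33, Children = "No", ViewAd = "Yes")

logit <- predict(mod, newdata = I1)  # default type = "link": the log-odds
odds  <- exp(logit)                  # odds of buying vs. not buying
prob  <- odds / (1 + odds)           # probability of buying (same as plogis(logit))
buys  <- prob >= 0.5                 # TRUE = predicted buyer at the 0.5 cutoff

c(logit = unname(logit), odds = unname(odds), prob = unname(prob))
buys

# The same odds-to-probability formula applied elementwise to the unit odds ratios;
# for the intercept this is the probability at Age = 0 with both Yes/No variables at "No"
unit_or <- exp(coef(mod))
unit_or / (1 + unit_or)
```

Calling predict(mod, newdata = I1, type = "response") should return the same probability in one step. Going through the logit, then the odds, then the probability just makes each conversion the questions ask about visible.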
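The percent-improvement and kappa steps can also be computed by hand, which shows what each one measures. This base-R sketch types in the posted table (using the hypothetical yn() helper) and reruns the script's mixed step-wise selection. It then computes Cohen's kappa from its textbook formula, kappa = (p_o - p_e) / (1 - p_e). This is a swap for psych::cohen.kappa(), whose unweighted kappa should agree.

The names all_in, pred, n, p_o, p_e and kappa_hat are introduced here; all_in replaces the script's `all` so it does not mask base::all(). The predictions are forced to levels 0 and 1 so the table stays 2 x 2 even if the model predicts only one class.

```r
# Hypothetical helper: expand a string such as "NYN" into c("No", "Yes", "No")
yn <- function(s) ifelse(strsplit(s, "")[[1]] == "Y", "Yes", "No")

# The posted table typed in row by row: 49 non-buyers, then 22 buyers
CerealPurchase <- data.frame(
  Bought = c(rep(0, 49), rep(1, 22)),
  Income = c(37, 47, 49, 13, 51, 38, 60, 17, 60, 38, 24, 15, 28, 36, 10, 46, 37, 55,
             24, 19, 55, 53, 30, 53, 60, 53, 43, 35, 33, 30, 34, 43, 36, 60, 56, 60,
             33, 20, 15, 10, 25, 15, 10, 15, 24, 15, 24, 21, 21,
             47, 59, 48, 59, 55, 13, 34, 57, 51, 23, 44, 11, 22, 45, 55, 60, 44, 38,
             42, 33, 52, 30),
  Children = yn(paste0("NYNNYYYYNY", "YYYYNNYYYY", "YYNYYYYYNY", "YNYYYYYNNN", "NYNNYNYNN",
                       "YYYYYYYYYY", "YYNYYYYYYY", "NY")),
  ViewAd = yn(paste0("NNNYNNYNNY", "NNNNYNYNNN", "YYYNYYNYYN", "NNNYNYNNYN", "NYNYNYYNN",
                     "YYYNYYNYYY", "YNNYNYYYYN", "NY")),
  Age = c(42, 43, 49, 39, 49, 41, 44, 44, 36, 44, 47, 44, 43, 42, 39, 38, 47, 40,
          37, 45, 46, 42, 37, 53, 42, 37, 40, 36, 40, 46, 50, 41, 46, 47, 44, 47,
          45, 41, 47, 37, 46, 40, 42, 42, 51, 43, 45, 47, 39,
          36, 43, 35, 40, 41, 46, 34, 43, 42, 37, 34, 35, 45, 30, 35, 41, 43, 39,
          46, 43, 29, 39)
)

# Mixed step-wise selection, as in the script
intercept_only <- glm(Bought ~ 1, data = CerealPurchase, family = binomial)
all_in <- glm(Bought ~ Income + Children + ViewAd + Age, data = CerealPurchase, family = binomial)
both <- step(intercept_only, direction = "both", scope = formula(all_in), trace = 0)

# Classify at the 0.5 cutoff; fixed factor levels keep the table 2 x 2
pred <- factor(as.integer(predict(both, type = "response") >= 0.5), levels = c(0, 1))
xtab <- table(Observed = CerealPurchase$Bought, Predicted = pred)
n <- sum(xtab)

# Proportional reduction in error vs. always guessing the more common outcome
wrongNM <- n - max(rowSums(xtab))
wrongWM <- n - sum(diag(xtab))
pre <- (wrongNM - wrongWM) / wrongNM

# Cohen's kappa: observed agreement corrected for agreement expected by chance
p_o <- sum(diag(xtab)) / n
p_e <- sum(rowSums(xtab) * colSums(xtab)) / n^2
kappa_hat <- (p_o - p_e) / (1 - p_e)

xtab
round(c(percent_improvement = 100 * pre, kappa = kappa_hat), 3)
```

Both numbers compare the model to a baseline. The percent improvement uses "always predict the more common outcome" as its baseline. Kappa uses the agreement expected from the row and column totals alone.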
#range odds
min(CerealPurchase$Age)
max(CerealPurchase$Age)
IMinAge <- data.frame(Age=29, Children = "No", ViewAd = "No")
IMaxAge <- data.frame(Age=53, Children = "No", ViewAd = "No")
OddsMinAge <- exp(predict(both, IMinAge))
OddsMinAge
OddsMaxAge <- exp(predict(both, IMaxAge))
OddsMaxAge
RA_Age <- OddsMaxAge/OddsMinAge #the range odds ratio for Age
IMinChild <- data.frame(Age=37, Children = "No", ViewAd = "No")
IMaxChild <- data.frame(Age=37, Children = "Yes", ViewAd = "No")
OddsMinChild <- exp(predict(both, IMinChild))
OddsMinChild
OddsMaxChild <- exp(predict(both, IMaxChild))
OddsMaxChild
RA_Child <- OddsMaxChild/OddsMinChild #the range odds ratio for Children
IMinAd <- data.frame(Age=37, Children = "No", ViewAd = "No")
IMaxAd <- data.frame(Age=37, Children = "No", ViewAd = "Yes")
OddsMinAd <- exp(predict(both, IMinAd))
OddsMinAd
OddsMaxAd <- exp(predict(both, IMaxAd))
OddsMaxAd
RA_Ad <- OddsMaxAd/OddsMinAd #the range odds ratio for ViewAd
RA <- cbind(RA_Age, RA_Child, RA_Ad)
coef(both)
RA
#WHAT DOES THE RANGE ODDS REPRESENT FOR EACH VARIABLE?
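The model is linear in the log-odds and has no interaction terms, so each range odds ratio computed above has a closed form. This also explains why the values held fixed for the other predictors (Age = 37, Children = "No", ViewAd = "No") cancel out. The derivation below assumes the step-wise model contains exactly Age, Children and ViewAd, as the range-odds code implies. Here b denotes the fitted coefficients, and x_j^min and x_j^max are the smallest and largest values of predictor j:

```latex
\log\frac{p}{1-p} = b_0 + b_{\mathrm{Age}}\,\mathrm{Age} + b_{\mathrm{Ch}}\,\mathrm{ChildrenYes} + b_{\mathrm{Ad}}\,\mathrm{ViewAdYes}

\mathrm{RA}_j = \frac{\mathrm{odds}(x_j = x_j^{\max})}{\mathrm{odds}(x_j = x_j^{\min})}
             = \frac{\exp\!\left(b_0 + \sum_{k \ne j} b_k x_k + b_j x_j^{\max}\right)}
                    {\exp\!\left(b_0 + \sum_{k \ne j} b_k x_k + b_j x_j^{\min}\right)}
             = \exp\!\left(b_j \,(x_j^{\max} - x_j^{\min})\right)

\mathrm{RA}_{\mathrm{Age}} = e^{(53-29)\,b_{\mathrm{Age}}} = \left(e^{b_{\mathrm{Age}}}\right)^{24}, \qquad
\mathrm{RA}_{\mathrm{Child}} = e^{b_{\mathrm{Ch}}}, \qquad
\mathrm{RA}_{\mathrm{Ad}} = e^{b_{\mathrm{Ad}}}
```

For the two Yes/No predictors, the range odds ratio is therefore the same as the unit odds ratio from exp(coef(both)). For Age, it compares the odds of buying for the oldest respondent (53) with the youngest (29).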

This is the Data File to use:

Bought	Income	Children	ViewAd	Age
0	37	No	No	42
0	47	Yes	No	43
0	49	No	No	49
0	13	No	Yes	39
0	51	Yes	No	49
0	38	Yes	No	41
0	60	Yes	Yes	44
0	17	Yes	No	44
0	60	No	No	36
0	38	Yes	Yes	44
0	24	Yes	No	47
0	15	Yes	No	44
0	28	Yes	No	43
0	36	Yes	No	42
0	10	No	Yes	39
0	46	No	No	38
0	37	Yes	Yes	47
0	55	Yes	No	40
0	24	Yes	No	37
0	19	Yes	No	45
0	55	Yes	Yes	46
0	53	Yes	Yes	42
0	30	No	Yes	37
0	53	Yes	No	53
0	60	Yes	Yes	42
0	53	Yes	Yes	37
0	43	Yes	No	40
0	35	Yes	Yes	36
0	33	No	Yes	40
0	30	Yes	No	46
0	34	Yes	No	50
0	43	No	No	41
0	36	Yes	No	46
0	60	Yes	Yes	47
0	56	Yes	No	44
0	60	Yes	Yes	47
0	33	Yes	No	45
0	20	No	No	41
0	15	No	Yes	47
0	10	No	No	37
0	25	No	No	46
0	15	Yes	Yes	40
0	10	No	No	42
0	15	No	Yes	42
0	24	Yes	No	51
0	15	No	Yes	43
0	24	Yes	Yes	45
0	21	No	No	47
0	21	No	No	39
1	47	Yes	Yes	36
1	59	Yes	Yes	43
1	48	Yes	Yes	35
1	59	Yes	No	40
1	55	Yes	Yes	41
1	13	Yes	Yes	46
1	34	Yes	No	34
1	57	Yes	Yes	43
1	51	Yes	Yes	42
1	23	Yes	Yes	37
1	44	Yes	Yes	34
1	11	Yes	No	35
1	22	No	No	45
1	45	Yes	Yes	30
1	55	Yes	No	35
1	60	Yes	Yes	41
1	44	Yes	Yes	43
1	38	Yes	Yes	39
1	42	Yes	Yes	46
1	33	Yes	No	43
1	52	No	No	29
1	30	Yes	Yes	39
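If the Excel file is not at hand, one way to get the table above into RStudio is to type it in as vectors instead of calling read_excel(file.choose()). The sketch below makes a few choices of its own:

- It uses a hypothetical helper, yn(), and one-letter Yes/No strings to keep the listing short.
- It names the ad column ViewAd (no space) so the script's formulas work.
- It reproduces the whole-model likelihood-ratio test in base R; lr_stat, lr_df and lr_p are names introduced here.

lmtest's lrtest(LogMod), given a single model, compares it with the intercept-only model, so its chi-square statistic should match lr_stat.

```r
# Hypothetical helper: expand a string such as "NYN" into c("No", "Yes", "No")
yn <- function(s) ifelse(strsplit(s, "")[[1]] == "Y", "Yes", "No")

# The table above, row by row: 49 non-buyers, then 22 buyers
CerealPurchase <- data.frame(
  Bought = c(rep(0, 49), rep(1, 22)),
  Income = c(37, 47, 49, 13, 51, 38, 60, 17, 60, 38, 24, 15, 28, 36, 10, 46, 37, 55,
             24, 19, 55, 53, 30, 53, 60, 53, 43, 35, 33, 30, 34, 43, 36, 60, 56, 60,
             33, 20, 15, 10, 25, 15, 10, 15, 24, 15, 24, 21, 21,
             47, 59, 48, 59, 55, 13, 34, 57, 51, 23, 44, 11, 22, 45, 55, 60, 44, 38,
             42, 33, 52, 30),
  Children = yn(paste0("NYNNYYYYNY", "YYYYNNYYYY", "YYNYYYYYNY", "YNYYYYYNNN", "NYNNYNYNN",
                       "YYYYYYYYYY", "YYNYYYYYYY", "NY")),
  ViewAd = yn(paste0("NNNYNNYNNY", "NNNNYNYNNN", "YYYNYYNYYN", "NNNYNYNNYN", "NYNYNYYNN",
                     "YYYNYYNYYY", "YNNYNYYYYN", "NY")),
  Age = c(42, 43, 49, 39, 49, 41, 44, 44, 36, 44, 47, 44, 43, 42, 39, 38, 47, 40,
          37, 45, 46, 42, 37, 53, 42, 37, 40, 36, 40, 46, 50, 41, 46, 47, 44, 47,
          45, 41, 47, 37, 46, 40, 42, 42, 51, 43, 45, 47, 39,
          36, 43, 35, 40, 41, 46, 34, 43, 42, 37, 34, 35, 45, 30, 35, 41, 43, 39,
          46, 43, 29, 39)
)
str(CerealPurchase)

# Whole-model likelihood-ratio test: all-in model vs. intercept-only model
LogMod <- glm(Bought ~ Income + Children + ViewAd + Age, data = CerealPurchase, family = binomial)
intercept_only <- glm(Bought ~ 1, data = CerealPurchase, family = binomial)
lr_stat <- 2 * (as.numeric(logLik(LogMod)) - as.numeric(logLik(intercept_only)))
lr_df   <- length(coef(LogMod)) - length(coef(intercept_only))  # 4 added terms
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(chisq = lr_stat, df = lr_df, p = lr_p)

# anova() reports the same comparison as a deviance table
anova(intercept_only, LogMod, test = "Chisq")
```

For a 0/1 outcome, the deviance is -2 times the log-likelihood. The statistic therefore also equals the drop from the null deviance to the residual deviance shown by summary(LogMod).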