Question
We considered experiment data to evaluate the effect of various factors on individual's perceived health. We fit different regression models, using the following variables: Health
We considered experiment data to evaluate the effect of various factors on individual's perceived health. We fit different regression models, using the following variables:
Health status (1,2,3,4,5), age, exercise (if they exercise any=1, otherwise=0), smoke100 (did they smoke more than 100=1, otherwise=0) and gender (m=male, f=female),
> fit<- glm(Healthstatus ~ smoke100 + gender + age , family=poisson (link=log), data=cdc)
> summary(fit)
Call:
glm(formula = Healthstatus~smoke100 + gender + age, family=poisson(link=log), data = cdc)
Deviance Residuals:
Min1QMedian3QMax
-1.86501-0.371910.060750.403441.13194
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept)1.47110430.0112173 131.146< 2e-16 ***
smoke100-0.06191990.0075249-8.229< 2e-16 ***
genderm0.01997520.00745752.6790.00739 **
age-0.00356090.0002203 -16.162< 2e-16 ***
---
Signif. codes:0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 6648.1on 19999degrees of freedom
Residual deviance: 6273.7on 19996degrees of freedom
AIC: 68870
Using above R-output, answer the following questions:
a. In one sentence explain why we choose to use Poisson regression instead of logistic regression. [2Marks]
b. State the prediction equation and interpret the coefficient for smoke100). [2Marks]
c. Find 99% Wald confidence interval for (gender). [2 Marks]
d. Using this model with three main effects, predict the health status for a female, at age of 52 who smoked more than 100 cigarettes and interpret. [3 Marks]
e. Your project manager does not wish to use Wald tests to decide which variable to remove, and prefers to use difference in deviance (likelihood ratio tests). She prepares the following summary after fitting all three two-variable models in addition to your three-variable model:
Model
Deviance
difference compared to full model
P-value
Full model
6273.7
smok100 + age
6380.9
17.2
0.102
smoke100 + gender
6537.6
63.9
0.012
age + gender
6341.6
8.9
0.260
Simply by considering the description of the 3 variables, explain if you suspect multicollinearity to be an issue for this problem? [2Marks]
By examining the difference in deviance between each of the subset models and the full model, explain which two-variable model(s) you will choose for consideration and which two-variable model(s) you will discard. [2Marks]
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started