Question 1:
9.7 Mittlböck and Heinzl (2001) compare Poisson and logistic regression models for data in which the event rate is small, so that the Poisson distribution provides a reasonable approximation to the Binomial distribution. An example is the number of deaths from coronary heart disease among British doctors (Table 9.1). In Section 9.2.1 we fitted the model $Y_i \sim \mathrm{Po}(\mathrm{deaths}_i)$ with Equation (9.9):

$$\log(\mathrm{deaths}_i) = \log(\mathrm{personyears}_i) + \beta_1 + \beta_2\,\mathrm{smoke}_i + \beta_3\,\mathrm{agecat}_i + \beta_4\,\mathrm{agesq}_i + \beta_5\,\mathrm{smkage}_i. \qquad (9.9)$$

An alternative is $Y_i \sim \mathrm{Bin}(\mathrm{personyears}_i, \pi_i)$ with

$$\mathrm{logit}(\pi_i) = \beta_1 + \beta_2\,\mathrm{smoke}_i + \beta_3\,\mathrm{agecat}_i + \beta_4\,\mathrm{agesq}_i + \beta_5\,\mathrm{smkage}_i.$$

Another version is based on a Bernoulli distribution $Z_j \sim \mathrm{B}(\pi_i)$ for each doctor in group $i$, with

$$Z_j = \begin{cases} 1, & j = 1, \ldots, \mathrm{deaths}_i \\ 0, & j = \mathrm{deaths}_i + 1, \ldots, \mathrm{personyears}_i \end{cases}$$

and

$$\mathrm{logit}(\pi_i) = \beta_1 + \beta_2\,\mathrm{smoke}_i + \beta_3\,\mathrm{agecat}_i + \beta_4\,\mathrm{agesq}_i + \beta_5\,\mathrm{smkage}_i.$$

a. Fit all three models (in Stata the Bernoulli model cannot be fitted with glm; use blogit instead). Verify that the $\beta$ estimates are very similar.

b. Calculate the statistics $D$, $X^2$ and pseudo $R^2$ for all three models. Notice that the pseudo $R^2$ is much smaller for the Bernoulli model. As Mittlböck and Heinzl (2001) point out, this is because the Poisson and Binomial models are estimating the probability of death for each group (which is relatively easy), whereas the Bernoulli model is estimating the probability of death for an individual (which is much more difficult).

c. Fit the Poisson model and verify your answers.

d. For all three models, carry out the Wald test for the smoke effect ($\beta_2 = \beta_5 = 0$).

Table 9.1 Deaths from coronary heart disease after 10 years among British male doctors categorized by age and smoking status in 1951.

Age group   Smokers: deaths   Smokers: person-years   Non-smokers: deaths   Non-smokers: person-years
35-44                    32                   52407                     2                       18790
45-54                   104                   43248                    12                       10673
55-64                   206                   28612                    28                        5710
65-74                   186                   12663                    28                        2585
75-84                   102                    5317                    31                        1462
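As a starting point for part (a), the Poisson and Binomial fits can be sketched without a statistics package. This is a minimal illustration, not the method used in the book: it uses hand-written iteratively reweighted least squares in plain Python in place of Stata or R. The covariate coding is an assumption, since this exercise does not restate it: smoke is 1 for smokers and 0 otherwise, agecat runs 1 to 5 over the age groups, agesq is agecat squared, and smkage equals agecat for smokers and 0 for non-smokers. The helper names `solve` and `irls` are hypothetical.

```python
import math

# Table 9.1 as (deaths, person-years), age groups 35-44 ... 75-84
SMOKERS = [(32, 52407), (104, 43248), (206, 28612), (186, 12663), (102, 5317)]
NONSMOKERS = [(2, 18790), (12, 10673), (28, 5710), (28, 2585), (31, 1462)]

rows = []  # each row: (covariate vector, deaths, person-years)
for smoke, block in ((1, SMOKERS), (0, NONSMOKERS)):
    for agecat, (deaths, pyears) in enumerate(block, start=1):
        # Assumed coding: intercept, smoke, agecat, agesq, smkage
        x = [1.0, float(smoke), float(agecat), float(agecat ** 2), float(smoke * agecat)]
        rows.append((x, deaths, pyears))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def irls(family, iterations=25):
    """Fit the model by IRLS; returns (beta, fitted expected deaths per group)."""
    # Starting means are nudged away from the boundary of the parameter space
    if family == "poisson":
        mu = [y + 0.1 for _, y, _ in rows]
    else:
        mu = [n * (y + 0.5) / (n + 1.0) for _, y, n in rows]
    beta = [0.0] * 5
    for _ in range(iterations):
        xtwx = [[0.0] * 5 for _ in range(5)]
        xtwz = [0.0] * 5
        for (x, y, n), m in zip(rows, mu):
            if family == "poisson":
                # Log link with offset log(person-years) removed from the working response
                w = m
                z = math.log(m / n) + (y - m) / m
            else:
                # Logit link for the grouped Binomial model
                p = m / n
                w = n * p * (1 - p)
                z = math.log(p / (1 - p)) + (y - m) / w
            for j in range(5):
                xtwz[j] += x[j] * w * z
                for k in range(5):
                    xtwx[j][k] += x[j] * w * x[k]
        beta = solve(xtwx, xtwz)
        eta = [sum(b * xj for b, xj in zip(beta, x)) for x, _, _ in rows]
        if family == "poisson":
            mu = [n * math.exp(e) for e, (_, _, n) in zip(eta, rows)]
        else:
            mu = [n / (1 + math.exp(-e)) for e, (_, _, n) in zip(eta, rows)]
    return beta, mu


beta_pois, mu_pois = irls("poisson")
beta_bin, mu_bin = irls("binomial")

# Poisson goodness of fit: deviance D and Pearson chi-squared X^2
deviance = 2 * sum(y * math.log(y / m) - (y - m) for (_, y, _), m in zip(rows, mu_pois))
pearson = sum((y - m) ** 2 / m for (_, y, _), m in zip(rows, mu_pois))

names = ["const", "smoke", "agecat", "agesq", "smkage"]
for name, bp, bb in zip(names, beta_pois, beta_bin):
    print(f"{name:7s} Poisson {bp:8.4f}   Binomial {bb:8.4f}")
print(f"Poisson D = {deviance:.3f}, X2 = {pearson:.3f}")
```

The Bernoulli likelihood for individual doctors differs from the grouped Binomial likelihood only by the binomial coefficients, which do not involve the parameters. The Bernoulli model should therefore reproduce the estimates from `irls("binomial")`; only the goodness-of-fit statistics in part (b) need the individual-level form.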