Question:
The following exercise is from Introduction to Regression Modeling and refers to data taken from Higgins and Koch, "Variable Selection and Generalized Chi-Square Analysis of Categorical Data Applied to a Large Cross-Sectional Occupational Health Survey" [International Statistical Review (1977) 45: 51-62]. The data were taken from a large survey of workers in the cotton industry. The researchers wanted to study the factors that may be associated with brown lung disease, which results from inhaling particles of cotton, flax, hemp, or jute. The variables are as follows: number of workers suffering from the disease (yes); number of workers not suffering from the disease (no); dustiness of workplace (1 = high; 2 = medium; 3 = low); race (1 = white; 2 = other); sex (1 = male; 2 = female); smoking history (1 = smoker; 2 = nonsmoker); length of employment in the cotton industry (1 = less than 10 years; 2 = between 10 and 20 years; 3 = more than 20 years).
a. List the five covariates from most likely to least likely to be associated with the probability that a cotton worker has brown lung disease.
b. Do there appear to be any interactions between the covariates?
c. Use a statistical software package to obtain a prediction model using all five covariates.
Transcribed Image Text:
[Data table with columns Yes, No, Dust, Race, Sex, Smoking, Employ; the cell values are illegible in the transcription.]
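For part (c), one common choice for grouped yes/no counts is a binomial logistic regression with the five covariates as categorical predictors. The sketch below is only an illustration under stated assumptions: the counts are HYPOTHETICAL placeholders generated by a formula (they are not the survey data), level 1 of each covariate is taken as the baseline, and the helper names (design_row, solve, fit_logistic, predict) are made up for this example. It fits the model by Newton-Raphson using only the Python standard library so the mechanics are visible; a statistical package would normally do this step for you.

```python
import math
from itertools import product

# Dummy coding with level 1 of every covariate as the baseline (an assumption).
NAMES = ["intercept", "dust=2", "dust=3", "race=2", "sex=2",
         "smoking=2", "employ=2", "employ=3"]

def design_row(dust, race, sex, smoking, employ):
    """Turn one combination of covariate levels into a row of 0/1 indicators."""
    return [1.0, float(dust == 2), float(dust == 3), float(race == 2),
            float(sex == 2), float(smoking == 2),
            float(employ == 2), float(employ == 3)]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def predict(beta, row):
    """Fitted probability of disease for one design row."""
    eta = sum(b * xi for b, xi in zip(beta, row))
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(X, yes, no, iters=25, tol=1e-10):
    """Newton-Raphson for logistic regression on grouped (yes, no) counts."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, y, f in zip(X, yes, no):
            n = y + f
            p = predict(beta, x)
            w = n * p * (1.0 - p)
            for i in range(k):
                score[i] += x[i] * (y - n * p)
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# HYPOTHETICAL placeholder counts, NOT the survey data: 50 workers per cell,
# with more cases for dustier workplaces, longer employment, and smokers.
cells = list(product((1, 2, 3), (1, 2), (1, 2), (1, 2), (1, 2, 3)))
yes = [2 + 3 * (3 - d) + 2 * (e - 1) + 2 * (s == 1) for d, r, sx, s, e in cells]
no = [50 - y for y in yes]
X = [design_row(*c) for c in cells]

beta = fit_logistic(X, yes, no)
for name, b in zip(NAMES, beta):
    print(f"{name:>10}: {b: .4f}")
```

To use this with the real table, replace cells, yes, and no with the observed rows. Each coefficient is a log odds ratio relative to the baseline level, and predict(beta, design_row(1, 1, 1, 1, 3)) would give the fitted probability for a white male smoker in a high-dust workplace with more than 20 years of employment. Interaction terms for part (b) could be explored by adding product columns to design_row.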