
STAT 7000 Assignment 3 (Fall 2019)

Instructions:

Due date: November 15, 2019.

Solutions must be typed. Use R to answer the questions, but pick out only the values you need to include in your solution write-up.

Please submit your homework as a PDF or DOCX file to Canvas by the due date.

Data are from McDonald and Schwing (1973), "Instabilities of Regression Estimates Relating Air Pollution to Mortality," Technometrics, 15, 463-481. This data set consists of 15 independent variables (see list below) and a measure of mortality on 60 US metropolitan areas in 1959-1961.

Description of Variables

Y Total Age Adjusted Mortality Rate

x1 Mean annual precipitation in inches

x2 Mean January temperature in degrees Fahrenheit

x3 Mean July temperature in degrees Fahrenheit

x4 Percent of 1960 SMSA population that is 65 years of age or over

x5 Population per household, 1960 SMSA (Standard Metropolitan Statistical Area)

x6 Median school years completed for those over 25 in 1960 SMSA

x7 Percent of housing units that are sound and with all facilities

x8 Population per square mile in urbanized area in 1960

x9 Percent of 1960 urbanized area population that is non-white

x10 Percent employment in white-collar occupations in 1960 urbanized area

x11 Percent of families with income under $3,000 in 1960 urbanized area

x12 Relative pollution potential of hydrocarbons, HC

x13 Relative pollution potential of oxides of nitrogen, NOx

x14 Relative pollution potential of sulfur dioxide, SO2

x15 Percent relative humidity, annual average at 1 p.m.

Data are given in the file "pollution.txt". We are interested in predicting the mortality rate and describing the relationship between the mortality rate (Y) and the 15 independent variables.

The data file can be downloaded at the following link.

http://www.auburn.edu/~billone/datasets/stat7000/pollution.txt
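
As a minimal sketch of getting the file into R, assuming a local copy is whitespace-delimited with a header row of column names. The actual layout of pollution.txt is not described here, so check it before relying on this. `read_pollution` is a hypothetical helper name, not part of the assignment.

```r
# Hypothetical helper. header = TRUE ASSUMES the first row holds column names;
# the real file layout is not documented in the assignment text.
read_pollution <- function(path) {
  dat <- read.table(path, header = TRUE)  # whitespace-delimited by default
  stopifnot(nrow(dat) > 0)                # fail early on an empty read
  dat
}

# Example use with a local copy of the course file (not run here):
# dat <- read_pollution("pollution.txt")
# str(dat)     # check column names and types before modelling
# pairs(dat)   # scatter plot matrix of all variables
```

If the file turns out to have no header row, `header = FALSE` plus explicit `col.names` would be the adjustment to make.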

1. Create a scatter plot matrix and examine all the pairwise relationships. Comment on these.

2. Construct the ANOVA table and tell me what the variation measures tell you.

3. Write down the hypotheses for the significance of regression.

4. Is the regression significant? Why?

5. By examining the individual hypothesis tests for the fifteen regression coefficients, which regression coefficients are statistically significant? List these.

6. Find the correlation matrix and tell me if there is a multicollinearity problem.

7. Perform best subset selection using the adjusted R², Cp, MSE, AIC and BIC criteria (for the last four criteria, the smaller the value, the better the model!).

8. Perform sequential procedures: forward, backward and stepwise variable selection.

9. Find the optimal model(s).

10. Check if there is a collinearity issue in the optimal model(s).

11. Check the adequacy of the optimal model(s) (that is, are the assumptions satisfied?).

12. Write one paragraph summarizing your findings.
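
The core R calls behind several of the tasks above can be sketched as follows. This is an illustration only, not a solution: it runs on a small synthetic data frame (invented values, four predictors, one deliberately collinear pair) rather than the real pollution data, and every object name is hypothetical. It uses base R only.

```r
# Synthetic stand-in data (NOT the real pollution.txt values).
set.seed(1)
n <- 60
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$x4 <- dat$x3 + rnorm(n, sd = 0.1)   # deliberately collinear with x3
dat$y  <- 900 + 20 * dat$x1 - 15 * dat$x2 + rnorm(n, sd = 10)

# Full model, overall F test (significance of regression), coefficient t tests.
full  <- lm(y ~ ., data = dat)
fstat <- summary(full)$fstatistic
p_overall <- pf(fstat["value"], fstat["numdf"], fstat["dendf"],
                lower.tail = FALSE)
print(summary(full)$coefficients)       # estimate, SE, t value, p value

# ANOVA decomposition: SST = SSR + SSE.
sse <- sum(residuals(full)^2)            # unexplained (residual) variation
sst <- sum((dat$y - mean(dat$y))^2)      # total variation about the mean
ssr <- sst - sse                         # variation explained by the regression
print(anova(full))                       # sequential sums of squares per term

# Multicollinearity: correlation matrix and variance inflation factors.
X   <- dat[, setdiff(names(dat), "y")]
R   <- cor(X)
vif <- diag(solve(R))                    # VIF_j = j-th diagonal element of R^{-1}
print(round(vif, 2))

# Sequential selection with step(): AIC by default, BIC when k = log(n).
null    <- lm(y ~ 1, data = dat)
scope   <- formula(full)                 # upper model for forward/stepwise search
fwd     <- step(null, scope = scope, direction = "forward", trace = 0)
bwd     <- step(full, direction = "backward", trace = 0)
both    <- step(null, scope = scope, direction = "both", trace = 0)
bwd_bic <- step(full, direction = "backward", k = log(n), trace = 0)

# Residual diagnostics for a chosen model (draws four plots; not run here):
# par(mfrow = c(2, 2)); plot(bwd)
```

With the real data the same calls would apply to the 15-predictor model. An exhaustive best-subset search is usually done with an add-on package such as leaps (its `regsubsets` function), which is left out here to keep the sketch dependency-free.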
