Question
Q SCI 381: Introduction to Probability and Statistics
Winter 2022
Laboratory #8 (70 points)
In today's lab, we will be using the file mortality.csv, which is available in Canvas (located in Files\Lab Datasets). The data in this file come from a study of factors associated with mortality rates in 60 urban areas in the United States; we will be using a subset of the original dataset.
The data below is mortality.csv:
Elderly | Poverty | NO | SO2 | Mort.Rate |
8.1 | 11.7 | 15 | 59 | 921.87 |
11.1 | 14.4 | 10 | 39 | 997.875 |
10.4 | 12.4 | 6 | 33 | 962.354 |
6.5 | 20.6 | 8 | 24 | 982.291 |
7.6 | 14.3 | 38 | 206 | 1071.289 |
7.7 | 25.5 | 32 | 72 | 1030.38 |
10.9 | 11.3 | 32 | 62 | 934.7 |
9.3 | 10.5 | 4 | 4 | 899.529 |
9 | 12.6 | 12 | 37 | 1001.902 |
9.5 | 13.2 | 7 | 20 | 912.347 |
7.7 | 24.2 | 8 | 27 | 1017.613 |
8.6 | 10.7 | 63 | 278 | 1024.885 |
9.2 | 15.1 | 26 | 146 | 970.467 |
8.8 | 11.4 | 21 | 64 | 985.95 |
8 | 13.9 | 9 | 15 | 958.839 |
7.1 | 16.1 | 1 | 1 | 860.101 |
7.5 | 12 | 4 | 16 | 936.234 |
8.2 | 12.7 | 8 | 28 | 871.766 |
7.2 | 13.6 | 35 | 124 | 959.221 |
6.5 | 12.4 | 4 | 11 | 941.181 |
7.3 | 18.5 | 1 | 1 | 891.708 |
9 | 12.3 | 3 | 10 | 871.338 |
6.1 | 19.5 | 3 | 5 | 971.122 |
9 | 9.5 | 3 | 10 | 887.466 |
5.6 | 17.9 | 5 | 1 | 952.529 |
8.7 | 13.2 | 7 | 33 | 968.665 |
9.2 | 13.9 | 4 | 4 | 919.729 |
10.1 | 12 | 7 | 32 | 844.053 |
9.2 | 12.3 | 319 | 130 | 861.833 |
8.3 | 17.7 | 37 | 193 | 989.265 |
7.3 | 26.4 | 10 | 34 | 1006.49 |
10 | 22.4 | 1 | 1 | 861.439 |
8.8 | 9.4 | 23 | 125 | 929.15 |
9.2 | 9.8 | 11 | 26 | 857.622 |
8.3 | 24.1 | 14 | 78 | 961.009 |
10.2 | 12.2 | 3 | 8 | 923.234 |
7.4 | 24.2 | 17 | 1 | 1113.156 |
9.7 | 12.4 | 26 | 108 | 994.648 |
9.1 | 13.2 | 32 | 161 | 1015.023 |
9.5 | 13.8 | 59 | 263 | 991.29 |
11.3 | 13.5 | 21 | 44 | 893.991 |
10.7 | 15.7 | 4 | 18 | 938.5 |
11.2 | 14.1 | 11 | 89 | 946.185 |
8.2 | 17.5 | 9 | 48 | 1025.502 |
10.9 | 10.8 | 4 | 18 | 874.281 |
9.3 | 15.3 | 15 | 68 | 953.56 |
7.3 | 14 | 66 | 20 | 839.709 |
9.2 | 12 | 171 | 86 | 911.701 |
7 | 9.7 | 32 | 3 | 790.733 |
9.6 | 10.1 | 7 | 20 | 899.264 |
10.6 | 12.3 | 4 | 20 | 904.155 |
9.8 | 11.1 | 5 | 25 | 950.672 |
9.3 | 13.6 | 7 | 25 | 972.464 |
11.3 | 13.5 | 2 | 11 | 912.202 |
6.2 | 10.3 | 28 | 102 | 967.803 |
7 | 13.2 | 2 | 1 | 823.764 |
7.7 | 10.9 | 11 | 42 | 1003.502 |
11.8 | 14 | 3 | 8 | 895.696 |
9.7 | 14.5 | 8 | 49 | 911.817 |
8.9 | 13 | 13 | 39 | 954.442 |
The file contains five columns of data collected from each of the 60 urban areas. The columns include:
Elderly: % population aged 65 or older
Poverty: % of families with an income below the poverty level
NO: a measure of the levels of nitric oxides
SO2: a measure of the levels of sulphur dioxide
Mort.Rate: the mortality rate per 100,000 residents
We will be using multiple regression to measure the effect of the predictor variables Elderly, Poverty, NO, and SO2 on the response variable, Mort.Rate.
(1) Before conducting any regression analyses, let's explore the dataset by plotting a pairwise scatterplot using the plot() command (recall your code from Lab 7). Add a color of your choice and paste your plot below. (4 points)
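As a rough illustration of what (1) asks for, the sketch below calls plot() on a data frame, which draws a scatterplot for every pair of columns. To stay self-contained it uses only the first 10 rows of the table above; the object name mortality, the read.csv() path in the comment, and the color are assumptions, not part of the lab handout.

```r
# Demo subset: the first 10 rows of the table above (the lab uses all 60).
# In the lab you would load the full file instead, for example:
#   mortality <- read.csv("mortality.csv")   # file location is an assumption
mortality <- data.frame(
  Elderly   = c(8.1, 11.1, 10.4, 6.5, 7.6, 7.7, 10.9, 9.3, 9.0, 9.5),
  Poverty   = c(11.7, 14.4, 12.4, 20.6, 14.3, 25.5, 11.3, 10.5, 12.6, 13.2),
  NO        = c(15, 10, 6, 8, 38, 32, 32, 4, 12, 7),
  SO2       = c(59, 39, 33, 24, 206, 72, 62, 4, 37, 20),
  Mort.Rate = c(921.87, 997.875, 962.354, 982.291, 1071.289,
                1030.38, 934.7, 899.529, 1001.902, 912.347)
)

# plot() on a data frame with more than two columns produces a
# pairwise scatterplot matrix; col sets the point color.
plot(mortality, col = "steelblue", pch = 19)
```

The row and column of panels labeled Mort.Rate are the ones to read when answering (2).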
(2) In your pair-wise scatterplot, which, if any, predictor variables appear to be correlated with Mort.Rate? Which, if any, predictor variables do not appear to be correlated with Mort.Rate? (4 points)
(3) Multiple linear regression assumes that the response variable is normally distributed. Plot a histogram of the response variable, Mort.Rate. Include a title and a color of your choice, and paste your histogram below. (8 points)
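A minimal sketch of the histogram in (3), assuming the response values are held in a vector. Only the first 10 Mort.Rate values from the table are used here so the snippet runs on its own; the vector name mort_rate, the title, the axis label, and the color are illustrative choices.

```r
# First 10 Mort.Rate values from the table above (demo subset only;
# with the full file you would use mortality$Mort.Rate instead).
mort_rate <- c(921.87, 997.875, 962.354, 982.291, 1071.289,
               1030.38, 934.7, 899.529, 1001.902, 912.347)

# hist() returns the bin information invisibly; main sets the title,
# xlab the x-axis label, and col the bar color.
h <- hist(mort_rate,
          main = "Mortality rate per 100,000 residents",
          xlab = "Mort.Rate",
          col  = "darkorange")
```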
(4) What are the mean and median of Mort.Rate? Based on a visual assessment of your histogram in (3), and your estimates of the mean and median, do you conclude that Mort.Rate is normally distributed? Why or why not? (6 points)
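The comparison in (4) can be sketched with mean() and median(). In a roughly symmetric distribution the two are close, while a long tail pulls the mean toward it. The snippet uses only the first 10 Mort.Rate values, so its numbers will not match those from the full 60-row file; mort_rate is a hypothetical name.

```r
# First 10 Mort.Rate values from the table above (demo subset only).
mort_rate <- c(921.87, 997.875, 962.354, 982.291, 1071.289,
               1030.38, 934.7, 899.529, 1001.902, 912.347)

m   <- mean(mort_rate)    # arithmetic mean
med <- median(mort_rate)  # middle value (average of the two middle values here)

# A difference that is small relative to the spread is consistent with symmetry.
cat("mean:", m, " median:", med, " difference:", m - med, "\n")
```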
(5) Regardless of your conclusion in (4), let's assume Mort.Rate is normally distributed, and let's use multiple regression to determine which, if any, of the predictor variables can be used to statistically predict Mort.Rate. First, run a multiple regression to determine if Elderly and Poverty predict Mort.Rate. Paste your code and output below. (4 points)
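One way to fit the model in (5) is with lm() and summary(). This is a sketch on the first 10 rows of the table, so the coefficients will differ from those of the full dataset. The names mortality and fit1 are assumptions.

```r
# Demo subset: first 10 rows of the table above (the lab uses all 60).
mortality <- data.frame(
  Elderly   = c(8.1, 11.1, 10.4, 6.5, 7.6, 7.7, 10.9, 9.3, 9.0, 9.5),
  Poverty   = c(11.7, 14.4, 12.4, 20.6, 14.3, 25.5, 11.3, 10.5, 12.6, 13.2),
  Mort.Rate = c(921.87, 997.875, 962.354, 982.291, 1071.289,
                1030.38, 934.7, 899.529, 1001.902, 912.347)
)

# Response on the left of ~, predictors joined with + on the right.
fit1 <- lm(Mort.Rate ~ Elderly + Poverty, data = mortality)

# summary() prints estimates, standard errors, t values, p-values, and R-squared.
print(summary(fit1))
```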
(6) Test the following null hypotheses using alpha=0.05, and indicate your statistical conclusion and interpretation for each. (8 points)
Ho: slope for Elderly = 0
Ho: slope for Poverty = 0
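The decision rule behind (6) is to reject Ho when the slope's p-value is below alpha. As a hedged sketch, the p-values can be pulled from the coefficient table of a fitted model and compared with alpha directly. The data are the first 10 rows only, so the decisions shown by this snippet say nothing about the full dataset; mortality, fit1, and reject are illustrative names.

```r
# Demo subset: first 10 rows of the table above (the lab uses all 60).
mortality <- data.frame(
  Elderly   = c(8.1, 11.1, 10.4, 6.5, 7.6, 7.7, 10.9, 9.3, 9.0, 9.5),
  Poverty   = c(11.7, 14.4, 12.4, 20.6, 14.3, 25.5, 11.3, 10.5, 12.6, 13.2),
  Mort.Rate = c(921.87, 997.875, 962.354, 982.291, 1071.289,
                1030.38, 934.7, 899.529, 1001.902, 912.347)
)
fit1  <- lm(Mort.Rate ~ Elderly + Poverty, data = mortality)
alpha <- 0.05

# coef(summary(...)) is a matrix; the "Pr(>|t|)" column holds the
# two-sided p-value for Ho: slope = 0. Drop the intercept row.
p_values <- coef(summary(fit1))[-1, "Pr(>|t|)"]

# TRUE means reject Ho for that predictor at the chosen alpha.
reject <- p_values < alpha
print(data.frame(p_value = p_values, reject_Ho = reject))
```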
(7) Based on your conclusions in (6), run another multiple regression analysis by including any significant predictors from (6) and the predictor variable NO. Paste your code and output below. (4 points)
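The model-building step in (7) can be sketched by assembling the formula from a vector of retained predictor names. The vector keep below is purely hypothetical and does not reflect the actual outcome of (6); replace it with whichever predictors your own tests retain. The data are the first 10 rows of the table only.

```r
# Demo subset: first 10 rows of the table above (the lab uses all 60).
mortality <- data.frame(
  Elderly   = c(8.1, 11.1, 10.4, 6.5, 7.6, 7.7, 10.9, 9.3, 9.0, 9.5),
  Poverty   = c(11.7, 14.4, 12.4, 20.6, 14.3, 25.5, 11.3, 10.5, 12.6, 13.2),
  NO        = c(15, 10, 6, 8, 38, 32, 32, 4, 12, 7),
  Mort.Rate = c(921.87, 997.875, 962.354, 982.291, 1071.289,
                1030.38, 934.7, 899.529, 1001.902, 912.347)
)

# HYPOTHETICAL: predictors kept after (6); not a result from the lab data.
keep <- c("Poverty")

# reformulate() builds Mort.Rate ~ Poverty + NO from the character vector.
f2   <- reformulate(c(keep, "NO"), response = "Mort.Rate")
fit2 <- lm(f2, data = mortality)
print(summary(fit2))
```

The same pattern applies to (9), with "SO2" appended instead of "NO".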
(8) Based on your output in (7), which, if any, predictor variables significantly predict Mort.Rate when using alpha=0.05? (That is, test the null hypothesis Ho: slope for predictor variable = 0 for each predictor.) (8 points)
(9) Based on your conclusions in (8), run another multiple regression analysis by including any significant predictors from (8) and the predictor variable SO2. Paste your code and output below. (4 points)
(10) Based on your output from (9), which, if any, predictor variables significantly predict Mort.Rate when using alpha=0.05? (That is, test the null hypothesis Ho: slope for predictor variable = 0 for each predictor.) (8 points)
(11) Examine the output from (9) and your conclusion from (10). What is the final regression equation? (8 points)
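A regression equation has the form Mort.Rate = b0 + b1*X1 + b2*X2 + ..., with b0 the intercept and each b the estimated slope of a retained predictor. The sketch below assembles that string from coef(). The choice of Poverty and SO2 as the final predictors is hypothetical, and the data are the first 10 rows only, so the numbers it prints are not the lab's answer.

```r
# Demo subset: first 10 rows of the table above (the lab uses all 60).
mortality <- data.frame(
  Poverty   = c(11.7, 14.4, 12.4, 20.6, 14.3, 25.5, 11.3, 10.5, 12.6, 13.2),
  SO2       = c(59, 39, 33, 24, 206, 72, 62, 4, 37, 20),
  Mort.Rate = c(921.87, 997.875, 962.354, 982.291, 1071.289,
                1030.38, 934.7, 899.529, 1001.902, 912.347)
)

# HYPOTHETICAL final model; use the predictors your own tests retain.
fit_final <- lm(Mort.Rate ~ Poverty + SO2, data = mortality)
b <- coef(fit_final)  # named vector: intercept first, then slopes

# "%+.3f" prints each slope with an explicit sign, e.g. "+1.234" or "-0.567".
slope_terms <- paste0(sprintf("%+.3f", b[-1]), " * ", names(b)[-1])
equation <- paste("Mort.Rate =", sprintf("%.3f", b[1]),
                  paste(slope_terms, collapse = " "))
cat(equation, "\n")
```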
(12) Using your output from your final multiple regression model in (9), how much of the variation in Mort.Rate is explained by the significant predictor variables in this final regression model? (4 points)
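The quantity asked for in (12) is the coefficient of determination, R-squared, which summary() reports for any lm fit. A minimal sketch, again on the first 10 rows with a hypothetical final model (Poverty and SO2), so the value it prints is not the lab's answer:

```r
# Demo subset: first 10 rows of the table above (the lab uses all 60).
mortality <- data.frame(
  Poverty   = c(11.7, 14.4, 12.4, 20.6, 14.3, 25.5, 11.3, 10.5, 12.6, 13.2),
  SO2       = c(59, 39, 33, 24, 206, 72, 62, 4, 37, 20),
  Mort.Rate = c(921.87, 997.875, 962.354, 982.291, 1071.289,
                1030.38, 934.7, 899.529, 1001.902, 912.347)
)

# HYPOTHETICAL final model; substitute your own retained predictors.
fit_final <- lm(Mort.Rate ~ Poverty + SO2, data = mortality)
s <- summary(fit_final)

r2     <- s$r.squared      # proportion of variation in Mort.Rate explained
r2_adj <- s$adj.r.squared  # same, penalized for the number of predictors
cat(sprintf("R-squared: %.1f%%  (adjusted: %.1f%%)\n", 100 * r2, 100 * r2_adj))
```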