Question

Posted on Jun 11, 2024

This is a statistical lab report. I am attaching 3 documents. One contains the questions. The other 2 contain the data to be used. I need a handwritten or digital solution to the report.

Statistics 2120: Introduction to Statistical Analysis
Laboratory Assignment 12

Submit handwritten solutions to the problems below on your own paper by the due date and time given by the Graduate Teaching Assistant. The solutions you submit should be in the same order in which the problems are stated, with the answer to each question clearly indicated. Messy or indecipherable answers will not be graded and will receive zero credit. You are permitted and encouraged to discuss the problems with other students, but you must write the solution that you turn in on your own. Copying other students' work is a violation of the Honor Pledge and will be treated as such.

Problem 1: Pollution of water resources is a serious problem that can require substantial efforts and funds to rectify. In order to determine the financial resources required, an accurate assessment of the water quality, which is measured by the index of biotic integrity (IBI), is needed. Since IBI is very expensive to measure, a study was done for a collection of streams in the Ozark Highland ecoregion of Arkansas in which the IBI for each stream was measured along with land use measures that are inexpensive to obtain. The land use measures collected in the study are the area of the watershed in square kilometers and the percent of the watershed area that is forest. The objective of this study is to determine if these land use measures can predict the IBI so that the funding required for pollution cleanup can be accurately estimated. The data collected from the n = 49 watersheds are provided in the file Lab12 Data01.xlsx.

Part A: The third tab of the file Lab12 Data01.xlsx contains the scatterplot of IBI vs. area and the scatterplot of IBI vs. forest. Describe the relationship between each of the explanatory variables and the response variable. For each explanatory variable, do you think that it will be useful in predicting the IBI?
Part B: The third tab of the file Lab12 Data01.xlsx also contains the scatterplot of area vs. forest. Describe the relationship between the two explanatory variables.

Part C: The second tab of the file Lab12 Data01.xlsx contains the parameter estimates for the statistical model $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$. For each of the n = 49 data points, use Excel to calculate the predicted IBI, $\hat{y}_i$, and the corresponding residual, $e_i$. Write out the predicted IBI and residual for the first three data points.

Part D: The fourth tab of the file Lab12 Data01.xlsx contains the scatterplot of the residuals vs. the predicted IBI, the scatterplot of the residuals vs. area, and the scatterplot of the residuals vs. forest. Do the regression assumptions of linearity and constant variance seem to hold? Explain.

Part E: The fourth tab of the file Lab12 Data01.xlsx also contains the histogram of the residuals. Does the normality assumption hold? Explain.

Part F: Researchers would like to estimate the IBI for a stream that was not a part of this study. The area of the watershed is 55 square kilometers and 84% of that watershed area is forest. Calculate a 99% prediction interval for the IBI for this stream. The standard error at these particular values is $SE_{\hat{y}} = 15.9$.

Part G: Ecologists have long lobbied for protection for watershed forests because they believe that it increases the water quality. Before their next discussion with policy makers, they would like to have an estimate of the true value to water quality resulting from increasing the percentage of watershed forest by even 1%, and they ask you for an estimate with 95% confidence. What values should you give them?

Problem 2: The mathematics department at a university is interested in learning more about the grades that students earn in an introductory calculus class. A sample of n = 80 students who have taken this introductory calculus course during any semester of the last three academic years is selected.
The academic record for each selected student is reviewed. The professors in the math department determine that, of the available information for all students, the relevant variables are their final grade in the calculus course, their score on the algebra placement test given to all incoming students who plan to take a calculus course, their ACT Math score, their ACT Natural Sciences score, and their high school percentile rank. These data are provided in the file Lab12 Data02.xlsx.

Part A: The professors in the math department believe that the score on the algebra placement test and the high school percentile rank are the best predictor variables for the final grade in calculus. The second tab of the file Lab12 Data02.xlsx contains the output for the regression using these two variables as the explanatory variables. Conduct the F test for whether this model is useful or not. State the hypotheses, test statistic, p-value, and your conclusion in context.

Part B: One of the professors suggests testing whether the two ACT scores improve the model. The third tab of the file Lab12 Data02.xlsx contains the output for the regression using all four of the relevant explanatory variables. Before conducting the appropriate F test, identify and compare the adjusted $R^2$ values for each fit model. What do you think the outcome of that F test will be? Why?

Part C: The professor who suggested this test is adamant that it be done. Conduct the appropriate F test to determine if the ACT scores improve the model. State the hypotheses, test statistic, p-value, and your conclusion in context. Which model of the two suggested should the math department use?

Part D: In an attempt to further improve the model, the professors decide to conduct a significance test for each of the variables in the model you chose in Part C. For each test, state the hypotheses, test statistic, p-value, and your conclusion in context. Which explanatory variable, if any, would you try removing first?
Part E: Calculate the predicted final grade in calculus and the residual for each selected student using the model that would result from your decision in Part D. What is the value of the residual for the first three selected students?

Part F: Use Excel to create a scatterplot of the residuals vs. the predicted final grade as well as scatterplot(s) of the residuals vs. each explanatory variable in the model that would result from your decision in Part D. Roughly sketch the resulting scatterplots. Do the regression assumptions of linearity and constant variance seem to hold? Explain.

Part G: Create a histogram of the residuals and roughly sketch it. Does the normality assumption hold? Explain.

Stream  IBI  Area  Forest
1       47   21    0
2       61   29    0
3       39   31    0
4       59   32    0
5       72   34    0
6       76   34    0
7       85   49    3
8       89   52    3
9       74   2     7
10      89   70    8
11      33   6     9
12      46   28    10
13      32   21    10
14      80   59    11
15      80   69    14
16      78   47    17
17      53   8     17
18      43   8     18
19      88   58    21
20      84   54    22
21      62   10    25
22      55   57    31
23      29   18    32
24      29   19    33
25      54   39    33
26      78   49    33
27      71   9     39
28      55   5     41
29      58   14    43
30      71   9     43
31      33   23    47
32      59   31    49
33      81   18    49
34      71   16    52
35      75   21    52
36      64   32    59
37      41   10    63
38      82   26    68
39      60   9     75
40      84   54    79
41      83   12    79
42      82   21    80
43      82   27    86
44      86   23    89
45      79   26    90
46      67   16    95
47      56   26    95
48      85   26    100
49      91   28    100

SUMMARY OUTPUT

           Coefficients    Standard Error
Intercept  40.6292447495   5.4614423015
Area       0.5693954553    0.1262374345
Forest     0.2336709359    0.0694376339

[Scatterplot: IBI vs. Area]
[Scatterplot: IBI vs. Forest]
[Scatterplot: Forest vs. Area]
[Residual plot: Residuals vs. Predicted]
[Residual plot: Residuals vs. Forest]
[Residual plot: Residuals vs. Area]
[Residual histogram]

Observation  Grade  HS Rank  Algebra  ACTM  ACTNS
1            62     68       21       27    23
2            75     99       16       29    32
3            95     98       22       30    32
4            78     90       25       34    28
5            95     99       22       29    23
6            91     97       19       30    28
7            72     79       23       29    31
8            95     95       15       28    28
9            88     85       14       28    24
10           97     82       19       31    32
11           49     81       12       25    25
12           70     87       16       34    30
13           75     92       13       27    23
14           78     89       19       28    28
15           89     97       25       31    30
16           87     81       10       26    27
17           79     91       14       24    20
18           85     97       18       30    30
19           57     46       13       25    24
20           81     80       15       25    22
21           76     80       18       27    28
22           88     89       17       27    26
23           83     94       21       28    18
24           97     71       24       27    27
25           60     97       12       27    32
26           84     85       18       27    20
27           87     50       18       26    26
28           95     90       29       33    31
29           64     94       15       27    18
30           80     81       16       26    26
31           93     99       24       35    30
32           91     96       21       27    31
33           96     90       30       32    31
34           85     96       28       30    28
35           94     91       19       27    32
36           70     99       23       30    34
37           80     97       22       30    25
38           60     83       18       28    25
39           65     84       15       28    28
40           82     99       24       30    31
41           65     98       14       26    32
42           65     86       15       31    25
43           84     86       12       28    24
44           70     97       15       26    23
45           65     96       20       23    15
46           78     99       27       28    20
47           70     76       22       31    28
48           82     79       19       28    26
49           89     92       21       33    32
50           72     97       18       24    28
51           90     96       23       32    29
52           80     60       12       20    19
53           99     92       22       28    32
54           83     68       18       34    24
55           89     98       23       33    32
56           75     89       23       30    28
57           92     92       23       29    24
58           70     65       16       16    17
59           95     95       23       30    24
60           70     85       21       30    21
61           60     66       12       26    30
62           92     97       24       29    28
63           91     85       21       26    27
64           75     95       10       27    26
65           60     75       17       26    26
66           78     99       23       25    26
67           94     95       28       31    26
68           67     50       19       18    21
69           78     73       26       29    29
70           90     97       23       29    28
71           78     96       24       30    32
72           79     82       14       29    28
73           75     99       17       29    32
74           75     94       19       30    25
75           81     94       15       33    32
76           85     88       22       29    26
77           75     92       17       29    33
78           88     95       26       27    29
79           95     99       26       28    30
80           85     99       21       28    30

SUMMARY OUTPUT

Regression Statistics
Multiple R         0.524
R Square           0.275
Adjusted R Square  0.256
Standard Error     9.864
Observations       80

ANOVA
            df  SS       MS      F
Regression  2   2840.4   1420.2  14.6
Residual    77  7491.8   97.3
Total       79  10332.2

           Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%
Intercept  43.869        8.375           5.238   0.000    27.193     60.544
HS Rank    0.181         0.096           1.896   0.062    -0.009     0.372
Algebra    1.050         0.247           4.248   0.000    0.558      1.542

SUMMARY OUTPUT

Regression Statistics
Multiple R         0.538
R Square           0.289
Adjusted R Square  0.251
Standard Error     9.897
Observations       80

ANOVA
            df  SS       MS     F
Regression  4   2986.2   746.5  7.6
Residual    75  7346.0   97.9
Total       79  10332.2

           Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%
Intercept  36.122        10.752          3.360   0.001    14.703     57.540
HS Rank    0.135         0.104           1.306   0.196    -0.071     0.342
Algebra    0.961         0.264           3.640   0.000    0.435      1.487
ACTM       0.272         0.454           0.599   0.551    -0.632     1.175
ACTNS      0.216         0.313           0.690   0.492    -0.408     0.840
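
The arithmetic behind Problem 1 Part C and Problem 2 Parts A to C can be checked with a short script. What follows is a minimal sketch, not the assignment's intended Excel workflow. The function names are hypothetical. It assumes the three runs of 49 values in the stream listing are IBI, area, and forest in that order, and it copies the coefficients and sums of squares from the SUMMARY OUTPUT tables in the attachment.

```python
# Sketch only: all helper names here are hypothetical, not part of the assignment.

# Coefficients copied from the Problem 1 SUMMARY OUTPUT (intercept, area, forest).
B0, B_AREA, B_FOREST = 40.6292447495, 0.5693954553, 0.2336709359

# First three streams as (IBI, area, forest).
# Assumption: the three runs of 49 values in the listing are IBI, area, forest.
FIRST_STREAMS = [(47, 21, 0), (61, 29, 0), (39, 31, 0)]


def predict_ibi(area, forest):
    """Fitted value: y-hat = b0 + b1 * area + b2 * forest."""
    return B0 + B_AREA * area + B_FOREST * forest


def residual(ibi, area, forest):
    """Residual: observed IBI minus fitted IBI."""
    return ibi - predict_ibi(area, forest)


def overall_f(ss_reg, df_reg, ss_res, df_res):
    """Overall F statistic: MS(regression) / MS(residual)."""
    return (ss_reg / df_reg) / (ss_res / df_res)


def partial_f(sse_reduced, sse_full, extra_terms, df_res_full):
    """Partial F statistic for the terms added to the reduced model."""
    return ((sse_reduced - sse_full) / extra_terms) / (sse_full / df_res_full)


def adjusted_r2(ss_res, ss_total, n, predictors):
    """Adjusted R^2 = 1 - [SSE / (n - p - 1)] / [SST / (n - 1)]."""
    return 1 - (ss_res / (n - predictors - 1)) / (ss_total / (n - 1))


if __name__ == "__main__":
    # Problem 1, Part C: fitted values and residuals for the first three streams.
    for ibi, area, forest in FIRST_STREAMS:
        print(round(predict_ibi(area, forest), 2),
              round(residual(ibi, area, forest), 2))

    # Problem 2, Part A: overall F for the two-variable model.
    print(round(overall_f(2840.4, 2, 7491.8, 77), 2))
    # Problem 2, Part B: adjusted R^2 for the two- and four-variable models.
    print(round(adjusted_r2(7491.8, 10332.2, 80, 2), 3),
          round(adjusted_r2(7346.0, 10332.2, 80, 4), 3))
    # Problem 2, Part C: partial F for adding ACTM and ACTNS (2 extra terms).
    print(round(partial_f(7491.8, 7346.0, 2, 75), 3))
```

The script stops at the test statistics. The p-values still have to come from an F distribution with the matching degrees of freedom (2 and 77 for the overall test, 2 and 75 for the partial test), for example from Excel or a statistics table.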