
Question

Multiple linear regression questions. Use SAS or R, and include your code for each part. Data set (28 teams; y = games won):

 y    x1    x2    x3    x4    x5    x6    x7    x8    x9
10  2113  1985  38.9  64.7    4   868  59.7  2205  1917
11  2003  2855  38.8  61.3    3   615  55.0  2096  1575
11  2957  1737  40.1  60.0   14   914  65.6  1847  2175
13  2285  2905  41.6  45.3   -4   957  61.4  1903  2476
10  2971  1666  39.2  53.8   15   836  66.1  1457  1866
11  2309  2927  39.7  74.1    8   786  61.0  1848  2339
10  2528  2341  38.1  65.4   12   754  66.1  1564  2092
11  2147  2737  37.0  78.3   -1   761  58.0  1821  1909
 4  1689  1414  42.1  47.6   -3   714  57.0  2577  2001
 2  2566  1838  42.3  54.2   -1   797  58.9  2476  2254
 7  2363  1480  37.3  48.0   19   984  67.5  1984  2217
10  2109  2191  39.5  51.9    6   700  57.2  1917  1758
 9  2295  2229  37.4  53.6   -5  1037  58.8  1761  2032
 9  1932  2204  35.1  71.4    3   986  58.6  1709  2025
 6  2213  2140  38.8  58.3    6   819  59.2  1901  1686
 5  1722  1730  36.6  52.6  -19   791  54.4  2288  1835
 5  1498  2072  35.3  59.3   -5   776  49.6  2072  1914
 5  1873  2929  41.1  55.3   10   789  54.3  2861  2496
 6  2118  2268  38.2  69.6    6   582  58.7  2411  2670
 4  1775  1983  39.3  78.3    7   901  51.7  2289  2202
 3  1904  1792  39.7  38.1   -9   734  61.9  2203  1988
 3  1929  1606  39.7  68.8  -21   627  52.7  2592  2324
 4  2080  1492  35.5  68.8   -8   722  57.8  2053  2550
10  2301  2835  35.3  74.1    2   683  59.7  1979  2110
 6  2040  2416  38.7  50.0    0   576  54.9  2048  2628
 8  2447  1638  39.9  57.1   -8   848  65.3  1786  1776
 2  1416  2649  37.4  56.3  -22   684  43.8  2876  2524
 0  1503  1503  39.3  47.0   -9   875  53.5  2560  2241

1) Fit the first multiple linear regression model to these data using x2, x7, and x8. Construct the analysis-of-variance table and test for significance of regression.

2) Calculate the t statistics for testing the hypotheses H0: β2 = 0, H0: β7 = 0, and H0: β8 = 0. What conclusions can you draw about the roles the variables x2, x7, and x8 play in the model? Calculate R² and adjusted R² for this model.

3) Using the partial F test, determine the contribution of x7 to the model. How is this partial F statistic related to the t test for β7 calculated in part 2?

4) Show numerically that the square of the simple correlation coefficient between the observed values yi and the fitted values ŷi equals R².

5) Find a 95% CI on β7. Also find a 95% CI on the mean number of games won by a team when x2 = 2300, x7 = 56.0, and x8 = 2100.

6) Now fit a second model to these data using only x7 and x8 as the regressors, and test for significance of regression.

7) Calculate R² and adjusted R². How do these quantities compare to the values computed for the first model, which included an additional regressor (x2)?

8) Calculate a 95% CI on β7. Also find a 95% CI on the mean number of games won by a team when x7 = 56.0 and x8 = 2100. Compare the lengths of these CIs to the lengths of the corresponding CIs from part 5. What conclusions can you draw from this problem about the consequences of omitting an important regressor from a model?

1 Approved Answer
1) Fit the first multiple linear regression model using x2, x7, and x8. Construct the analysis-of-variance table and test for significance of regression.

SAS Code:

proc reg data = a;
   model y = x2 x7 x8;
   output out = results r = residual p = yhat;
run;

SAS Output (analysis of variance):

Source       DF   Sum of Squares   Mean Square          F    Pr > F
Regression    3         257.0943       85.6981    29.4369    <.0001
Error        24          69.8700        2.9113
Total        27         326.9643

Result: The p-value in the ANOVA table is smaller than 0.05, so the regression is significant at the 5% level; at least one of x2, x7, x8 is linearly related to y. The fitted regression equation is

ŷ = -1.8084 + 0.0036*x2 + 0.1940*x7 - 0.0048*x8

2) Calculate the t statistics for testing H0: β2 = 0, H0: β7 = 0, and H0: β8 = 0. What conclusions can you draw about the roles of x2, x7, and x8? Calculate R² and adjusted R² for this model.

The same PROC REG output gives the parameter estimates:

Variable     Estimate   Std Error   t Value   Pr > |t|
Intercept     -1.8084      7.9009   -0.2289     0.8209
x2             0.0036      0.0007    5.1771     <.0001
x7             0.1940      0.0882    2.1983     0.0378
x8            -0.0048      0.0013   -3.7710     0.0009

Result: The p-values for the coefficients of x2, x7, and x8 are all smaller than 0.05, so each of x2, x7, and x8 contributes significantly to the model given the other two regressors. For this model, R² = 0.7863 and adjusted R² = 0.7596.

3) Using the partial F test, determine the contribution of x7 to the model. How is this partial F statistic related to the t test for β7 calculated in part 2?

The partial F test compares the full model (x2, x7, x8) with the reduced model obtained by deleting x7, that is, y regressed on x2 and x8:

F0 = [ (SSE(R) - SSE(F)) / (dfR - dfF) ] / [ SSE(F) / dfF ] = SSR(x7 | x2, x8) / MSE(F),

where SSE(F) = 69.87 with dfF = 24 from the full model, and SSE(R) with dfR = 25 comes from the reduced model.

SAS Code (reduced model):

proc reg data = a;
   model y = x2 x8;
run;

Because only one regressor is deleted, the partial F statistic equals the square of the t statistic for β7 from part 2: F0 = t0² = (2.1983)² ≈ 4.83 with 1 and 24 degrees of freedom, and the corresponding p-value is 0.0378. Since this p-value is below 0.05, x7 makes a significant contribution to the model once x2 and x8 are already included.
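The PROC REG steps above assume the data have already been read into a SAS data set named a with variables y and x1-x9. The following is a minimal sketch of one way to create that data set, together with a labeled TEST statement that asks PROC REG to print the partial F statistic for x7 from part 3 directly. The data set name and variable names match the code above; the TEST statement and the label x7_contribution are added here as a convenience and are not part of the original solution.

/* Read the 28 observations from the data set above into SAS data set a */
data a;
   input y x1-x9;
   datalines;
10 2113 1985 38.9 64.7 4 868 59.7 2205 1917
11 2003 2855 38.8 61.3 3 615 55 2096 1575
11 2957 1737 40.1 60 14 914 65.6 1847 2175
13 2285 2905 41.6 45.3 -4 957 61.4 1903 2476
10 2971 1666 39.2 53.8 15 836 66.1 1457 1866
11 2309 2927 39.7 74.1 8 786 61 1848 2339
10 2528 2341 38.1 65.4 12 754 66.1 1564 2092
11 2147 2737 37 78.3 -1 761 58 1821 1909
4 1689 1414 42.1 47.6 -3 714 57 2577 2001
2 2566 1838 42.3 54.2 -1 797 58.9 2476 2254
7 2363 1480 37.3 48 19 984 67.5 1984 2217
10 2109 2191 39.5 51.9 6 700 57.2 1917 1758
9 2295 2229 37.4 53.6 -5 1037 58.8 1761 2032
9 1932 2204 35.1 71.4 3 986 58.6 1709 2025
6 2213 2140 38.8 58.3 6 819 59.2 1901 1686
5 1722 1730 36.6 52.6 -19 791 54.4 2288 1835
5 1498 2072 35.3 59.3 -5 776 49.6 2072 1914
5 1873 2929 41.1 55.3 10 789 54.3 2861 2496
6 2118 2268 38.2 69.6 6 582 58.7 2411 2670
4 1775 1983 39.3 78.3 7 901 51.7 2289 2202
3 1904 1792 39.7 38.1 -9 734 61.9 2203 1988
3 1929 1606 39.7 68.8 -21 627 52.7 2592 2324
4 2080 1492 35.5 68.8 -8 722 57.8 2053 2550
10 2301 2835 35.3 74.1 2 683 59.7 1979 2110
6 2040 2416 38.7 50 0 576 54.9 2048 2628
8 2447 1638 39.9 57.1 -8 848 65.3 1786 1776
2 1416 2649 37.4 56.3 -22 684 43.8 2876 2524
0 1503 1503 39.3 47 -9 875 53.5 2560 2241
;
run;

/* Full model; the labeled TEST statement prints the partial F test of H0: beta7 = 0 */
proc reg data = a;
   model y = x2 x7 x8;
   x7_contribution: test x7 = 0;
run;
quit;

The F value reported for the labeled test should agree with t0² from part 2, which is the relationship the question asks about.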
4) Show numerically that the square of the simple correlation coefficient between the observed values yi and the fitted values ŷi equals R².

SAS Code:

proc corr data = results;
   var yhat y;
run;

The fitted and actual values from the first model are:

Obs   Fitted ŷi   Actual yi
  1      6.2951          10
  2      9.0387          11
  3      8.2710          11
  4     11.3893          13
  5      9.9906          10
  6     11.6557          11
  7     11.9040          10
  8     10.5202          11
  9      1.9255           4
 10      4.3060           2
 11      7.0551           7
 12      7.9382          10
 13      9.1365           9
 14      9.2582           9
 15      8.2197           6
 16      3.9499           5
 17      5.2896           5
 18      5.4853           5
 19      6.1274           6
 20      4.3317           4
 21      6.0370           3
 22      1.7101           3
 23      4.8846           4
 24     10.4417          10
 25      7.6708           6
 26      8.1504           8
 27      2.3690           2
 28      1.6487           0

Result: The correlation between the actual and fitted values is 0.88674, and its square is 0.78631, which equals R² = 0.7863 from part 2. This confirms numerically that the square of the simple correlation coefficient between yi and ŷi equals R².

5) Find a 95% CI on β7, and find a 95% CI on the mean number of games won by a team when x2 = 2300, x7 = 56.0, and x8 = 2100.

SAS Code (the prediction point is appended with a missing y, so PROC REG reports its confidence limits but does not use it in the fit):

data a1;
   input x2 x7 x8;
   datalines;
2300 56 2100
;
run;

data a2;
   set a a1;
run;

proc reg data = a2;
   model y = x2 x7 x8 / alpha = 0.05 clb clm;
run;

SAS Output (CLB limits for x7 and the CLM row for the appended observation):

Variable x7: estimate 0.1940, 95% confidence limits (0.0119, 0.3761)
Obs 29 (x2 = 2300, x7 = 56, x8 = 2100): predicted value 7.216, std error of the mean 0.378, 95% CL Mean (6.436, 7.997)

Results: The 95% confidence interval on β7 is (0.0119, 0.3761). The 95% confidence interval on the mean number of games won when x2 = 2300, x7 = 56.0, and x8 = 2100 is (6.436, 7.997), centered at the predicted value 7.216.
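As a quick arithmetic check (not part of the required SAS output), the half width of the interval on the mean response in part 5 can be reproduced from the usual formula for a confidence interval on the mean response at x0 = (1, 2300, 56, 2100)'. Using MSE = 2.9113 from the ANOVA table in part 1, the critical value t(0.025, 24) = 2.064, and x0'(X'X)^(-1)x0 ≈ 0.0491 for this x0 and these data:

   ŷ0 ± t(0.025, 24) * sqrt( MSE * x0'(X'X)^(-1)x0 ) = 7.216 ± 2.064 * sqrt( 2.9113 * 0.0491 ) ≈ 7.216 ± 0.780,

which matches the 95% CL Mean limits (6.436, 7.997) reported by PROC REG.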
6) Now fit a second model to these data using only x7 and x8 as the regressors, and test for significance of regression.

SAS Code:

proc reg data = a;
   model y = x7 x8;
run;

SAS Output (analysis of variance and parameter estimates):

Source       DF   Sum of Squares   Mean Square          F    Pr > F
Regression    2         179.0662       89.5331    15.1343    <.0001
Error        25         147.8981        5.9159
Total        27         326.9643

Variable     Estimate   Std Error   t Value   Pr > |t|
Intercept     17.9443      9.8625    1.8195     0.0808
x7             0.0484      0.1192    0.4057     0.6884
x8            -0.0065      0.0018   -3.7191     0.0010

Results: The p-value in the ANOVA table is smaller than 0.05, so the regression is significant at the 5% level. Among the individual coefficients, x8 is significant (p = 0.0010), but x7 is not (p = 0.6884), so x7 does not contribute significantly once x2 has been dropped from the model. The fitted regression equation is

ŷ = 17.9443 + 0.0484*x7 - 0.0065*x8

7) Calculate R² and adjusted R². How do these quantities compare to the values computed for the first model, which included the additional regressor x2?

Results: For this model R² = 0.5477 and adjusted R² = 0.5115. Both are substantially smaller than the corresponding values for the first model (R² = 0.7863, adjusted R² = 0.7596), so the model with x2, x7, and x8 explains the data much better than the model with only x7 and x8.
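If a side-by-side comparison of R² and adjusted R² across candidate models is wanted, one option (a sketch, not part of the original solution) is PROC REG's all-subsets display: SELECTION=RSQUARE with the ADJRSQ option lists R² and adjusted R² for every subset of the listed regressors, which makes the drop from the (x2, x7, x8) model to the (x7, x8) model easy to see in one table.

/* All-subsets summary of R-square and adjusted R-square for models built from x2, x7, x8 */
proc reg data = a;
   model y = x2 x7 x8 / selection = rsquare adjrsq;
run;
quit;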
8) Calculate a 95% CI on β7, and find a 95% CI on the mean number of games won by a team when x7 = 56.0 and x8 = 2100. Compare the lengths of these CIs to the lengths of the corresponding CIs from part 5. What conclusions can you draw from this problem about the consequences of omitting an important regressor from a model?

SAS Code:

data a3;
   input x7 x8;
   datalines;
56 2100
;
run;

data a4;
   set a a3;
run;

proc reg data = a4;
   model y = x7 x8 / alpha = 0.05 clb clm;
run;

SAS Output (CLB limits for x7 and the CLM row for the appended observation):

Variable x7: estimate 0.0484, 95% confidence limits (-0.1972, 0.2939)
Obs 29 (x7 = 56, x8 = 2100): predicted value 6.926, std error of the mean 0.533, 95% CL Mean (5.829, 8.024)

Results: The 95% confidence interval on β7 is (-0.1972, 0.2939), and the 95% confidence interval on the mean number of games won when x7 = 56.0 and x8 = 2100 is (5.829, 8.024), centered at 6.926. Both intervals are wider than the corresponding intervals from part 5: the CI on β7 has length 0.49 versus 0.36, and the CI on the mean response has length 2.20 versus 1.56. Omitting the important regressor x2 inflates the residual mean square (5.92 versus 2.91), which widens the confidence intervals and, in this case, also changes the conclusion about x7. The lesson is that leaving an important regressor out of the model degrades both the precision of the estimates and the inferences drawn from the model, so x2 should be retained.
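A compact way to make the interval-length comparison in this part is to capture the parameter-estimate tables from both fits with ODS OUTPUT and difference the confidence limits. The sketch below assumes the data set a created earlier; the column names LowerCL and UpperCL are the names PROC REG is assumed to use for the CLB limits in its ParameterEstimates table (if they differ in your SAS release, PROC CONTENTS on pe_full will show the actual names).

/* Capture parameter estimates (with 95% limits) from the full and reduced fits */
ods output ParameterEstimates = pe_full;
proc reg data = a;
   model y = x2 x7 x8 / clb;
run;
quit;

ods output ParameterEstimates = pe_reduced;
proc reg data = a;
   model y = x7 x8 / clb;
run;
quit;

/* Stack the two tables and compute the length of each 95% confidence interval */
data ci_compare;
   length fit $ 8;
   set pe_full (in = in_full) pe_reduced;
   if in_full then fit = 'full';
   else fit = 'reduced';
   ci_length = UpperCL - LowerCL;   /* LowerCL and UpperCL are assumed CLB column names */
run;

proc print data = ci_compare noobs;
   var fit Variable Estimate LowerCL UpperCL ci_length;
run;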
