
Homework 4
STAT W4315: Linear Regression Models
Multiple Linear Regression Model

DUE: Wednesday, April 13, 12:00 noon

(1) Please sign your homework with your name and UNI number.
(2) Homework must be submitted into the Statistics Homework Boxes, room 904 on the 9th floor of the SSW building.
(3) Homework is due Wednesday, April 13, 12:00 noon.
(4) No late homework, under any circumstances, will be accepted.
(5) At the end of the semester, one of your lowest homework scores will be dropped before the final grade is calculated.

Problem 1 (50p) (Problems 6.18 (b-f), 6.21, & 7.7 in ALRM book)

A commercial real estate company evaluates vacancy rates, square footage, rental rates, and operating expenses for commercial properties in a large metropolitan area in order to provide clients with quantitative information upon which to make rental decisions. The data are taken from 81 suburban commercial properties that are the newest, best located, most attractive, and expensive for five specific geographic areas. The variables are: rental rates Y, the age X1, operating expenses and taxes X2, vacancy rates X3, and total square footage X4.

     Y  X1     X2    X3       X4
13.500   1   5.02  0.14  123,000
12.000  14   8.19  0.27  104,079
10.500  16   3.00  0.00   39,998
15.000   4  10.70  0.05   57,112
   ...  ..    ...   ...      ...

(see file "Homework 4 data Problem1.txt" for a complete set of data)

(a)(5p) Obtain the scatter plot matrix and the correlation matrix. Interpret these and state your principal findings.
(b)(5p) Fit regression model $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i4} + \varepsilon_i$ for four predictor variables to the data and state the estimated regression function.
(c)(5p) Obtain the residuals and prepare a box plot of the residuals. Does the distribution appear to be fairly symmetrical?
(d)(5p) Plot the residuals against Y, each predictor variable, and each two-factor interaction term on separate graphs. Also prepare a normal probability plot. Analyse your plots and summarize your findings.
(e)(5p) Can you conduct a formal test for lack of fit here?
(f)(10p) The commercial real estate company obtained information about three additional properties.

Property       1        2        3
X1:          4.0      6.0     12.0
X2:         10.0     11.5     12.5
X3:         0.10        0     0.32
X4:       80,000  120,000  340,000

Find separate prediction intervals for the rental rates for each of the new properties. Use a 95% confidence coefficient in each case. Can the rental rates of these three properties be predicted fairly precisely? What is the family confidence level for the set of three predictions?
(g)(10p) Obtain the analysis of variance table that decomposes the regression sum of squares into extra sums of squares associated with X4; with X1 given X4; with X2 given X1 and X4; and with X3, given X1, X2 and X4.
(h)(5p) Test whether X3 can be dropped from the regression model given that X1, X2, and X4 are retained. Use the F test statistic and level of significance 0.01. State the alternatives, decision rule, and conclusion. What is the p-value of the test?

Problem 2 (50p) (Problems 8.15 & 8.19 in ALRM book)

The users of the copiers are either training institutions that use a small model, or business firms that use a large, commercial model. An analyst at Tri-City wishes to fit a regression model including both number of copiers serviced (X1) and type of copier (X2) as predictor variables and estimate the effect of copier model (S-small, L-large) on number of minutes spent on the service call. Assume that the regression model $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$ is appropriate, and let X2 = 1 if small model and 0 if large, commercial model.

  Y  X1  X2
 20   2   1
 60   4   0
 46   3   0
 41   2   0
 12   1   0
137  10   0
...  ..  ..

(see file "Homework 4 data Problem2.txt" for a complete set of data)

(a)(5p) Explain the meaning of all regression coefficients in the model.
(b)(5p) Fit the regression model and state the estimated regression function.
(c)(5p) Estimate the effect of copier model on mean service time with a 95 percent confidence interval. Interpret your interval estimate.
(d)(10p) Why would the analyst wish to include X1, number of copiers, in the regression model when interest is in estimating the effect of type of copier model on service time?
(e)(10p) Obtain the residuals and plot them against X1 X2. Is there any indication that an interaction term in the regression model would be helpful?
(f)(5p) Fit regression model with interaction term as an additional explanatory variable, i.e., $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \varepsilon_i$
(g)(10p) Test whether the interaction term can be dropped from the model; control the $\alpha$ risk at 0.10. State the alternatives, decision rule, and conclusion. What is the p-value of the test? If the interaction term cannot be dropped from the model, describe the nature of the interaction effect.

Data for Problem 1 (81 properties; columns as in the preview above):

     Y  X1     X2    X3      X4
13.500   1   5.02  0.14  123000
12.000  14   8.19  0.27  104079
10.500  16   3.00  0.00   39998
15.000   4  10.70  0.05   57112
14.000  11   8.97  0.07   60000
10.500  15   9.45  0.24  101385
14.000   2   8.00  0.19   31300
16.500   1   6.62  0.60  248172
17.500   1   6.20  0.00  215000
16.500   8  11.78  0.03  251015
17.000  12  14.62  0.08  291264
16.500   2  11.55  0.03  207549
16.000   2   9.63  0.00   82000
16.500  13  12.99  0.04  359665
17.225   2  12.01  0.03  265500
17.000   1  12.01  0.00  299000
16.000   1   7.99  0.14  189258
14.625  12  10.33  0.12  366013
14.500  16  10.67  0.00  349930
14.500   3   9.45  0.03   85335
16.500   6  12.65  0.13  235932
16.500   3  12.08  0.00  130000
15.000   3  10.52  0.05   40500
15.000   3   9.47  0.00   40500
13.000  14  11.62  0.00   45959
12.500   1   5.00  0.33  120000
14.000  15   9.89  0.05   81243
13.750  16  11.13  0.06  153947
14.000   2   7.96  0.22   97321
15.000  16  10.73  0.09  276099
13.750   2   7.95  0.00   90000
15.625   3   9.10  0.00  184000
15.625   3  12.05  0.03  184718
13.000  16   8.43  0.04   96000
14.000  16  10.60  0.04  106350
15.250  13  10.55  0.10  135512
16.250   1   5.50  0.21  180000
13.000  14   8.53  0.03  315000
14.500   3   9.04  0.04   42500
11.500  15   8.20  0.00   30005
14.250   1   6.13  0.00   60000
15.500  15   8.32  0.00   73521
12.000   1   4.00  0.00   50000
14.250  15  10.10  0.00   50724
14.000   3   5.25  0.16   31750
16.500   3  11.62  0.00  168000
14.500   4   5.31  0.00   70000
15.500   1   5.75  0.00   27000
16.750   4  12.46  0.03  129614
16.750   4  12.75  0.00  129614
16.750   2  12.75  0.00  130000
16.750   2  11.38  0.00  209000
17.000   1   5.99  0.57  220000
16.000   2  11.37  0.27   60000
14.500   3  10.38  0.00  110000
15.000  15  10.77  0.05  101206
15.000  17  11.30  0.00  288847
16.000   1   7.06  0.14  105000
15.500  14  12.10  0.05  276425
15.250   2  10.04  0.06   33000
16.500   1   4.99  0.73  210000
19.250   0   7.33  0.22  240000
17.750  18  12.11  0.00  281552
18.750  16  12.86  0.00  421000
19.250  13  12.70  0.04  484290
14.000  20  11.58  0.00  234493
14.000  18  11.58  0.03  230675
18.000  16  12.97  0.08  296966
13.750   1   4.82  0.00   32000
15.000   2   9.75  0.03   38533
15.500  16  10.36  0.02  109912
15.900   1   8.13  0.23  236000
15.250  15  13.23  0.05  243338
15.500   4  10.57  0.04  122183
14.750  20  11.22  0.00  128268
15.000   3  10.34  0.00   72000
14.500   3  10.67  0.00   43404
13.500  18   8.60  0.08   59443
15.000  15  11.97  0.14  254700
15.250  11  11.27  0.03  434746
14.500  14  12.68  0.03  201930

Data for Problem 2 (45 service calls; columns as in the preview above):

  Y  X1  X2
 20   2   1
 60   4   0
 46   3   0
 41   2   0
 12   1   0
137  10   0
 68   5   1
 89   5   1
  4   1   1
 32   2   1
144   9   1
156  10   0
 93   6   0
 36   3   0
 72   4   1
100   8   0
105   7   0
131   8   0
127  10   0
 57   4   0
 66   5   0
101   7   1
109   7   1
 74   5   0
134   9   1
112   7   0
 18   2   0
 73   5   0
111   7   0
 96   6   0
123   8   1
 90   5   1
 20   2   1
 28   2   0
  3   1   1
 57   4   1
 86   5   0
132   9   0
112   7   0
 27   1   0
131   9   1
 34   2   1
 27   2   0
 61   4   0
 77   5   0

Homework 4
STAT W4315: Linear Regression Models
Multiple Linear Regression Model

DUE: Wednesday, November 18, 12:00 noon

(1) Please sign your homework with your name and UNI number.
(2) Homework must be submitted into the Statistics Homework Boxes, room 904 on the 9th floor of the SSW building.
(3) Homework is due Wednesday, November 18, 12:00 noon.
(4) No late homework, under any circumstances, will be accepted.
(5) At the end of the semester, one of your lowest homework scores will be dropped before the final grade is calculated.

Problem 1 (50p) (Problems 6.18 (b-f), 6.21, & 7.7 in ALRM book)

A commercial real estate company evaluates vacancy rates, square footage, rental rates, and operating expenses for commercial properties in a large metropolitan area in order to provide clients with quantitative information upon which to make rental decisions. The data are taken from 81 suburban commercial properties that are the newest, best located, most attractive, and expensive for five specific geographic areas. The variables are: rental rates Y, the age X1, operating expenses and taxes X2, vacancy rates X3, and total square footage X4.

     Y  X1     X2    X3       X4
13.500   1   5.02  0.14  123,000
12.000  14   8.19  0.27  104,079
10.500  16   3.00  0.00   39,998
15.000   4  10.70  0.05   57,112
   ...  ..    ...   ...      ...

(see file "Homework 4 data Problem1.txt" for a complete set of data)

(a)(5p) Obtain the scatter plot matrix and the correlation matrix. Interpret these and state your principal findings.
- The response variable (rental rate Y) is most strongly positively correlated with the predictor X4 (total square footage), and it is negatively correlated with X1 (the age of the property). The signs of both correlations agree with intuition.
- There is no strong evidence of multicollinearity, as the predictor variables are only mildly correlated with one another.

(b)(5p) Fit regression model $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i4} + \varepsilon_i$ for four predictor variables to the data and state the estimated regression function.

(c)(5p) Obtain the residuals and prepare a box plot of the residuals. Does the distribution appear to be fairly symmetrical?
- The distribution appears to be fairly symmetrical.

(d)(5p) Plot the residuals against Y, each predictor variable, and each two-factor interaction term on separate graphs. Also prepare a normal probability plot. Analyse your plots and summarize your findings.
- Residuals against Y are mildly heteroskedastic: they are larger for the more extreme values of Y than for values near the average. There is no evidence of nonlinearity.
- The residuals against X1 (the age of the property) are fairly heteroskedastic. In particular, for X1 close to 0 the variance of the residuals is much larger than for average X1, and it also tends to be higher for the largest X1. This is evidence that newly built and very old properties have more diverse rental rates than the average-age properties in the data.
- The residuals against X2 are closer to homoskedastic and do not exhibit any patterns.
- The residuals against X3 are clearly heteroskedastic, with the largest variance at the lowest vacancy rates. Most probably, low-vacancy properties are either permanently rented at a lower price or are unique top-class properties that always rent at a high price.
- The residuals against X1 X2, X1 X3, and X1 X4 also vary more for small values of the interaction terms.
- The residuals against X2 X3, X2 X4, and X3 X4 are more negative for larger values of the interaction terms.
- The normal probability plot of the residuals exhibits some evidence of nonnormality.

(e)(5p) Can you conduct a formal test for lack of fit here?
- No. A formal lack-of-fit test needs replicate observations, and here no two observations share the same values of the predictors.

(f)(10p) The commercial real estate company obtained information about three additional properties.

Property       1        2        3
X1:          4.0      6.0     12.0
X2:         10.0     11.5     12.5
X3:         0.10        0     0.32
X4:       80,000  120,000  340,000

Find separate prediction intervals for the rental rates for each of the new properties. Use a 95% confidence coefficient in each case. Can the rental rates of these three properties be predicted fairly precisely? What is the family confidence level for the set of three predictions?

(g)(10p) Obtain the analysis of variance table that decomposes the regression sum of squares into extra sums of squares associated with X4; with X1 given X4; with X2 given X1 and X4; and with X3, given X1, X2 and X4.

(h)(5p) Test whether X3 can be dropped from the regression model given that X1, X2, and X4 are retained. Use the F test statistic and level of significance 0.01. State the alternatives, decision rule, and conclusion. What is the p-value of the test?

Problem 2 (50p) (Problems 8.15 & 8.19 in ALRM book)

The users of the copiers are either training institutions that use a small model, or business firms that use a large, commercial model. An analyst at Tri-City wishes to fit a regression model including both number of copiers serviced (X1) and type of copier (X2) as predictor variables and estimate the effect of copier model (S-small, L-large) on number of minutes spent on the service call. Assume that the regression model $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$ is appropriate, and let X2 = 1 if small model and 0 if large, commercial model.

  Y  X1  X2
 20   2   1
 60   4   0
 46   3   0
 41   2   0
 12   1   0
137  10   0
...  ..  ..

(see file "Homework 4 data Problem2.txt" for a complete set of data)

(a)(5p) Explain the meaning of all regression coefficients in the model.
- $\beta_0$ is the intercept when X2 = 0, i.e., for the large type of copier. Its estimate has no practical meaning here, as the number of minutes cannot be negative.
- $\beta_1$ is the slope for the number of copiers serviced. It gives the increase in the mean number of minutes when the number of copiers increases by one, keeping the type of copier fixed.
- $\beta_2$ is the expected additional time needed to service small copiers compared with large copiers, for a fixed number of copiers serviced.
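
To make the coefficient interpretations in (a) concrete, here is a minimal sketch that fits the additive model by ordinary least squares in plain Python. It assumes the 45 (Y, X1, X2) rows as transcribed from the data listing for Problem 2; the names DATA, ROWS, solve, and fit_ols are illustrative and not part of the assignment, and the normal-equations solver is just one simple way to obtain the estimates.

```python
# Sketch: least-squares fit of Y = b0 + b1*X1 + b2*X2 on the copier data.
# Each triple is "Y X1 X2" (minutes, copiers serviced, 1 = small model).
DATA = """
20 2 1, 60 4 0, 46 3 0, 41 2 0, 12 1 0, 137 10 0, 68 5 1, 89 5 1, 4 1 1,
32 2 1, 144 9 1, 156 10 0, 93 6 0, 36 3 0, 72 4 1, 100 8 0, 105 7 0,
131 8 0, 127 10 0, 57 4 0, 66 5 0, 101 7 1, 109 7 1, 74 5 0, 134 9 1,
112 7 0, 18 2 0, 73 5 0, 111 7 0, 96 6 0, 123 8 1, 90 5 1, 20 2 1,
28 2 0, 3 1 1, 57 4 1, 86 5 0, 132 9 0, 112 7 0, 27 1 0, 131 9 1,
34 2 1, 27 2 0, 61 4 0, 77 5 0
"""
ROWS = [tuple(float(v) for v in triple.split()) for triple in DATA.split(",")]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(design, y):
    """Return least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


y = [r[0] for r in ROWS]
design = [[1.0, r[1], r[2]] for r in ROWS]  # intercept, X1, X2
b0, b1, b2 = fit_ols(design, y)
residuals = [yi - (b0 + b1 * x1 + b2 * x2) for yi, (_, x1, x2) in zip(y, design)]
print(f"b0={b0:.4f}  b1={b1:.4f}  b2={b2:.4f}")
```

Under this model the fitted lines for the two copier types are parallel: b1 is their common slope, b0 is the intercept of the line for large copiers, and b2 is the vertical gap between the two lines.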

(b)(5p) Fit the regression model and state the estimated regression function.

(c)(5p) Estimate the effect of copier model on mean service time with a 95 percent confidence interval. Interpret your interval estimate.
- The estimated confidence interval is very wide and extends in both directions around 0. Hence, the sign of the estimate of $\beta_2$ could change with the input data, and the data give no clear evidence of an effect of copier model on mean service time.

(d)(10p) Why would the analyst wish to include X1, number of copiers, in the regression model when interest is in estimating the effect of type of copier model on service time?
- X1 is highly correlated with the response variable and therefore explains a large share of the variation in Y. The estimate of the coefficient $\beta_2$ is already very variable (see the confidence interval in (c) above); without X1 in the model, the variance of the coefficient on X2 would be even higher.

(e)(10p) Obtain the residuals and plot them against X1 X2. Is there any indication that an interaction term in the regression model would be helpful?
- The plot of the residuals against X1 X2 indicates a positive trend. Hence, the interaction term would help to explain the variation in Y.

(f)(5p) Fit regression model with interaction term as an additional explanatory variable, i.e., $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \varepsilon_i$

(g)(10p) Test whether the interaction term can be dropped from the model; control the $\alpha$ risk at 0.10. State the alternatives, decision rule, and conclusion. What is the p-value of the test? If the interaction term cannot be dropped from the model, describe the nature of the interaction effect.
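
One way to carry out the test in (g) is the general linear test, which compares the error sum of squares of the model with the interaction term (full) and without it (reduced). The sketch below assumes the 45 copier rows as transcribed from the data listing and uses a shortcut that holds for a single 0/1 indicator: the full model is equivalent to a separate straight line per copier type, while the reduced model forces one common slope. All names are illustrative. The p-value is left out because the Python standard library has no F distribution; a statistics package would be needed for it.

```python
# Sketch: partial F statistic for H0: beta3 = 0 vs Ha: beta3 != 0 (copier data).
# Each triple is "Y X1 X2" (minutes, copiers serviced, 1 = small model).
DATA = """
20 2 1, 60 4 0, 46 3 0, 41 2 0, 12 1 0, 137 10 0, 68 5 1, 89 5 1, 4 1 1,
32 2 1, 144 9 1, 156 10 0, 93 6 0, 36 3 0, 72 4 1, 100 8 0, 105 7 0,
131 8 0, 127 10 0, 57 4 0, 66 5 0, 101 7 1, 109 7 1, 74 5 0, 134 9 1,
112 7 0, 18 2 0, 73 5 0, 111 7 0, 96 6 0, 123 8 1, 90 5 1, 20 2 1,
28 2 0, 3 1 1, 57 4 1, 86 5 0, 132 9 0, 112 7 0, 27 1 0, 131 9 1,
34 2 1, 27 2 0, 61 4 0, 77 5 0
"""
ROWS = [tuple(float(v) for v in triple.split()) for triple in DATA.split(",")]


def corrected_sums(pairs):
    """Return (Sxx, Sxy, Syy), the mean-corrected sums for a list of (x, y) pairs."""
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    sxx = sum((x - xbar) ** 2 for x, _ in pairs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in pairs)
    syy = sum((y - ybar) ** 2 for _, y in pairs)
    return sxx, sxy, syy


large = [(x1, y) for y, x1, x2 in ROWS if x2 == 0]
small = [(x1, y) for y, x1, x2 in ROWS if x2 == 1]
sums = [corrected_sums(group) for group in (large, small)]

# Full model (with X1*X2): each copier type has its own intercept and slope.
sse_full = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in sums)

# Reduced model (no interaction): separate intercepts, one pooled slope.
pooled_sxx = sum(s[0] for s in sums)
pooled_sxy = sum(s[1] for s in sums)
sse_reduced = sum(s[2] for s in sums) - pooled_sxy ** 2 / pooled_sxx

n = len(ROWS)
df_full = n - 4  # four parameters in the full model
f_star = ((sse_reduced - sse_full) / 1) / (sse_full / df_full)
slopes = [sxy / sxx for sxx, sxy, _ in sums]  # [large, small]

print(f"SSE(R)={sse_reduced:.2f}  SSE(F)={sse_full:.2f}  F*={f_star:.3f}")
print(f"slope large={slopes[0]:.4f}  slope small={slopes[1]:.4f}")
```

The decision rule compares f_star with the tabulated F(0.90; 1, n - 4) percentile: conclude that the interaction term cannot be dropped if f_star exceeds it. The two entries of slopes describe the nature of the interaction, namely how the additional service time per copier differs between large and small models.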
