Body fat data
PH245 Introduction to Multivariate Statistics
Homework Set 2
Due date: Monday, October 25

Problems:

1. The dataset "Data-HW2-Bodyfat.txt" contains the percentage of body fat, age, weight, height, and ten body circumference measurements (e.g., abdomen) for 252 men. Body fat, a measure of health, is estimated through an underwater weighing technique. Fitting body fat to the other measurements using multiple regression provides a convenient way of estimating body fat for men using only a scale and a measuring tape. The file "Data-HW2-Bodyfat-Readme.txt" has more information. Remove the two outliers as we discussed in class.

   (a) Fit a linear regression model with percent body fat (computed using Siri's equation) as the response, and age, weight, height, and the ten body circumference measurements as the predictors. Present the summary of the linear regression fit.

   (b) Interpret the coefficient associated with the predictor age. If one wishes to test the null hypothesis that this coefficient equals zero, what is the p-value of this test? If the significance level is set at 0.05, what is your conclusion for this hypothesis test?

   (c) Draw a residual plot, with the fitted values on the x-axis and the residuals on the y-axis. Does the plot suggest any violation of the key assumptions of the linear model? What are those key assumptions?

   (d) Fit the model we discussed in class with only age, weight, and height as the predictors. Test the null hypothesis that this reduced model is preferred versus the alternative hypothesis that the original full model is preferred, given the data. Use the significance level 0.05.

   (e) (Optional) Draw a plot of the Lasso solution path for the regression on age, weight, height, and the ten body circumference measurements.

2. The dataset "Data-HW2-Carseats.Rdata" contains the sales of child car seats at 400 different stores. The data frame contains 11 variables. We are interested in estimating the unit sales (in thousands) at each store using the rest of the variables.

   (a) Fit a linear regression model to predict Sales using Price, Urban, and US, and write out the fitted model in equation form. Note that some of the variables are categorical.

   (b) Present the summary of the linear regression fit, and provide an interpretation of each coefficient in the model.

   (c) Is there any evidence of outliers or high leverage observations for this model?

First 26 rows of the data frame:

   Sales CompPrice Income Advertising Population Price ShelveLoc Age Education Urban  US
 1  9.50       138     73          11        276   120       Bad  42        17   Yes Yes
 2 11.22       111     48          16        260    83      Good  65        10   Yes Yes
 3 10.06       113     35          10        269    80    Medium  59        12   Yes Yes
 4  7.40       117    100           4        466    97    Medium  55        14   Yes Yes
 5  4.15       141     64           3        340   128       Bad  38        13   Yes  No
 6 10.81       124    113          13        501   128       Bad  78        16    No Yes
 7  6.63       115    105           0         45   108    Medium  71        15   Yes  No
 8 11.85       136     81          15        425   120      Good  67        10   Yes Yes
 9  6.54       132    110                    108   124    Medium  76        10    No  No
10  4.69       132    113           0        131   124    Medium  76        17    No Yes
11  9.01       121     78           9        150   100       Bad  26        10    No Yes
12 11.96       117     94           4        503    94      Good  50        13   Yes Yes
13  3.98       122     35           2        393   136    Medium  62        18   Yes  No
14 10.96       115     28          11         29    86      Good  53        18   Yes Yes
15 11.17       107    117          11        148   118      Good  52        18   Yes Yes
16  8.71       149     95           5        400   144    Medium  76        18    No  No
17  7.58       118     32           0        284   110      Good  63        13   Yes  No
18 12.29       147     74          13        251   131      Good  52        10   Yes Yes
19 13.91       110    110           0        408    68      Good  46        17    No Yes
20  8.73       129     76          16         58   121    Medium  69        12   Yes Yes
21  6.41       125     90           2        367   131    Medium  35        18   Yes Yes
22 12.13       134     29          12        239   109      Good  62        18    No Yes
23  5.08       128     46           6        497   138    Medium  42        13   Yes  No
24  5.87       121     31           0        292   109    Medium  79        10   Yes  No
25 10.14       145    119          16        294   113       Bad  42        12   Yes Yes
26 14.90       139     32           0        176    82      Good  54        11    No  No
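
As a rough illustration of the mechanics behind Problem 2, the sketch below shows how the categorical Yes/No variables can be coded as 0/1 dummy columns, how least-squares coefficients follow from the normal equations, and how leverage values are read off the diagonal of the hat matrix. It is a minimal sketch, not a solution: it uses only the first eight records of the Carseats excerpt (Sales, Price, Urban, US), so its numbers say nothing about the full 400-store fit. The helper names design_row and solve are hypothetical, and the choice of plain Python with Gaussian elimination is an assumption made to keep the example self-contained. In practice the R functions lm and hatvalues would likely be used instead.

```python
# First eight records of the Carseats excerpt: (Sales, Price, Urban, US).
# This subset is for illustration only; it is not the full dataset.
rows = [
    (9.50, 120, "Yes", "Yes"),
    (11.22, 83, "Yes", "Yes"),
    (10.06, 80, "Yes", "Yes"),
    (7.40, 97, "Yes", "Yes"),
    (4.15, 128, "Yes", "No"),
    (10.81, 128, "No", "Yes"),
    (6.63, 108, "Yes", "No"),
    (11.85, 120, "Yes", "Yes"),
]


def design_row(price, urban, us):
    """Hypothetical helper: intercept, Price, and 0/1 dummies for Urban and US."""
    return [1.0, float(price),
            1.0 if urban == "Yes" else 0.0,
            1.0 if us == "Yes" else 0.0]


def solve(A, b):
    """Hypothetical helper: solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


X = [design_row(price, urban, us) for _, price, urban, us in rows]
y = [r[0] for r in rows]
p = len(X[0])  # number of coefficients, including the intercept

# Normal equations: (X^T X) beta = X^T y
XtX = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
beta = solve(XtX, Xty)

fitted = [sum(b * v for b, v in zip(beta, x)) for x in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Leverage of observation i: h_ii = x_i^T (X^T X)^{-1} x_i
leverage = []
for x in X:
    z = solve(XtX, x)
    leverage.append(sum(a * b for a, b in zip(x, z)))

for i, (h, r) in enumerate(zip(leverage, residuals), start=1):
    print(f"row {i}: leverage = {h:.3f}, residual = {r:.3f}")
```

In this eight-row subset, the sixth record is the only store with Urban = No, so the Urban dummy fits it exactly and its leverage equals 1. The leverages always sum to the number of coefficients, which is why values far above the average p/n are the ones usually inspected.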