Question
Q3. Consider the multiple linear regression model given by $Y = X\beta + \varepsilon$, where $Y$ is the $n \times 1$ vector of the dependent variables, $X$ is the $n \times (p+1)$ design matrix of full rank, $\beta$ is the $(p+1) \times 1$ vector of regression coefficients, $\varepsilon$ is the $n \times 1$ vector of random errors satisfying $\varepsilon \sim N(0, I_n\sigma^2)$, and $I_n$ is the $n \times n$ identity matrix. Given the vector $y$ of the $n$ observations, the least squares estimator of $\beta$ is given by $\hat\beta = (X^TX)^{-1}X^Ty$, leading to the fitted model $\hat y = X\hat\beta$.

Now consider removing observation $i$ from the data. Let $X_{(i)}$ be the $(n-1) \times (p+1)$ matrix $X$ with row $i$ deleted. Let $y_{(i)}$ be the $(n-1) \times 1$ vector $y$ with observation $y_i$ deleted. Let $\hat\beta_{(i)}$ be the estimate of $\beta$ with observation $i$ deleted, and let $x_i^T$ be the $i$th row of $X$. Thus, $X_{(i)}^TX_{(i)} = X^TX - x_ix_i^T$ is of order $(p+1) \times (p+1)$, and $X_{(i)}^Ty_{(i)} = X^Ty - x_iy_i$ is of order $(p+1) \times 1$. It can be shown that (you do not need to prove this)

$$\hat\beta_{(i)} = \hat\beta - \frac{(X^TX)^{-1}x_i(y_i - \hat y_i)}{1 - h_{ii}},$$

where $h_{ii}$ is the $i$th diagonal element of $H = X(X^TX)^{-1}X^T$. Let $SSE = y^T\{I_n - X(X^TX)^{-1}X^T\}y$ denote the residual sum of squares based on all $n$ data points. Further, let $SSE_{(i)} = y_{(i)}^T\{I_{n-1} - X_{(i)}(X_{(i)}^TX_{(i)})^{-1}X_{(i)}^T\}y_{(i)}$ denote the residual sum of squares when the $i$th data point is deleted.

(b) In a study of the effects of cystic fibrosis, data were collected from 25 patients on variables related to body size and lung function: $Y$ = maximal static expiratory pressure (cm H2O), a measure of malnutrition; $X_1$ = body mass (weight/height$^2$) as a percentage of the age-specific median in normal individuals; $X_2$ = weight (kg); $X_3$ = residual volume; $X_4$ = forced expiratory volume in one second.

Using the statistical computing package R, a multiple linear regression model containing only the four variables $X_1, X_2, X_3, X_4$ has been fitted to the cystic fibrosis data and has been followed by an analysis of the effect of deleting the $i$th observation in turn from the data for $i = 1, \ldots, 25$. The results are presented on the next page.

Here, "hat" and "resid" contain the values $h_{ii}$ from the matrix $H$ and the residuals $e_i$ $(i = 1, \ldots, n)$ from the multiple linear regression model based on all $n = 25$ observations. Also, "sigmahat" is the residual standard error for the model with the $i$th observation deleted. The remaining five columns contain the change in $\hat\beta_j$ effected by deleting the $i$th observation for a model containing $X_1, X_2, X_3, X_4$. At the bottom of the table are $\hat\beta_j$ $(j = 0, 1, \ldots, 4)$, together with their respective standard errors, and finally the residual standard error, "sigmahat", for the model based on all $n$ observations.

Comment on the hat values and the residuals for the individual observations. Which observations have most influence, that is, effect most change, on the residual standard error and the $\hat\beta_j$ $(j = 0, 1, \ldots, 4)$, and why? [8 marks]

(c) For any ONE row in the table below, show numerically how the "hat", "resid" and "sigmahat" values are related to "sigma_hat for all data". [3 marks]
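As a hedged aid for part (c): the standard deletion identity $SSE_{(i)} = SSE - e_i^2/(1 - h_{ii})$, which is not stated in the question itself, gives $(n-p-2)\,\hat\sigma_{(i)}^2 = (n-p-1)\,\hat\sigma^2 - e_i^2/(1-h_{ii})$. With $n = 25$ and $p = 4$ this reads $19\,\hat\sigma_{(i)}^2 = 20\,\hat\sigma^2 - e_i^2/(1-h_{ii})$. The sketch below checks this identity, together with the coefficient-deletion formula, against a brute-force refit. It uses invented synthetic data, not the cystic fibrosis table, and the helper names `solve`, `fit`, `make_data` and `check_deletion` are hypothetical.

```python
# Numerical check of the leave-one-out identities on SYNTHETIC data.
# Standard library only; nothing here comes from the cystic fibrosis table.
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[r][:] + [b[r]] for r in range(n)]  # augmented matrix [A | b]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit(X, y):
    """Least squares via the normal equations: (beta_hat, residuals, SSE, X^T X)."""
    q = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(q)] for a in range(q)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(q)]
    beta = solve(XtX, Xty)
    resid = [yi - sum(bj * xj for bj, xj in zip(beta, row)) for row, yi in zip(X, y)]
    return beta, resid, sum(e * e for e in resid), XtX


def make_data(n, p, seed):
    """Invented data: intercept column plus p Gaussian predictors."""
    rng = random.Random(seed)
    X = [[1.0] + [rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1.0 + sum(row[1:]) + rng.gauss(0, 1) for row in X]
    return X, y


def check_deletion(X, y, i):
    """Compare closed-form deletion results with an explicit refit without row i."""
    n, q = len(X), len(X[0])               # q = p + 1 (intercept included)
    beta, resid, sse, XtX = fit(X, y)
    v = solve(XtX, X[i])                   # (X^T X)^{-1} x_i
    h = sum(a * b for a, b in zip(X[i], v))  # leverage h_ii
    e = resid[i]                           # residual y_i - yhat_i
    beta_del_formula = [b - vj * e / (1 - h) for b, vj in zip(beta, v)]
    sigma_del_formula = ((sse - e * e / (1 - h)) / (n - 1 - q)) ** 0.5
    # Brute force: drop row i and refit from scratch.
    beta_del_refit, _, sse_del, _ = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
    sigma_del_refit = (sse_del / (n - 1 - q)) ** 0.5
    return {
        "h": h,
        "beta_del_formula": beta_del_formula,
        "beta_del_refit": beta_del_refit,
        "sigma_del_formula": sigma_del_formula,
        "sigma_del_refit": sigma_del_refit,
    }


if __name__ == "__main__":
    X, y = make_data(25, 4, seed=1)
    print(check_deletion(X, y, 3))
```

For a row of the actual table, the same arithmetic would be applied to that row's "hat", "resid" and "sigmahat" entries and to the full-data residual standard error; those values are not reproduced here.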