Please help
Question 2. We consider the data (prostate.txt) from the study of Stamey (1989). It was a study of 97 men with prostate cancer who were about to receive a radical prostatectomy (an operation). The relationship between the level of prostate-specific antigen and a number of clinical measures was studied.

Variable   Name      Meaning
X1         lcavol    log(cancer volume)
X2         lweight   log(prostate weight)
X3         age       age
X4         lbph      log(benign prostatic hyperplasia amount)
X5         svi       seminal vesicle invasion
X6         lcp       log(capsular penetration)
X7         gleason   Gleason score
X8         pgg45     percentage of Gleason scores 4 or 5
Y          lpsa      log(prostate-specific antigen)

(a) [20 marks] Consider the full model $Y = \beta_0 + \beta_1 X_1 + \dots + \beta_8 X_8 + \text{Error}$. Here, the error terms are assumed to be independent and identically distributed $N(0, \sigma^2)$ random variables. Suppose that the value of X for a new patient is given as follows:

X1 = 1.1474025, X2 = 3.4194, X3 = 59, X4 = -1.386294, X5 = 0, X6 = -1.38629, X7 = 6, X8 = 0

You are interested in predicting Y, the logarithm of the amount of prostate-specific antigen. Give the predicted value of Y and express the variance of the prediction error in terms of $\sigma^2$.

(b) [20 marks] Now, you do not want a model with eight predictors, as the prediction error is not satisfactory. You prefer a model with only five predictors. Select a model using the BACKWARD selection approach. Report the t statistic of each remaining variable at each step.

(c) [20 marks] Now, you prefer a model with only four predictors. Select a model using the FORWARD selection approach. Report the t statistic of each unselected variable at each step.

(d) [20 marks] Consider the reduced model obtained in part (c). You are interested in predicting Y, the logarithm of the amount of prostate-specific antigen. The value of X for the new patient is the one given in (a). Give the predicted value of Y and express the variance of the prediction error in terms of $\sigma^2$. Comparing (a) and (d), which model tends to give the smaller estimation error?
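One possible way to set up the computations for parts (a) to (d) is sketched below in Python with NumPy. This is only an outline under stated assumptions, not a worked answer: the function names (`fit_ols`, `predict_new`, `backward_select`, `forward_select`) are made up for illustration, the selection rules (drop the smallest |t|, add the largest |t|) are one common reading of "backward" and "forward" selection and may differ from what the course intends, and the data at the bottom are synthetic stand-ins because prostate.txt is not reproduced here. For a new observation with intercept-augmented vector z0, the prediction error variance is taken to be sigma^2 * (1 + z0' (Z'Z)^{-1} z0), so the code reports that multiplier of sigma^2.

```python
import numpy as np

def fit_ols(X, y):
    """Least squares with an intercept: coefficients, t statistics, (Z'Z)^{-1}."""
    Z = np.column_stack([np.ones(len(y)), X])      # add intercept column
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    beta = ZtZ_inv @ Z.T @ y
    resid = y - Z @ beta
    s2 = resid @ resid / (Z.shape[0] - Z.shape[1])  # unbiased estimate of sigma^2
    t = beta / np.sqrt(s2 * np.diag(ZtZ_inv))
    return beta, t, ZtZ_inv

def predict_new(X, y, x_new):
    """Predicted Y and the multiplier c in Var(prediction error) = c * sigma^2."""
    beta, _, ZtZ_inv = fit_ols(X, y)
    z0 = np.concatenate([[1.0], x_new])
    return z0 @ beta, 1.0 + z0 @ ZtZ_inv @ z0

def backward_select(X, y, keep):
    """Assumed rule: refit, drop the predictor with the smallest |t|, repeat."""
    active, history = list(range(X.shape[1])), []
    while len(active) > keep:
        _, t, _ = fit_ols(X[:, active], y)
        t_pred = t[1:]                              # skip the intercept
        history.append(dict(zip(active, t_pred)))   # t of each remaining variable
        active.pop(int(np.argmin(np.abs(t_pred))))
    return active, history

def forward_select(X, y, keep):
    """Assumed rule: add the candidate whose t is largest in absolute value."""
    active, remaining, history = [], list(range(X.shape[1])), []
    while len(active) < keep:
        step = {}
        for j in remaining:
            _, t, _ = fit_ols(X[:, active + [j]], y)
            step[j] = t[-1]                         # t of the candidate just added
        history.append(step)                        # t of each unselected variable
        best = max(step, key=lambda j: abs(step[j]))
        active.append(best)
        remaining.remove(best)
    return active, history

# Synthetic stand-in data (NOT the prostate data): 97 rows, 8 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(97, 8))
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + 0.5 * rng.normal(size=97)
x_new = np.zeros(8)                                 # hypothetical new patient

y_hat_full, c_full = predict_new(X, y, x_new)       # analogue of part (a)
bwd, bwd_history = backward_select(X, y, keep=5)    # analogue of part (b)
fwd, fwd_history = forward_select(X, y, keep=4)     # analogue of part (c)
y_hat_red, c_red = predict_new(X[:, fwd], y, x_new[fwd])  # analogue of part (d)
```

With the real data, `X` would hold the eight columns lcavol through pgg45, `y` would be lpsa, and `x_new` would be the vector given in (a). Comparing `c_full` with `c_red` is then one way to approach the last sentence of (d), since both are multipliers of the same $\sigma^2$ under the model assumptions.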