Question:
Consider a data set consisting of $n$ observations, $n_c$ complete and $n_m$ incomplete, for which the dependent variable, $y_i$, is missing. Data on the independent variables, $x_i$, are complete for all $n$ observations, $X_c$ and $X_m$. We wish to use the data to estimate the parameters of the linear regression model $y = X\beta + \varepsilon$. Consider the following imputation strategy:
Step 1: Linearly regress $y_c$ on $X_c$ and compute $b_c$.
Step 2: Use $X_m$ to predict the missing $y_m$ with $X_m b_c$. Then regress the full sample of observations, $(y_c, X_m b_c)$, on the full sample of regressors, $(X_c, X_m)$.
a. Show that the first and second step least squares coefficient vectors are identical.
b. Is the second step coefficient estimator unbiased?
c. Show that the sum of squared residuals is the same at both steps.
d. Show that the second step estimator of $\sigma^2$ is biased downward.
Step by Step Answer:
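A minimal numerical sketch of the two-step strategy can be used to sanity-check parts a, c, and d before working through the algebra. It is not a proof. It assumes NumPy, simulated Gaussian data, and illustrative values for the sample sizes, `beta`, and `sigma`, none of which come from the exercise.

```python
import numpy as np

# Illustrative setup (assumed values, not from the exercise)
rng = np.random.default_rng(0)
n_c, n_m, K = 40, 20, 3
beta = np.array([1.0, -2.0, 0.5])
sigma = 1.5

# Regressors are observed for all n = n_c + n_m rows; y only for the n_c complete rows
X_c = np.column_stack([np.ones(n_c), rng.normal(size=(n_c, K - 1))])
X_m = np.column_stack([np.ones(n_m), rng.normal(size=(n_m, K - 1))])
y_c = X_c @ beta + sigma * rng.normal(size=n_c)

# Step 1: least squares of y_c on X_c
b_c, *_ = np.linalg.lstsq(X_c, y_c, rcond=None)
e_c = y_c - X_c @ b_c
ssr_1 = e_c @ e_c

# Step 2: impute y_m with X_m b_c, then regress the stacked sample
X_full = np.vstack([X_c, X_m])
y_full = np.concatenate([y_c, X_m @ b_c])
b_full, *_ = np.linalg.lstsq(X_full, y_full, rcond=None)
e_full = y_full - X_full @ b_full
ssr_2 = e_full @ e_full

# Variance estimators: same numerator, different degrees of freedom
s2_step1 = ssr_1 / (n_c - K)
s2_step2 = ssr_2 / (n_c + n_m - K)

print("coefficients identical:", np.allclose(b_c, b_full))
print("SSR identical:", np.isclose(ssr_1, ssr_2))
print("s2 step 2 < s2 step 1:", s2_step2 < s2_step1)
```

The check reflects the structure of the argument: the imputed rows satisfy $X_m b_c - X_m b_c = 0$, so they add nothing to the residual sum of squares, while the second step divides that same sum by $n - K$ rather than $n_c - K$.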