Question:
Consider a normal linear model \(\boldsymbol{Y}=\mathbf{X} \boldsymbol{\beta}+\varepsilon\), where \(\mathbf{X}\) is an \(n \times p\) model matrix and \(\varepsilon \sim \mathscr{N}\left(\mathbf{0}, \sigma^{2} \mathbf{I}_{n}\right)\). Exercise 12 shows that for any such model the \(i\)-th standardized residual \(E_{i} /\left(\sigma \sqrt{1-\mathbf{P}_{i i}}\right)\) has a standard normal distribution. This motivates the use of the leverage \(\mathbf{P}_{i i}\) to assess whether the \(i\)-th observation is an outlier, depending on the size of the \(i\)-th residual relative to \(\sqrt{1-\mathbf{P}_{i i}}\). A more robust approach is to include an estimate for \(\sigma\) using all data except the \(i\)-th observation. This gives rise to the studentized residual \(T_{i}\), defined as
\[ T_{i}:=\frac{E_{i}}{S_{-i} \sqrt{1-\mathbf{P}_{i i}}} \]
where \(S_{-i}\) is an estimate of \(\sigma\) obtained by fitting all the observations except the \(i\)-th and \(E_{i}=Y_{i}-\widehat{Y}_{i}\) is the \(i\)-th (random) residual. Exercise 12 shows that we can take, for example, \[ S_{-i}^{2}=\frac{1}{n-1-p}\left\|\boldsymbol{Y}_{-i}-\mathbf{X}_{-i} \widehat{\boldsymbol{\beta}}_{-i}\right\|^{2}, \tag{5.45} \]
where \(\mathbf{X}_{-i}\) is the model matrix \(\mathbf{X}\) with the \(i\)-th row removed, \(\boldsymbol{Y}_{-i}\) is the response vector with the \(i\)-th entry removed, and \(\widehat{\boldsymbol{\beta}}_{-i}\) is the least-squares estimate computed from the reduced data; this \(S_{-i}^{2}\) is an unbiased estimator of \(\sigma^{2}\). We wish to compute \(S_{-i}^{2}\) efficiently, using \(S^{2}\) in (5.44), as the latter will typically be available once we have fitted the linear model. To this end, define \(\boldsymbol{u}_{i}\) as the \(i\)-th unit vector \([0, \ldots, 0,1,0, \ldots, 0]^{\top}\), and let \[ \boldsymbol{Y}^{(i)}:=\boldsymbol{Y}-\left(Y_{i}-\widehat{Y}_{-i}\right) \boldsymbol{u}_{i}=\boldsymbol{Y}-\frac{E_{i}}{1-\mathbf{P}_{i i}} \boldsymbol{u}_{i} \]
where we have used the fact that \(Y_{i}-\widehat{Y}_{-i}=E_{i} /\left(1-\mathbf{P}_{i i}\right)\), as derived in the proof of Theorem 5.1. Now apply Exercise 11 to prove that
\[ S_{-i}^{2}=\frac{(n-p) S^{2}-E_{i}^{2} /\left(1-\mathbf{P}_{i i}\right)}{n-p-1} \]
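The identity above can be checked numerically before attempting the proof. The following is a minimal sketch, not part of the exercise: it assumes that \(S^{2}\) in (5.44) is the usual residual sum of squares divided by \(n-p\), that \(\mathbf{X}\) has full column rank, and it uses simulated data. The function names `studentized_residuals` and `s2_leave_one_out_brute` are hypothetical and chosen only for illustration.

```python
import numpy as np

def studentized_residuals(X, y):
    """Compute T_i and S_{-i}^2 for every i from the full-data fit only."""
    n, p = X.shape
    # Projection matrix P = X X^+; its diagonal holds the leverages P_ii.
    P = X @ np.linalg.pinv(X)
    lev = np.diag(P)
    E = y - P @ y                      # residuals E_i = Y_i - Yhat_i
    S2 = E @ E / (n - p)               # ASSUMED form of S^2 in (5.44)
    # Shortcut formula from the exercise statement.
    S2_loo = ((n - p) * S2 - E**2 / (1 - lev)) / (n - p - 1)
    T = E / np.sqrt(S2_loo * (1 - lev))
    return T, S2_loo

def s2_leave_one_out_brute(X, y):
    """Compute S_{-i}^2 as in (5.45) by refitting without observation i."""
    n, p = X.shape
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        beta, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
        r = yi - Xi @ beta
        out[i] = r @ r / (n - 1 - p)
    return out

# Simulated data (illustrative values only).
rng = np.random.default_rng(0)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.7, size=n)

T, S2_fast = studentized_residuals(X, y)
S2_slow = s2_leave_one_out_brute(X, y)
print(np.allclose(S2_fast, S2_slow))   # compares shortcut with brute force
```

The shortcut needs a single fit of the full model, whereas the brute-force version refits the model \(n\) times; agreement between the two on simulated data is a sanity check, not a proof.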
Data Science And Machine Learning Mathematical And Statistical Methods
ISBN: 9781118710852
1st Edition
Authors: Dirk P. Kroese, Thomas Taimre, Radislav Vaisman, Zdravko Botev