Question:
Consider again Example 5.10, where we train the learner via the Newton iteration (5.39). If \(\mathbf{X}^{\top}:=\left[\boldsymbol{x}_{1}, \ldots, \boldsymbol{x}_{n}\right]\) defines the matrix of predictors and \(\boldsymbol{\mu}_{t}:=\boldsymbol{h}\left(\mathbf{X} \boldsymbol{\beta}_{t}\right)\), then the gradient (5.37) and Hessian (5.38) for Newton's method can be written as:
\[ \nabla r_{\tau}\left(\boldsymbol{\beta}_{t}\right)=\frac{1}{n} \mathbf{X}^{\top}\left(\boldsymbol{\mu}_{t}-\boldsymbol{y}\right) \quad \text{and} \quad \mathbf{H}\left(\boldsymbol{\beta}_{t}\right)=\frac{1}{n} \mathbf{X}^{\top} \mathbf{D}_{t} \mathbf{X} \]
where \(\mathbf{D}_{t}:=\operatorname{diag}\left(\boldsymbol{\mu}_{t} \odot\left(\mathbf{1}-\boldsymbol{\mu}_{t}\right)\right)\) is a diagonal matrix. Show that the Newton iteration (5.39) can be written as the iterative reweighted least-squares method:
\[ \boldsymbol{\beta}_{t}=\underset{\boldsymbol{\beta}}{\operatorname{argmin}}\left(\widetilde{\boldsymbol{y}}_{t-1}-\mathbf{X} \boldsymbol{\beta}\right)^{\top} \mathbf{D}_{t-1}\left(\widetilde{\boldsymbol{y}}_{t-1}-\mathbf{X} \boldsymbol{\beta}\right), \]
where \(\widetilde{\boldsymbol{y}}_{t-1}:=\mathbf{X} \boldsymbol{\beta}_{t-1}+\mathbf{D}_{t-1}^{-1}\left(\boldsymbol{y}-\boldsymbol{\mu}_{t-1}\right)\) is the so-called adjusted response. [Hint: use the fact that \(\left(\mathbf{M}^{\top} \mathbf{M}\right)^{-1} \mathbf{M}^{\top} \boldsymbol{z}\) is the minimizer of \(\|\mathbf{M} \boldsymbol{\beta}-\boldsymbol{z}\|^{2}\).]
Step by Step Answer:
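A sketch of the derivation. The Newton iteration (5.39) is \(\boldsymbol{\beta}_{t}=\boldsymbol{\beta}_{t-1}-\mathbf{H}\left(\boldsymbol{\beta}_{t-1}\right)^{-1} \nabla r_{\tau}\left(\boldsymbol{\beta}_{t-1}\right)\). Substituting the gradient and Hessian above (the factors \(1/n\) cancel) gives
\[ \begin{aligned} \boldsymbol{\beta}_{t} &=\boldsymbol{\beta}_{t-1}-\left(\mathbf{X}^{\top} \mathbf{D}_{t-1} \mathbf{X}\right)^{-1} \mathbf{X}^{\top}\left(\boldsymbol{\mu}_{t-1}-\boldsymbol{y}\right) \\ &=\left(\mathbf{X}^{\top} \mathbf{D}_{t-1} \mathbf{X}\right)^{-1}\left(\mathbf{X}^{\top} \mathbf{D}_{t-1} \mathbf{X} \boldsymbol{\beta}_{t-1}+\mathbf{X}^{\top}\left(\boldsymbol{y}-\boldsymbol{\mu}_{t-1}\right)\right) \\ &=\left(\mathbf{X}^{\top} \mathbf{D}_{t-1} \mathbf{X}\right)^{-1} \mathbf{X}^{\top} \mathbf{D}_{t-1}\left(\mathbf{X} \boldsymbol{\beta}_{t-1}+\mathbf{D}_{t-1}^{-1}\left(\boldsymbol{y}-\boldsymbol{\mu}_{t-1}\right)\right) \\ &=\left(\mathbf{X}^{\top} \mathbf{D}_{t-1} \mathbf{X}\right)^{-1} \mathbf{X}^{\top} \mathbf{D}_{t-1} \widetilde{\boldsymbol{y}}_{t-1}. \end{aligned} \]
Because the diagonal entries of \(\mathbf{D}_{t-1}\) are positive, its square root exists; set \(\mathbf{M}:=\mathbf{D}_{t-1}^{1/2} \mathbf{X}\) and \(\boldsymbol{z}:=\mathbf{D}_{t-1}^{1/2} \widetilde{\boldsymbol{y}}_{t-1}\). Then \(\mathbf{M}^{\top} \mathbf{M}=\mathbf{X}^{\top} \mathbf{D}_{t-1} \mathbf{X}\) and \(\mathbf{M}^{\top} \boldsymbol{z}=\mathbf{X}^{\top} \mathbf{D}_{t-1} \widetilde{\boldsymbol{y}}_{t-1}\), so \(\boldsymbol{\beta}_{t}=\left(\mathbf{M}^{\top} \mathbf{M}\right)^{-1} \mathbf{M}^{\top} \boldsymbol{z}\). By the hint, this is the minimizer of
\[ \|\mathbf{M} \boldsymbol{\beta}-\boldsymbol{z}\|^{2}=\left(\widetilde{\boldsymbol{y}}_{t-1}-\mathbf{X} \boldsymbol{\beta}\right)^{\top} \mathbf{D}_{t-1}\left(\widetilde{\boldsymbol{y}}_{t-1}-\mathbf{X} \boldsymbol{\beta}\right), \]
which is exactly the iterative reweighted least-squares update.

As a quick numerical check, here is a minimal Python sketch (not code from the book; it assumes \(h\) is the componentwise logistic sigmoid, so that \(\boldsymbol{\mu}_{t}=\boldsymbol{h}\left(\mathbf{X} \boldsymbol{\beta}_{t}\right)\), and uses synthetic data with a hypothetical true coefficient vector):

import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])  # predictor matrix
beta_true = np.array([0.5, -1.0, 2.0])                              # hypothetical coefficients
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))           # Bernoulli responses

beta = np.zeros(p)
for _ in range(25):
    mu = 1.0 / (1.0 + np.exp(-X @ beta))   # mu_{t-1} = h(X beta_{t-1})
    d = mu * (1.0 - mu)                    # diagonal entries of D_{t-1}
    y_adj = X @ beta + (y - mu) / d        # adjusted response
    # weighted least-squares step: argmin_b (y_adj - X b)^T D_{t-1} (y_adj - X b)
    beta = np.linalg.solve(X.T @ (d[:, None] * X), X.T @ (d * y_adj))

print(beta)  # approaches the logistic-regression MLE (close to beta_true for large n)

Each pass of the loop reproduces one Newton step, since solving the normal equations of the weighted least-squares problem yields exactly \(\boldsymbol{\beta}_{t}=\left(\mathbf{X}^{\top} \mathbf{D}_{t-1} \mathbf{X}\right)^{-1} \mathbf{X}^{\top} \mathbf{D}_{t-1} \widetilde{\boldsymbol{y}}_{t-1}\).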
Data Science and Machine Learning: Mathematical and Statistical Methods
ISBN: 9781118710852
1st Edition
Authors: Dirk P. Kroese, Thomas Taimre, Radislav Vaisman, Zdravko Botev