Question:
Consider again Example 5.10, where we train the learner via the Newton iteration (5.39). If \(\mathbf{X}^{\top}:=\left[\boldsymbol{x}_{1}, \ldots, \boldsymbol{x}_{n}\right]\) defines the matrix of predictors and \(\boldsymbol{\mu}_{t}:=\boldsymbol{h}\left(\mathbf{X} \boldsymbol{\beta}_{t}\right)\), then the gradient (5.37) and Hessian (5.38) for Newton's method can be written as:
\[ \nabla r_{\tau}\left(\boldsymbol{\beta}_{t}\right)=\frac{1}{n} \mathbf{X}^{\top}\left(\boldsymbol{\mu}_{t}-\boldsymbol{y}\right) \quad \text{and} \quad \mathbf{H}\left(\boldsymbol{\beta}_{t}\right)=\frac{1}{n} \mathbf{X}^{\top} \mathbf{D}_{t} \mathbf{X} \]
where \(\mathbf{D}_{t}:=\operatorname{diag}\left(\boldsymbol{\mu}_{t} \odot\left(\mathbf{1}-\boldsymbol{\mu}_{t}\right)\right)\) is a diagonal matrix. Show that the Newton iteration (5.39) can be written as the iterative reweighted least-squares method:
\[ \boldsymbol{\beta}_{t}=\underset{\boldsymbol{\beta}}{\operatorname{argmin}}\left(\widetilde{\boldsymbol{y}}_{t-1}-\mathbf{X} \boldsymbol{\beta}\right)^{\top} \mathbf{D}_{t-1}\left(\widetilde{\boldsymbol{y}}_{t-1}-\mathbf{X} \boldsymbol{\beta}\right), \]
where \(\widetilde{\boldsymbol{y}}_{t-1}:=\mathbf{X} \boldsymbol{\beta}_{t-1}+\mathbf{D}_{t-1}^{-1}\left(\boldsymbol{y}-\boldsymbol{\mu}_{t-1}\right)\) is the so-called adjusted response. [Hint: use the fact that \(\left(\mathbf{M}^{\top} \mathbf{M}\right)^{-1} \mathbf{M}^{\top} \boldsymbol{z}\) is the minimizer of \(\|\mathbf{M} \boldsymbol{\beta}-\boldsymbol{z}\|^{2}\).]
Step by Step Answer:
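A sketch of the derivation. The Newton iteration (5.39) is \(\boldsymbol{\beta}_{t}=\boldsymbol{\beta}_{t-1}-\mathbf{H}\left(\boldsymbol{\beta}_{t-1}\right)^{-1} \nabla r_{\tau}\left(\boldsymbol{\beta}_{t-1}\right)\). Substituting the gradient and Hessian above (the factors \(1/n\) cancel) gives
\[ \begin{aligned} \boldsymbol{\beta}_{t} &=\boldsymbol{\beta}_{t-1}-\left(\mathbf{X}^{\top} \mathbf{D}_{t-1} \mathbf{X}\right)^{-1} \mathbf{X}^{\top}\left(\boldsymbol{\mu}_{t-1}-\boldsymbol{y}\right) \\ &=\left(\mathbf{X}^{\top} \mathbf{D}_{t-1} \mathbf{X}\right)^{-1}\left(\mathbf{X}^{\top} \mathbf{D}_{t-1} \mathbf{X} \boldsymbol{\beta}_{t-1}+\mathbf{X}^{\top}\left(\boldsymbol{y}-\boldsymbol{\mu}_{t-1}\right)\right) \\ &=\left(\mathbf{X}^{\top} \mathbf{D}_{t-1} \mathbf{X}\right)^{-1} \mathbf{X}^{\top} \mathbf{D}_{t-1}\left(\mathbf{X} \boldsymbol{\beta}_{t-1}+\mathbf{D}_{t-1}^{-1}\left(\boldsymbol{y}-\boldsymbol{\mu}_{t-1}\right)\right) \\ &=\left(\mathbf{X}^{\top} \mathbf{D}_{t-1} \mathbf{X}\right)^{-1} \mathbf{X}^{\top} \mathbf{D}_{t-1} \widetilde{\boldsymbol{y}}_{t-1}. \end{aligned} \]
Because the diagonal entries of \(\mathbf{D}_{t-1}\) are positive, its square root exists; set \(\mathbf{M}:=\mathbf{D}_{t-1}^{1/2} \mathbf{X}\) and \(\boldsymbol{z}:=\mathbf{D}_{t-1}^{1/2} \widetilde{\boldsymbol{y}}_{t-1}\). Then \(\mathbf{M}^{\top} \mathbf{M}=\mathbf{X}^{\top} \mathbf{D}_{t-1} \mathbf{X}\) and \(\mathbf{M}^{\top} \boldsymbol{z}=\mathbf{X}^{\top} \mathbf{D}_{t-1} \widetilde{\boldsymbol{y}}_{t-1}\), so \(\boldsymbol{\beta}_{t}=\left(\mathbf{M}^{\top} \mathbf{M}\right)^{-1} \mathbf{M}^{\top} \boldsymbol{z}\). By the hint, this is the minimizer of
\[ \|\mathbf{M} \boldsymbol{\beta}-\boldsymbol{z}\|^{2}=\left(\widetilde{\boldsymbol{y}}_{t-1}-\mathbf{X} \boldsymbol{\beta}\right)^{\top} \mathbf{D}_{t-1}\left(\widetilde{\boldsymbol{y}}_{t-1}-\mathbf{X} \boldsymbol{\beta}\right), \]
which is exactly the iterative reweighted least-squares update.

As a quick numerical check, here is a minimal Python sketch (not code from the book; it assumes \(h\) is the componentwise logistic sigmoid, so that \(\boldsymbol{\mu}_{t}=\boldsymbol{h}\left(\mathbf{X} \boldsymbol{\beta}_{t}\right)\), and uses synthetic data with a hypothetical true coefficient vector):

import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])  # predictor matrix
beta_true = np.array([0.5, -1.0, 2.0])                              # hypothetical coefficients
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))           # Bernoulli responses

beta = np.zeros(p)
for _ in range(25):
    mu = 1.0 / (1.0 + np.exp(-X @ beta))   # mu_{t-1} = h(X beta_{t-1})
    d = mu * (1.0 - mu)                    # diagonal entries of D_{t-1}
    y_adj = X @ beta + (y - mu) / d        # adjusted response
    # weighted least-squares step: argmin_b (y_adj - X b)^T D_{t-1} (y_adj - X b)
    beta = np.linalg.solve(X.T @ (d[:, None] * X), X.T @ (d * y_adj))

print(beta)  # approaches the logistic-regression MLE (close to beta_true for large n)

Each pass of the loop reproduces one Newton step, since solving the normal equations of the weighted least-squares problem yields exactly \(\boldsymbol{\beta}_{t}=\left(\mathbf{X}^{\top} \mathbf{D}_{t-1} \mathbf{X}\right)^{-1} \mathbf{X}^{\top} \mathbf{D}_{t-1} \widetilde{\boldsymbol{y}}_{t-1}\).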
Data Science and Machine Learning: Mathematical and Statistical Methods
ISBN: 9781118710852
1st Edition
Authors: Dirk P. Kroese, Thomas Taimre, Radislav Vaisman, Zdravko Botev