Question:
Projection pursuit is a network with one hidden layer that can be written as:
\[ g(\boldsymbol{x})=S\left(\boldsymbol{\omega}^{\top} \boldsymbol{x}\right) \]
where \(S\) is a univariate smoothing cubic spline. If we use squared-error loss with \(\tau_{n}=\left\{y_{i}, \boldsymbol{x}_{i}\right\}_{i=1}^{n}\), we need to minimize the training loss:
\[ \frac{1}{n} \sum_{i=1}^{n}\left(y_{i}-S\left(\boldsymbol{\omega}^{\top} \boldsymbol{x}_{i}\right)\right)^{2} \]
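For concreteness, this loss can be evaluated as follows (a minimal sketch in Python, assuming NumPy; \(S\) may be any callable univariate function, such as a fitted spline, and the function name is illustrative):

```python
import numpy as np

def training_loss(S, w, X, y):
    """(1/n) * sum_i (y_i - S(w^T x_i))^2 for a data matrix X of shape
    (n, p), responses y of shape (n,), and projection vector w of shape (p,)."""
    return np.mean((y - S(X @ w)) ** 2)
```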
with respect to \(\boldsymbol{\omega}\) and the cubic smoothing spline \(S\). Training the network is typically tackled iteratively, in a manner similar to the EM algorithm. In particular, we iterate the following steps for \(t=1,2,\ldots\) until convergence (a runnable sketch of the full iteration is given after step (b)).
(a) Given the missing data \(\boldsymbol{\omega}_{t}\), compute the spline \(S_{t}\) by training a cubic smoothing spline on \(\left\{y_{i}, \boldsymbol{\omega}_{t}^{\top} \boldsymbol{x}_{i}\right\}_{i=1}^{n}\). The smoothing coefficient of the spline may be determined as part of this step.
(b) Given the spline function \(S_{t}\), compute the next projection vector via iterative reweighted least squares:
\[ \begin{equation*} \boldsymbol{\omega}_{t+1}=\underset{\boldsymbol{\beta}}{\arg \min }\left(\boldsymbol{e}_{t}-\mathbf{X} \boldsymbol{\beta}\right)^{\top} \boldsymbol{\Sigma}_{t}\left(\boldsymbol{e}_{t}-\mathbf{X} \boldsymbol{\beta}\right) \tag{9.11} \end{equation*} \]
where \[ e_{t, i}:=\boldsymbol{\omega}_{t}^{\top} \boldsymbol{x}_{i}+\frac{y_{i}-S_{t}\left(\boldsymbol{\omega}_{t}^{\top} \boldsymbol{x}_{i}\right)}{S_{t}^{\prime}\left(\boldsymbol{\omega}_{t}^{\top} \boldsymbol{x}_{i}\right)}, \quad i=1, \ldots, n \]
is the adjusted response, and \(\boldsymbol{\Sigma}_{t}^{1 / 2}=\operatorname{diag}\left(S_{t}^{\prime}\left(\boldsymbol{\omega}_{t}^{\top} \boldsymbol{x}_{1}\right), \ldots, S_{t}^{\prime}\left(\boldsymbol{\omega}_{t}^{\top} \boldsymbol{x}_{n}\right)\right)\) is a diagonal matrix.
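The following is a minimal runnable sketch of this two-step iteration in Python. It uses SciPy's UnivariateSpline as the cubic smoothing spline (its default smoothing parameter stands in for the smoothing-coefficient selection of step (a)); the initialization, the normalization of \(\boldsymbol{\omega}\), the derivative guard, and the stopping rule are illustrative choices, not the book's prescription:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def fit_projection_pursuit(X, y, n_iter=100, tol=1e-6, seed=0):
    """Fit g(x) = S(w^T x) by alternating (a) a cubic smoothing-spline
    fit and (b) the weighted least-squares update (9.11)."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(p)
    w /= np.linalg.norm(w)               # the scale of w is absorbed by S
    for t in range(n_iter):
        z = X @ w                        # current projections w^T x_i
        order = np.argsort(z)            # the spline fit needs increasing
        S = UnivariateSpline(z[order], y[order], k=3)  # (distinct) inputs
        dS = S.derivative()(z)           # S_t'(w^T x_i)
        dS = np.where(np.abs(dS) < 1e-8, 1e-8, dS)    # guard the division
        e = z + (y - S(z)) / dS          # adjusted responses e_{t,i}
        # Minimize (e - X b)^T Sigma_t (e - X b) with Sigma_t^{1/2} = diag(dS):
        # scale rows by dS and solve an ordinary least-squares problem.
        w_new, *_ = np.linalg.lstsq(X * dS[:, None], e * dS, rcond=None)
        w_new /= np.linalg.norm(w_new)
        if w_new @ w < 0:                # resolve the sign ambiguity of w
            w_new = -w_new
        if np.linalg.norm(w_new - w) < tol:
            w = w_new
            break
        w = w_new
    z = X @ w                            # refit S at the final projection
    order = np.argsort(z)
    return w, UnivariateSpline(z[order], y[order], k=3)
```

As a quick sanity check, one can simulate data such as y = np.sin(X @ w_true) + 0.1 * noise and verify that the returned vector aligns with w_true up to sign and scale.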
Apply Taylor's Theorem B.1 to the function \(S_{t}\) and derive the iterative reweighted least-squares optimization program (9.11).
Step by Step Answer:
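A sketch of the derivation (a reconstruction from the quantities defined above, not the book's verbatim solution). Write \(z_{i}:=\boldsymbol{\omega}_{t}^{\top} \boldsymbol{x}_{i}\) for the current projections. By Taylor's Theorem B.1, a first-order expansion of \(S_{t}\) about \(z_{i}\) gives, for any candidate projection vector \(\boldsymbol{\beta}\),
\[ S_{t}\left(\boldsymbol{\beta}^{\top} \boldsymbol{x}_{i}\right) \approx S_{t}\left(z_{i}\right)+S_{t}^{\prime}\left(z_{i}\right)\left(\boldsymbol{\beta}^{\top} \boldsymbol{x}_{i}-z_{i}\right). \]
Substituting this into the training loss (and dropping the factor \(1/n\), which does not affect the minimizer),
\[ \sum_{i=1}^{n}\left(y_{i}-S_{t}\left(\boldsymbol{\beta}^{\top} \boldsymbol{x}_{i}\right)\right)^{2} \approx \sum_{i=1}^{n}\left(y_{i}-S_{t}\left(z_{i}\right)-S_{t}^{\prime}\left(z_{i}\right)\left(\boldsymbol{\beta}^{\top} \boldsymbol{x}_{i}-z_{i}\right)\right)^{2}=\sum_{i=1}^{n} S_{t}^{\prime}\left(z_{i}\right)^{2}\left(e_{t, i}-\boldsymbol{\beta}^{\top} \boldsymbol{x}_{i}\right)^{2}, \]
where the last equality follows by factoring \(S_{t}^{\prime}\left(z_{i}\right)\) out of each summand, which leaves exactly \(e_{t, i}-\boldsymbol{\beta}^{\top} \boldsymbol{x}_{i}\) with \(e_{t, i}=z_{i}+\left(y_{i}-S_{t}\left(z_{i}\right)\right) / S_{t}^{\prime}\left(z_{i}\right)\). In matrix form the right-hand side is
\[ \left(\boldsymbol{e}_{t}-\mathbf{X} \boldsymbol{\beta}\right)^{\top} \boldsymbol{\Sigma}_{t}\left(\boldsymbol{e}_{t}-\mathbf{X} \boldsymbol{\beta}\right), \quad \boldsymbol{\Sigma}_{t}^{1 / 2}=\operatorname{diag}\left(S_{t}^{\prime}\left(z_{1}\right), \ldots, S_{t}^{\prime}\left(z_{n}\right)\right), \]
so minimizing the linearized loss over \(\boldsymbol{\beta}\) is precisely the weighted least-squares program (9.11), and \(\boldsymbol{\omega}_{t+1}\) is its solution.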
Source: Data Science and Machine Learning: Mathematical and Statistical Methods, 1st Edition, by Dirk P. Kroese, Thomas Taimre, Radislav Vaisman, and Zdravko Botev. ISBN: 9781118710852.