Question:
In multi-output linear regression, the response variable is a real-valued vector of dimension, say, \(m\). Similar to (5.8), the model can be written in matrix notation:
\[ \mathbf{Y}=\mathbf{X B}+\begin{bmatrix} \boldsymbol{\varepsilon}_{1}^{\top} \\ \vdots \\ \boldsymbol{\varepsilon}_{n}^{\top} \end{bmatrix} \]
where:
- \(\mathbf{Y}\) is an \(n \times m\) matrix of \(n\) independent responses (stored as row vectors of length \(m\) );
- \(\mathbf{X}\) is the usual \(n \times p\) model matrix;
- \(\mathbf{B}\) is a \(p \times m\) matrix of model parameters;
- \(\boldsymbol{\varepsilon}_{1}, \ldots, \boldsymbol{\varepsilon}_{n} \in \mathbb{R}^{m}\) are independent error terms with \(\mathbb{E}\, \boldsymbol{\varepsilon}_{i} = \mathbf{0}\) and \(\mathbb{E}\, \boldsymbol{\varepsilon}_{i} \boldsymbol{\varepsilon}_{i}^{\top} =: \boldsymbol{\Sigma}\).
We wish to learn the matrix parameters \(\mathbf{B}\) and \(\boldsymbol{\Sigma}\) from the training set \(\{\mathbf{Y}, \mathbf{X}\}\). To this end, consider minimizing the training loss:
\[ \frac{1}{n} \operatorname{tr}\left((\mathbf{Y}-\mathbf{X B}) \boldsymbol{\Sigma}^{-1}(\mathbf{Y}-\mathbf{X B})^{\top}\right) \]
where \(\operatorname{tr}(\cdot)\) is the trace of a matrix.
(a) Show that the minimizer of the training loss, denoted \(\widehat{\mathbf{B}}\), satisfies the normal equations:
\[ \mathbf{X}^{\top} \mathbf{X} \widehat{\mathbf{B}}=\mathbf{X}^{\top} \mathbf{Y} \]
(b) Noting that \[ (\mathbf{Y}-\mathbf{X B})^{\top}(\mathbf{Y}-\mathbf{X B})=\sum_{i=1}^{n} \boldsymbol{\varepsilon}_{i} \boldsymbol{\varepsilon}_{i}^{\top}, \]
explain why \[ \widehat{\boldsymbol{\Sigma}} = \frac{(\mathbf{Y}-\mathbf{X} \widehat{\mathbf{B}})^{\top}(\mathbf{Y}-\mathbf{X} \widehat{\mathbf{B}})}{n} \]
is a method-of-moments estimator of \(\boldsymbol{\Sigma}\), just like the one given in (5.10).
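As a quick numerical sanity check of both parts (a sketch only: the dimensions `n`, `p`, `m`, the seed, and `Sigma_true` are made-up illustration values, not from the text), the snippet below simulates data from the model, fits \(\widehat{\mathbf{B}}\) column-wise by least squares, and verifies that it satisfies the normal equations and yields the residual-based estimate of \(\boldsymbol{\Sigma}\):

```python
import numpy as np

# Hypothetical dimensions and parameters for illustration only.
rng = np.random.default_rng(0)
n, p, m = 200, 3, 2

# Model matrix with an intercept column, true coefficients, and error covariance.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
B_true = rng.normal(size=(p, m))
Sigma_true = np.array([[1.0, 0.3],
                       [0.3, 0.5]])
eps = rng.multivariate_normal(np.zeros(m), Sigma_true, size=n)
Y = X @ B_true + eps

# (a) The minimizer of the trace loss does not depend on Sigma: each column of
# B-hat is an ordinary least-squares fit, so lstsq solves the normal equations.
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(X.T @ X @ B_hat, X.T @ Y)  # normal equations hold

# (b) Method-of-moments estimate: average of the residual outer products,
# i.e. (Y - X B-hat)'(Y - X B-hat) / n, an m x m symmetric matrix.
R = Y - X @ B_hat
Sigma_hat = R.T @ R / n
print(Sigma_hat)
```

With a moderate sample size, `Sigma_hat` should land near `Sigma_true`, illustrating why averaging the residual outer products estimates \(\boldsymbol{\Sigma}\).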
Source: *Data Science and Machine Learning: Mathematical and Statistical Methods*, 1st Edition, by Dirk P. Kroese, Thomas Taimre, Radislav Vaisman, and Zdravko Botev. ISBN: 9781118710852.