Question:
A \(d\)-dimensional normal random vector \(\boldsymbol{X} \sim \mathscr{N}\left(\boldsymbol{\mu}, \boldsymbol{\Sigma}\right)\) can be defined via an affine transformation, \(\boldsymbol{X}=\boldsymbol{\mu}+\boldsymbol{\Sigma}^{1 / 2} \boldsymbol{Z}\), of a standard normal random vector \(\boldsymbol{Z} \sim \mathscr{N}\left(\mathbf{0}, \mathbf{I}_{d}\right)\), where \(\boldsymbol{\Sigma}^{1 / 2}\left(\boldsymbol{\Sigma}^{1 / 2}\right)^{\top}=\boldsymbol{\Sigma}\). In a similar way, we can define a \(d\)-dimensional Student random vector \(\boldsymbol{X} \sim t_{\alpha}(\boldsymbol{\mu}, \boldsymbol{\Sigma})\) via the transformation
\[ \begin{equation*} \boldsymbol{X}=\boldsymbol{\mu}+\frac{1}{\sqrt{S}} \boldsymbol{\Sigma}^{1 / 2} \boldsymbol{Z} \tag{4.46} \end{equation*} \]
where \(\boldsymbol{Z} \sim \mathscr{N}\left(\mathbf{0}, \mathbf{I}_{d}\right)\) and \(S \sim \operatorname{Gamma}\left(\frac{\alpha}{2}, \frac{\alpha}{2}\right)\) are independent, \(\alpha>0\), and \(\boldsymbol{\Sigma}^{1 / 2}\left(\boldsymbol{\Sigma}^{1 / 2}\right)^{\top}=\boldsymbol{\Sigma}\). Note that we obtain the multivariate normal distribution as a limiting case as \(\alpha \rightarrow \infty\).
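For illustration, here is a minimal sketch of how one might sample from \(t_{\alpha}(\boldsymbol{\mu}, \boldsymbol{\Sigma})\) directly via the representation (4.46), assuming NumPy is available; the Cholesky factor is used as one particular choice of \(\boldsymbol{\Sigma}^{1/2}\), and the function name is illustrative.

```python
# Sketch: sampling from t_alpha(mu, Sigma) via X = mu + Sigma^{1/2} Z / sqrt(S), as in (4.46).
import numpy as np

def sample_mvt(n, mu, Sigma, alpha, seed=None):
    """Draw n samples from t_alpha(mu, Sigma) using the representation (4.46)."""
    rng = np.random.default_rng(seed)
    d = len(mu)
    L = np.linalg.cholesky(Sigma)            # one choice of Sigma^{1/2}
    Z = rng.standard_normal((n, d))          # Z ~ N(0, I_d)
    S = rng.gamma(alpha / 2, 2 / alpha, n)   # S ~ Gamma(shape alpha/2, rate alpha/2)
    return mu + (Z @ L.T) / np.sqrt(S)[:, None]

# Example: heavy-tailed samples in d = 2
X = sample_mvt(10_000, np.array([1.0, -1.0]),
               np.array([[2.0, 0.5], [0.5, 1.0]]), alpha=4.0)
```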
(a) Show that the density of the \(\mathrm{t}_{\alpha}\left(\mathbf{0}, \mathbf{I}_{d}\right)\) distribution is given by \[ t_{\alpha}(\boldsymbol{x}):=\frac{\Gamma((\alpha+d) / 2)}{(\pi \alpha)^{d / 2} \Gamma(\alpha / 2)}\left(1+\frac{1}{\alpha}\|\boldsymbol{x}\|^{2}\right)^{-\frac{\alpha+d}{2}} \]
By the transformation rule (C.23), it follows that the density of \(\boldsymbol{X} \sim \mathrm{t}_{\alpha}(\boldsymbol{\mu}, \boldsymbol{\Sigma})\) is given by \(\mathrm{t}_{\alpha, \boldsymbol{\Sigma}}(\boldsymbol{x}-\boldsymbol{\mu})\), where \[ t_{\alpha, \boldsymbol{\Sigma}}(\boldsymbol{x}):=\frac{1}{\left|\boldsymbol{\Sigma}^{1 / 2}\right|} t_{\alpha}\left(\boldsymbol{\Sigma}^{-1 / 2} \boldsymbol{x}\right) \]
[Hint: conditional on \(S=s\), \(\boldsymbol{X}\) has a \(\mathscr{N}\left(\mathbf{0}, \mathbf{I}_{d} / s\right)\) distribution.]
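The stated density can be sanity-checked numerically. The sketch below, assuming SciPy is available, implements the formula via log-gamma functions and, as an assumed check only, compares it in the case \(d=1\) against SciPy's univariate Student \(t\) density.

```python
# Sketch: the claimed t_alpha(0, I_d) density, checked against the univariate Student t for d = 1.
import numpy as np
from scipy.special import gammaln
from scipy.stats import t as student_t

def t_alpha_pdf(x, alpha):
    """Density of t_alpha(0, I_d); x is an (n, d) array of evaluation points."""
    x = np.atleast_2d(x)
    d = x.shape[1]
    log_c = gammaln((alpha + d) / 2) - gammaln(alpha / 2) - (d / 2) * np.log(np.pi * alpha)
    return np.exp(log_c - (alpha + d) / 2 * np.log1p((x ** 2).sum(axis=1) / alpha))

xs = np.linspace(-4.0, 4.0, 9).reshape(-1, 1)            # d = 1 evaluation points
print(np.allclose(t_alpha_pdf(xs, alpha=3.0),
                  student_t.pdf(xs.ravel(), df=3.0)))     # expect True
```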
(b) We wish to fit a \(t_{\alpha}(\boldsymbol{\mu}, \boldsymbol{\Sigma})\) distribution to given data \(\tau=\left\{\boldsymbol{x}_{1}, \ldots, \boldsymbol{x}_{n}\right\}\) in \(\mathbb{R}^{d}\) via the EM method. We use the representation (4.46) and augment the data with the vector \(\boldsymbol{S}=\left[S_{1}, \ldots, S_{n}\right]^{\top}\) of hidden variables. Show that the complete-data likelihood is given by \[ \begin{equation*} g(\tau, \boldsymbol{s} \mid \boldsymbol{\theta})=\prod_{i} \frac{(\alpha / 2)^{\alpha / 2} s_{i}^{(\alpha+d) / 2-1} \exp \left(-\frac{s_{i}}{2} \alpha-\frac{s_{i}}{2}\left\|\boldsymbol{\Sigma}^{-1 / 2}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}\right)\right\|^{2}\right)}{\Gamma(\alpha / 2)(2 \pi)^{d / 2}\left|\boldsymbol{\Sigma}^{1 / 2}\right|} \tag{4.47} \end{equation*} \]
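For reference, a minimal sketch of the complete-data log-likelihood corresponding to (4.47), assuming NumPy/SciPy; the Mahalanobis term \(\left\|\boldsymbol{\Sigma}^{-1/2}(\boldsymbol{x}_i-\boldsymbol{\mu})\right\|^{2}\) is computed with `einsum`, and the helper name is illustrative.

```python
# Sketch: ln g(tau, s | theta), the logarithm of the complete-data likelihood (4.47).
import numpy as np
from scipy.special import gammaln

def complete_loglik(X, s, mu, Sigma, alpha):
    """X is an (n, d) data array, s an (n,) vector of hidden variables."""
    n, d = X.shape
    diff = X - mu
    maha = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)  # ||Sigma^{-1/2}(x_i - mu)||^2
    _, logdet = np.linalg.slogdet(Sigma)                               # ln|Sigma|
    return (n * (alpha / 2) * np.log(alpha / 2) - n * gammaln(alpha / 2)
            - n * d / 2 * np.log(2 * np.pi) - n / 2 * logdet
            + ((alpha + d) / 2 - 1) * np.log(s).sum()
            - 0.5 * (s * (alpha + maha)).sum())
```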
(c) Show that, as a consequence, conditional on the data \(\tau\) and parameter \(\boldsymbol{\theta}\), the hidden data are mutually independent, and \[ \left(S_{i} \mid \tau, \boldsymbol{\theta}\right) \sim \operatorname{Gamma}\left(\frac{\alpha+d}{2}, \frac{\alpha+\left\|\boldsymbol{\Sigma}^{-1 / 2}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}\right)\right\|^{2}}{2}\right), \quad i=1, \ldots, n \]
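The claimed Gamma posterior can be checked numerically in a small assumed example: the prior-times-likelihood curve, as a function of \(s\), should differ from the stated Gamma density only by a normalizing constant. All parameter values below are illustrative.

```python
# Sketch: check that Gamma(alpha/2, alpha/2) prior times N(mu, Sigma/s) likelihood
# is proportional to Gamma((alpha + d)/2, (alpha + maha)/2) in s.
import numpy as np
from scipy.stats import gamma, multivariate_normal

alpha, d = 5.0, 2
mu = np.zeros(d)
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
x = np.array([1.5, -0.5])
maha = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)

s_grid = np.linspace(0.05, 8.0, 200)
unnorm = gamma.pdf(s_grid, a=alpha / 2, scale=2 / alpha) * np.array(
    [multivariate_normal.pdf(x, mean=mu, cov=Sigma / s) for s in s_grid])
claimed = gamma.pdf(s_grid, a=(alpha + d) / 2, scale=2 / (alpha + maha))
ratio = unnorm / claimed
print(np.allclose(ratio, ratio[0]))   # expect True: the two differ only by a constant factor
```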
(d) At iteration \(t\) of the EM algorithm, let \(g^{(t)}(\boldsymbol{s})=g\left(\boldsymbol{s} \mid \tau, \boldsymbol{\theta}^{(t-1)}\right)\) be the density of the missing data, given the observed data \(\tau\) and the current parameter guess \(\boldsymbol{\theta}^{(t-1)}\). Verify that the expected complete-data log-likelihood is given by:
\[ \begin{aligned} \mathbb{E}_{g^{(t)}} \ln g(\tau, \boldsymbol{S} \mid \boldsymbol{\theta}) & =\frac{n \alpha}{2} \ln \frac{\alpha}{2}-\frac{n d}{2} \ln (2 \pi)-n \ln \Gamma\left(\frac{\alpha}{2}\right)-\frac{n}{2} \ln |\boldsymbol{\Sigma}| \\ & +\frac{\alpha+d-2}{2} \sum_{i=1}^{n} \mathbb{E}_{g^{(t)}} \ln S_{i}-\sum_{i=1}^{n} \frac{\alpha+\left\|\boldsymbol{\Sigma}^{-1 / 2}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}\right)\right\|^{2}}{2} \mathbb{E}_{g^{(t)}} S_{i} \end{aligned} \]
Show that
\[ \begin{aligned} & \mathbb{E}_{g^{(t)}} S_{i}=\frac{\alpha^{(t-1)}+d}{\alpha^{(t-1)}+\left\|\left(\boldsymbol{\Sigma}^{(t-1)}\right)^{-1 / 2}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}^{(t-1)}\right)\right\|^{2}}=: w_{i}^{(t-1)} \\ & \mathbb{E}_{g^{(t)}} \ln S_{i}=\psi\left(\frac{\alpha^{(t-1)}+d}{2}\right)-\ln \left(\frac{\alpha^{(t-1)}+d}{2}\right)+\ln w_{i}^{(t-1)} \end{aligned} \]
where \(\psi:=(\ln \Gamma)^{\prime}\) is the digamma function.
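Both expectations follow from standard Gamma identities: if \(S \sim \operatorname{Gamma}(a, b)\) with rate \(b\), then \(\mathbb{E} S = a/b\) and \(\mathbb{E} \ln S = \psi(a) - \ln b\); substituting the parameters from part (c) and using \(\ln w_{i}^{(t-1)} = \ln\frac{\alpha^{(t-1)}+d}{2} - \ln\frac{\alpha^{(t-1)}+\|\cdot\|^{2}}{2}\) gives the displayed formulas. A quick Monte Carlo check, assuming SciPy and with illustrative parameter values:

```python
# Sketch: Monte Carlo check of E S = a/b and E ln S = psi(a) - ln b for S ~ Gamma(shape a, rate b).
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)
a, b = 4.5, 2.0                          # e.g. a = (alpha + d)/2, b = (alpha + maha_i)/2
S = rng.gamma(a, 1 / b, size=10**6)      # NumPy uses the scale = 1/rate parameterization
print(S.mean(), a / b)                           # both close to 2.25
print(np.log(S).mean(), digamma(a) - np.log(b))  # both close to the same value
```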
(e) Finally, show that in the M-step of the EM algorithm \(\boldsymbol{\theta}^{(t)}\) is updated from \(\boldsymbol{\theta}^{(t-1)}\) as follows:
\[ \begin{aligned} & \boldsymbol{\mu}^{(t)}=\frac{\sum_{i=1}^{n} w_{i}^{(t-1)} \boldsymbol{x}_{i}}{\sum_{i=1}^{n} w_{i}^{(t-1)}} \\ & \boldsymbol{\Sigma}^{(t)}=\frac{1}{n} \sum_{i=1}^{n} w_{i}^{(t-1)}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}^{(t)}\right)\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}^{(t)}\right)^{\top} \end{aligned} \]
and \(\alpha^{(t)}\) is defined implicitly through the solution of the nonlinear equation:
\[ \ln \left(\frac{\alpha}{2}\right)-\psi\left(\frac{\alpha}{2}\right)+\psi\left(\frac{\alpha^{(t)}+d}{2}\right)-\ln \left(\frac{\alpha^{(t)}+d}{2}\right)+1+\frac{\sum_{i=1}^{n}\left(\ln \left(w_{i}^{(t-1)}\right)-w_{i}^{(t-1)}\right)}{n}=0 \]
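Putting the E- and M-steps together, here is a minimal sketch of one EM iteration, assuming NumPy/SciPy. The degrees-of-freedom equation is treated as implicit in a single unknown \(\alpha^{(t)}\) and solved with `brentq`; the bracket \([10^{-3}, 10^{3}]\) is an illustrative choice that assumes a sign change inside it.

```python
# Sketch: one EM iteration for fitting t_alpha(mu, Sigma) to data X (n x d).
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def em_step(X, mu, Sigma, alpha):
    """One EM iteration; returns updated (mu, Sigma, alpha)."""
    n, d = X.shape
    diff = X - mu
    maha = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
    w = (alpha + d) / (alpha + maha)                     # E-step weights w_i^{(t-1)}

    mu_new = (w[:, None] * X).sum(axis=0) / w.sum()      # M-step: location update
    diff_new = X - mu_new
    Sigma_new = (w[:, None] * diff_new).T @ diff_new / n # M-step: scale matrix update

    c = 1.0 + (np.log(w) - w).mean()                     # data-dependent constant in the alpha equation
    def f(a):                                            # left-hand side of the nonlinear equation
        return (np.log(a / 2) - digamma(a / 2)
                + digamma((a + d) / 2) - np.log((a + d) / 2) + c)
    alpha_new = brentq(f, 1e-3, 1e3)                     # assumes the root lies in this bracket
    return mu_new, Sigma_new, alpha_new
```

In use, one would initialize \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\) with, say, the sample mean and covariance and some starting \(\alpha\), then call `em_step` repeatedly until the parameters stabilize.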
Source: Data Science and Machine Learning: Mathematical and Statistical Methods, 1st Edition, by Dirk P. Kroese, Thomas Taimre, Radislav Vaisman, and Zdravko Botev. ISBN: 9781118710852.