Question:
In Exercise 12 above, we train the multi-logit classifier using a weight matrix \(\mathbf{W} \in \mathbb{R}^{3 \times 7}\) and bias vector \(\boldsymbol{b} \in \mathbb{R}^{3}\). Repeat the training of the multi-logit model, but this time keep \(z_{1}\) fixed at an arbitrary constant (say \(z_{1}=0\)), thereby making \(c=0\) a "reference" class. This has the effect of removing a node from the output layer of the network, giving a weight matrix \(\mathbf{W} \in \mathbb{R}^{2 \times 7}\) and bias vector \(\boldsymbol{b} \in \mathbb{R}^{2}\) of smaller dimensions than in (7.16).
Step by Step Answer:
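The step-by-step answer is not reproduced here. As an illustrative sketch only: the reference-class parameterization fixes the pre-softmax output of class \(c=0\) at zero, so only \(K-1=2\) rows of weights and biases are learned. The Exercise 12 dataset is not shown, so the code below uses synthetic data (the sample size, the true weights, and the learning rate are all assumptions), and trains by plain gradient descent on the cross-entropy loss rather than whatever optimizer the book's solution uses.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 300, 7, 3                        # assumed sizes: n samples, 7 features, 3 classes

# Synthetic stand-in for the Exercise 12 data, drawn from a multinomial logit model
X = rng.normal(size=(n, p))
true_W = rng.normal(size=(K, p))
y = np.argmax(X @ true_W.T + rng.gumbel(size=(n, K)), axis=1)

# Reference-class parameterization: the score of class c = 0 is fixed at 0,
# so W is (K-1) x p = 2 x 7 and b is in R^2, as the exercise requires.
W = np.zeros((K - 1, p))
b = np.zeros(K - 1)

def probs(X, W, b):
    # Prepend a zero column for the reference class, then apply softmax
    Z = np.hstack([np.zeros((X.shape[0], 1)), X @ W.T + b])
    Z -= Z.max(axis=1, keepdims=True)      # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

lr = 0.1                                   # assumed learning rate
Y = np.eye(K)[y]                           # one-hot targets
for _ in range(2000):
    P = probs(X, W, b)
    G = (P - Y)[:, 1:]                     # cross-entropy gradient w.r.t. the non-reference logits
    W -= lr * (G.T @ X) / n
    b -= lr * G.mean(axis=0)

acc = (probs(X, W, b).argmax(axis=1) == y).mean()
print(f"training accuracy: {acc:.3f}")
```

Note that the fitted model is equivalent to the full 3-row parameterization: subtracting the reference class's weight row from every row of a full \(\mathbf{W}\) leaves all class probabilities unchanged, which is exactly the redundancy the reference class removes.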
Related Book:
Data Science And Machine Learning Mathematical And Statistical Methods
ISBN: 9781118710852
1st Edition
Authors: Dirk P. Kroese, Thomas Taimre, Radislav Vaisman, Zdravko Botev