Could you please answer question 3 (a)-(d)? Thanks!
QUESTION 3

In the first part of this question we consider a dataset of $I$ pairs $\{t_i, z_i\}_{i=1}^{I}$, where each pair gives a scalar observation $z_i$ made at time $t_i$. A Gaussian process model for $\mathbf{z} = [z_1\; z_2\; \cdots\; z_I]^\top$ given $\mathbf{t} = [t_1\; t_2\; \cdots\; t_I]^\top$ is:

$p(\mathbf{z} \mid \mathbf{t}, \theta) = \mathcal{N}(\mathbf{z};\, \mathbf{0},\, K + \sigma_n^2 \mathbb{I})$, with $K_{ij} = k(t_i, t_j; \theta)$,

where $\mathbb{I}$ is the identity matrix, the kernel function is $k(t_i, t_j) = \sigma_f^2 \exp\!\left(-(t_i - t_j)^2 / \ell^2\right)$, and $\theta = \{\sigma_f, \ell, \sigma_n\}$ are the parameters of the model. The posterior predictive distribution for an output at time $t_*$ given data and model is:

$p(z_* \mid t_*, \mathbf{z}, \mathbf{t}, \theta) = \mathcal{N}(z_*;\, m, s^2)$,

where we can compute $m$ and $s^2$ for any test time, training data, and parameters.

(a) Consider the posterior predictive distribution given above, evaluated with the test time set to the time of the first observed pair, $t_* = t_1$. Write down, with a brief reason, both the predictive mean and variance, $m$ and $s^2$,
i. in the limit $\sigma_n^2 \to 0$,
ii. in the limit $\sigma_n^2 \to \infty$.
No derivations are required.

(b) If $m_i$ is the posterior mean at the $i$th observed training time, $t_* = t_i$, we can write down a square error as: $E = \sum_{i=1}^{I} (z_i - m_i)^2$. Briefly explain why this error is not a good cost function for optimizing $\sigma_n$.

(c) What is a suitable cost function that we could minimize in a standard gradient-based optimizer to set the parameters $\theta$?

(d) State another method that could be used to predict the output $z_*$ at a new time $t_*$, and give an advantage and disadvantage compared to the Gaussian process approach.

In the remainder of this question we consider $N$ labelled time series. The $n$th time series has $I_n$ observations $x^{(n)} = \{(t_i^{(n)}, z_i^{(n)})\}_{i=1}^{I_n}$. Each time series has a single binary label $y_n \in \{0, 1\}$. The time series are not all the same length; the number of observations $I_n$ depends on the particular instance $n$. For a new unlabelled sequence $x = (\mathbf{t}, \mathbf{z})$ we want to assign a probability to its label, $y$.

(e) Why is it not possible to fit a straightforward logistic regression model to address the classification task described above?
One way to model the training data is to assume that each sequence came from a Gaussian process as described in the first part. A parameter vector $\theta^{(0)}$ can be fitted to minimize a suitable cost function, as in (c), summed over the sequences where $y_n = 0$. A parameter vector $\theta^{(1)}$ can be fitted to the sequences with $y_n = 1$.

(f) Given the class-specific models with parameters $\theta^{(0)}$ and $\theta^{(1)}$ described above, we can build a Bayes classifier. The classifier has parameters $a$, and predicts the label for a new sequence with:

$P(y \mid \mathbf{z}, \mathbf{t}, a) \propto P(\mathbf{z}, y \mid \mathbf{t}, a)$.

We assume that the times in the test sequence $\mathbf{t}$ are known and that the class-specific models only model the test observations $\mathbf{z}$ given the times.
i. What parameter(s) need to be included in $a$ in addition to $\theta^{(0)}$ and $\theta^{(1)}$, and how can it / they be set?
ii. Describe how to calculate $P(y = 1 \mid \mathbf{z}, \mathbf{t}, a)$, giving enough detail so that it is clear how to calculate each term in your equation(s).

(g) When the training data are inspected, it appears that the main difference between the two classes is that they have different trends. In class zero the observations tend to increase after time zero, and in class one the observations tend to decrease after time zero.
i. Briefly explain why a Bayes classifier with the Gaussian process model described above cannot capture this difference between the classes. Refer to the differences that can be captured by the model's parameters in your explanation.
ii. Briefly describe another classifier that could be applied to these time series, and could capture trends.
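
For experimenting with the quantities named in the question, the following is a minimal NumPy sketch, not a model answer. It assumes the kernel has the squared-exponential form sigma_f^2 * exp(-(t - t')^2 / ell^2), that the predictive variance is for a noisy observation (so it includes sigma_n^2), and that each theta is passed as a tuple (sigma_f, ell, sigma_n). The function names and the `prior1` argument (an assumed prior probability of class one) are hypothetical and introduced only for illustration.

```python
import numpy as np


def kernel(ta, tb, sigma_f, ell):
    """Assumed kernel: sigma_f^2 * exp(-(t - t')^2 / ell^2), for all pairs."""
    ta = np.asarray(ta, dtype=float)
    tb = np.asarray(tb, dtype=float)
    diff = ta[:, None] - tb[None, :]
    return sigma_f**2 * np.exp(-(diff**2) / ell**2)


def gp_predict(t_star, t, z, sigma_f, ell, sigma_n):
    """Posterior predictive mean m and variance s^2 of a noisy output at t_star."""
    t = np.asarray(t, dtype=float)
    z = np.asarray(z, dtype=float)
    # Covariance of the noisy training observations: K + sigma_n^2 * I
    M = kernel(t, t, sigma_f, ell) + sigma_n**2 * np.eye(len(t))
    # Covariances between the training times and the test time
    k_star = kernel(t, [t_star], sigma_f, ell)[:, 0]
    m = k_star @ np.linalg.solve(M, z)
    # Assumption: variance of an observation, so the noise term is included
    s2 = sigma_f**2 + sigma_n**2 - k_star @ np.linalg.solve(M, k_star)
    return m, s2


def log_marginal_likelihood(t, z, sigma_f, ell, sigma_n):
    """log N(z; 0, K + sigma_n^2 * I), computed via a Cholesky factor."""
    t = np.asarray(t, dtype=float)
    z = np.asarray(z, dtype=float)
    M = kernel(t, t, sigma_f, ell) + sigma_n**2 * np.eye(len(t))
    L = np.linalg.cholesky(M)          # M = L @ L.T
    w = np.linalg.solve(L, z)          # w = L^{-1} z, so w @ w = z^T M^{-1} z
    half_log_det = np.sum(np.log(np.diag(L)))
    return -0.5 * (w @ w) - half_log_det - 0.5 * len(t) * np.log(2 * np.pi)


def prob_class_one(t, z, theta0, theta1, prior1):
    """P(y=1 | z, t) from two class-specific GPs and an assumed class prior."""
    log_joint0 = log_marginal_likelihood(t, z, *theta0) + np.log(1.0 - prior1)
    log_joint1 = log_marginal_likelihood(t, z, *theta1) + np.log(prior1)
    # Normalize the two joint probabilities; written as a logistic sigmoid
    return 1.0 / (1.0 + np.exp(log_joint0 - log_joint1))
```

Calling `gp_predict(t[0], t, z, sigma_f, ell, sigma_n)` with very small and very large `sigma_n` is one way to check the limits asked about in (a) numerically. The negative of `log_marginal_likelihood` is one candidate for the kind of cost function discussed in (c).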