This question is about linear regression with norms induced by symmetric positive definite matrices.
Let $A \in \mathbb{R}^{N \times N}$ and $B \in \mathbb{R}^{D \times D}$ be symmetric, positive definite matrices. From the lectures, we can use symmetric positive definite matrices to define corresponding inner products, as shown below; from the previous question, we can also define a norm from each inner product.

$$\langle x, y \rangle_A := x^\top A y, \qquad \|x\|_A^2 := \langle x, x \rangle_A,$$
$$\langle x, y \rangle_B := x^\top B y, \qquad \|x\|_B^2 := \langle x, x \rangle_B.$$

Suppose we are performing linear regression with a training set $\{(x_1, y_1), \dots, (x_N, y_N)\}$, where for each $i$, $x_i \in \mathbb{R}^D$ and $y_i \in \mathbb{R}$. We define the matrix $X = [x_1, \dots, x_N]^\top \in \mathbb{R}^{N \times D}$ and the vector $y = [y_1, \dots, y_N]^\top \in \mathbb{R}^N$. We would like to find $\theta \in \mathbb{R}^D$ and $c \in \mathbb{R}^N$ such that $y \approx X\theta + c$, where the error is measured using $\|\cdot\|_A$. We avoid overfitting by adding a weighted regularization term, measured using $\|\cdot\|_B$. We define the loss function with regularizer:

$$\mathcal{L}_{A,B,y,X}(\theta, c) = \|y - X\theta - c\|_A^2 + \|\theta\|_B^2 + \|c\|_A^2.$$

For brevity we write $\mathcal{L}(\theta, c)$ for $\mathcal{L}_{A,B,y,X}(\theta, c)$. For this question:

- You may use (without proof) the property that a symmetric positive definite matrix is invertible.
- We assume that there are sufficiently many non-redundant data points for $X$ to be full rank. In particular, you may assume that the null space of $X$ is trivial (that is, the only solution to $Xz = 0$ is the trivial solution $z = 0$).

1. Find the gradient $\nabla_\theta \mathcal{L}(\theta, c)$.

For reference, Section 5.5 (Useful Identities for Computing Gradients) lists some gradients that are frequently required in machine learning, where $\mathrm{tr}(\cdot)$ denotes the trace, $\det(\cdot)$ the determinant, and $f(X)^{-1}$ the inverse of $f(X)$:

$$\frac{\partial}{\partial x} x^\top a = a^\top, \qquad \frac{\partial}{\partial x} a^\top x = a^\top, \qquad \frac{\partial}{\partial X} a^\top X b = a b^\top,$$
$$\frac{\partial}{\partial x} x^\top B x = x^\top (B + B^\top), \qquad \frac{\partial}{\partial s} (x - As)^\top W (x - As) = -2 (x - As)^\top W A \ \text{ for symmetric } W.$$

You should be able to calculate these gradients.

My attempt:

1. Expanding the norms,
$$\mathcal{L}(\theta, c) = \|y - X\theta - c\|_A^2 + \|\theta\|_B^2 + \|c\|_A^2 = (y - X\theta - c)^\top A (y - X\theta - c) + \theta^\top B \theta + c^\top A c.$$
Set $m = y - X\theta - c$, so that $(y - X\theta - c)^\top A (y - X\theta - c) = m^\top A m$. By the chain rule and the identity above for symmetric $A$ (from the lecture identities),
$$\frac{\partial\, m^\top A m}{\partial \theta} = \frac{\partial\, m^\top A m}{\partial m}\,\frac{\partial m}{\partial \theta} = -2\,(y - X\theta - c)^\top A X.$$
Therefore
$$\nabla_\theta \mathcal{L}(\theta, c) = \frac{\partial \mathcal{L}(\theta, c)}{\partial \theta} = -2\,(y - X\theta - c)^\top A X + \theta^\top (B + B^\top) = -2\,(y - X\theta - c)^\top A X + 2\,\theta^\top B.$$

2. Let $\nabla_\theta \mathcal{L}(\theta, c) = 0$. Then
$$-2\,(y - X\theta - c)^\top A X + \theta^\top (B + B^\top) = 0,$$
and expanding (after dividing by $2$ and using the symmetry of $B$),
$$-y^\top A X + \theta^\top X^\top A X + c^\top A X + \theta^\top B = 0,$$
that is, $\theta^\top (X^\top A X + B) = (y - c)^\top A X$.
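Since the signs and transposes are easy to get wrong here, below is a minimal numerical sketch (not part of the question) checking the result. It assumes NumPy; all names (`A`, `B`, `X`, `y`, `theta`, `c`, `loss`, `grad_theta`, `theta_star`) are illustrative choices, not prescribed by the problem. It compares the gradient derived above against finite differences, and verifies that the $\theta$ solving the final equation, $\theta = (X^\top A X + B)^{-1} X^\top A (y - c)$, is a stationary point for fixed $c$.

```python
# Numerical sanity check for the gradient and stationarity condition above.
# All variable names are illustrative; A and B are random SPD matrices.
import numpy as np

rng = np.random.default_rng(0)
N, D = 8, 3

# Random symmetric positive definite A (N x N) and B (D x D).
M1 = rng.standard_normal((N, N)); A = M1 @ M1.T + N * np.eye(N)
M2 = rng.standard_normal((D, D)); B = M2 @ M2.T + D * np.eye(D)

X = rng.standard_normal((N, D))          # design matrix
y = rng.standard_normal(N)
c = rng.standard_normal(N)
theta = rng.standard_normal(D)

def loss(theta, c):
    r = y - X @ theta - c
    return r @ A @ r + theta @ B @ theta + c @ A @ c

def grad_theta(theta, c):
    # -2 (y - X theta - c)^T A X + 2 theta^T B, stored as a 1-D array
    # (row/column distinction ignored; A and B are symmetric).
    r = y - X @ theta - c
    return -2 * (A @ r) @ X + 2 * B @ theta

# Finite-difference check of the analytic gradient.
eps = 1e-6
num = np.array([(loss(theta + eps * e, c) - loss(theta - eps * e, c)) / (2 * eps)
                for e in np.eye(D)])
print(np.allclose(num, grad_theta(theta, c), atol=1e-4))      # expected: True

# Stationary point in theta for fixed c: theta* = (X^T A X + B)^{-1} X^T A (y - c).
theta_star = np.linalg.solve(X.T @ A @ X + B, X.T @ A @ (y - c))
print(np.allclose(grad_theta(theta_star, c), 0, atol=1e-8))   # expected: True
```

Note that $X^\top A X + B$ is symmetric positive definite (since $A$ and $B$ are SPD and the null space of $X$ is trivial), which is what makes the linear solve above well-posed; this is where the invertibility hint in the question comes in.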