Answered step by step
Verified Expert Solution
Question
1 Approved Answer
In practice, PXY is usually unknown and we use the empirical risk minimizer (ERM). We will reformulate the problem as a d-dimensional linear regression problem.
In practice, PXY is usually unknown and we use the empirical risk minimizer (ERM). We will reformulate the problem as a d-dimensional linear regression problem. First note that functions in Hd are parametrized by a vector b=[b0,b1,bd], we will use the notation fb. Similarly we will note aR3 the vector parametrizing g(x)=fa(x). We will also gather data points from the training sample in the following matrix and vector: X=111x1x2xNx1dx2dxN,y=[y0,y1,yN] These notations allow us to take advantage of the very effective linear algebra formalism. X is called the design matrix. 5. (2 Points) Show that the empirical risk minimizer (ERM) b^ is given by the following minimization b^=bargminXby22. 6. (3 Points) If N>d and X is full rank, show that b^=(XX)1Xy. (Hint: you should take the gradients of the loss above with respect to b1). Why do we need to use the conditions N>d and X full rank
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started