
Question



h_θ(x) = θᵀx, for θ, x ∈ ℝᵈ, and we choose θ to minimize the following "average square loss" objective function:

J(θ) = (1/m) Σᵢ₌₁ᵐ (h_θ(xᵢ) − yᵢ)²,

where (x₁, y₁), …, (xₘ, yₘ) ∈ ℝᵈ × ℝ is our training data. While this formulation of linear regression is very convenient, it is more standard to use a hypothesis space of affine functions:

h_{θ,b}(x) = θᵀx + b,

which allows a nonzero intercept term b, sometimes called a "bias" term. The standard way to achieve this, while still maintaining the convenience of the first representation, is to add an extra dimension to x that is always a fixed value, such as 1, and use θ, x ∈ ℝᵈ⁺¹. Convince yourself that this is equivalent. We will assume this representation.

5. Let X ∈ ℝ^{m×(d+1)} be the design matrix, where the i'th row of X is xᵢ. Let y = (y₁, …, yₘ)ᵀ ∈ ℝ^{m×1} be the response. Write the objective function J(θ) as a matrix/vector expression, without using an explicit summation sign.

6. Write down an expression for the gradient of J(θ) without using an explicit summation sign.

7. Write down the expression for updating θ in the gradient descent algorithm for a step size η.
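The setup above can be sketched numerically: append a constant-1 column to absorb the intercept, compute the objective in vectorized form, and run plain gradient descent. The dataset, the step size value, and the iteration count below are illustrative assumptions, not part of the question; the sketch only demonstrates that the vectorized loss agrees with the explicit summation.

```python
import numpy as np

# Hypothetical toy dataset: m = 4 examples, d = 2 raw features (assumed sizes).
rng = np.random.default_rng(0)
m, d = 4, 2
X_raw = rng.normal(size=(m, d))
y = rng.normal(size=m)

# Bias trick: append a fixed column of 1s so h(x) = theta^T x absorbs the
# intercept b; now theta and each x live in R^(d+1).
X = np.hstack([X_raw, np.ones((m, 1))])

def J(theta):
    # Vectorized objective: (1/m) * ||X theta - y||^2
    r = X @ theta - y
    return (r @ r) / m

def grad_J(theta):
    # Gradient of J: (2/m) * X^T (X theta - y)
    return (2.0 / m) * X.T @ (X @ theta - y)

# Gradient-descent update theta <- theta - eta * grad J(theta),
# with an assumed step size eta.
theta = np.zeros(d + 1)
eta = 0.1
for _ in range(200):
    theta = theta - eta * grad_J(theta)

# Sanity check: the vectorized loss matches the explicit summation form.
loop_loss = sum((X[i] @ theta - y[i]) ** 2 for i in range(m)) / m
assert np.isclose(J(theta), loop_loss)
```

Writing the loss as a squared norm of the residual vector Xθ − y is exactly what question 5 asks for; the gradient and update steps mirror questions 6 and 7.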

