Question
Please help with the questions below.
Introduction
This script will walk you through the process of fitting a linear model using polynomial basis functions, and the selection of a hyper-parameter using a validation data set.
import numpy as np
import matplotlib.pyplot as plt

# Defining data generation model
def f(x):
    return 5 * (x - 0) * (x - 0.5) * (x - 1)

# Defining std of noise
stdNoise = 0.2

# Setting a seed for random number generator
rng = np.random.default_rng(seed=42)

# Generating training, validation and test data
N = 80
x_train = rng.random(N)
y_train = f(x_train) + stdNoise * rng.normal(size=N)
N = 20
x_test = rng.random(N)
y_test = f(x_test) + stdNoise * rng.normal(size=N)

# Plotting data purely for verification
plt.plot(x_train, y_train, 'k.', x_test, y_test, 'r.')
plt.xlabel('x')
plt.ylabel('y')
plt.legend(['Training', 'Testing'])  # a list, not a set: sets have no guaranteed order
plt.show()
# Function that creates the X matrix as defined for fitting our model
def create_X(x, deg):
    X = np.ones((len(x), deg + 1))
    for i in range(1, deg + 1):
        X[:, i] = x ** i
    return X

# Function for predicting the response
def predict(x, beta):
    return np.dot(create_X(x, len(beta) - 1), beta)

# Function for fitting the model
def fit(x, y, deg):
    return np.linalg.lstsq(create_X(x, deg), y, rcond=None)[0]

# Function for computing the RMSE
def rmse(y, yPred):
    se = (y - yPred) ** 2
    return np.sqrt(np.mean(se))
# Fitting model
deg = 2
beta = fit(x_train, y_train, deg)

# Computing training error
y_train_pred = predict(x_train, beta)
err = rmse(y_train, y_train_pred)
print('Training Error = {:2.3}'.format(err))

# Computing test error
y_test_pred = predict(x_test, beta)
err = rmse(y_test, y_test_pred)
print('Test Error = {:2.3}'.format(err))

# Plotting fitted model
x = np.linspace(0, 1, 100)
y = predict(x, beta)
plt.plot(x, y, 'b-', x_train, y_train, 'ks', x_test, y_test, 'rs')
plt.legend(['Prediction', 'Training Points', 'Test Points'])
plt.show()
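As a quick sanity check of the design matrix built by create_X above (a small illustrative example, not part of the assignment), each row of X is the basis expansion [1, x, x^2, ..., x^deg] of one sample:

```python
import numpy as np

def create_X(x, deg):
    X = np.ones((len(x), deg + 1))
    for i in range(1, deg + 1):
        X[:, i] = x ** i
    return X

# Each row is [1, x, x^2] for one sample
print(create_X(np.array([2.0, 3.0]), 2))
# → [[1. 2. 4.]
#    [1. 3. 9.]]
```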
Question 1 [20 pts]
Your first task is to split the data into pre-validation training and validation sets. You should use the last 30 samples for validation and the rest for the pre-validation training set. Keep all measurements in the same order as the original training set. Make sure the variables specified below are used for this purpose.
x_preval, y_preval = [], []
x_val, y_val = [], []
print(len(x_val), y_val)
# YOUR CODE HERE
raise NotImplementedError()

"""Check that the dimensions are correct and the correct data is included in each variable"""
assert len(x_val) == 30
assert len(y_val) == 30
assert len(x_preval) == len(x_train) - 30
assert len(y_preval) == len(y_train) - 30
assert x_val[-1] == x_train[-1]
assert y_val[-1] == y_train[-1]
assert x_preval[0] == x_train[0]
assert y_preval[0] == y_train[0]
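Since the validation set must be the last 30 samples kept in their original order, plain NumPy slicing is one way to satisfy the checks. A minimal sketch, assuming the `x_train`/`y_train` arrays generated by the setup script above:

```python
import numpy as np

# Reproducing the setup script's training data (seed 42, N = 80)
rng = np.random.default_rng(seed=42)
x_train = rng.random(80)
y_train = 5 * (x_train - 0) * (x_train - 0.5) * (x_train - 1) + 0.2 * rng.normal(size=80)

# Slicing preserves the original order: first N-30 samples for
# pre-validation training, last 30 samples for validation.
x_preval, y_preval = x_train[:-30], y_train[:-30]
x_val, y_val = x_train[-30:], y_train[-30:]
```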
Question 2 [40 pts]
Next, compute training and validation errors for each of the listed degrees. The training error should show a decreasing pattern. The validation error should decrease and then increase.
# List of degrees considered for the analysis
degList = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Initializing range of degree values to be tested and errors
errTrain = np.zeros(len(degList))
errVal = np.zeros(len(degList))

# Computing training and validation RMSE errors for each degree value
# YOUR CODE HERE
raise NotImplementedError()

# Plotting results
plt.plot(degList, errTrain, 'b.-', degList, errVal, 'r.-')
plt.xlabel('degree')
plt.ylabel('RMSE')
plt.legend(['Pre-Validation Training Error', 'Validation Error'])
plt.show()

"""Check that the correct trends and the correct values are present"""
assert -np.max(np.diff(errTrain)) > 0       # Checking for monotonicity of the training error
assert -np.min(np.diff(errVal)) > 0         # Checking for some decreasing trend in the validation error
assert np.max(np.diff(errVal)) > 0          # Checking for some increasing trend in the validation error
assert np.abs(min(errTrain) - 0.14) < 1e-2  # Checking the minimum of the training error
assert np.abs(min(errVal) - 0.22) < 1e-2    # Checking the minimum of the validation error
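One way to fill in the loop: fit on the pre-validation data only, then evaluate the fitted model on both splits at each degree. A self-contained sketch, reusing the helper functions and data generation from the setup script and the slice split from Question 1:

```python
import numpy as np

# Reproducing the setup script's training data (seed 42, N = 80)
rng = np.random.default_rng(seed=42)
f = lambda x: 5 * (x - 0) * (x - 0.5) * (x - 1)
x_train = rng.random(80)
y_train = f(x_train) + 0.2 * rng.normal(size=80)

def create_X(x, deg):
    X = np.ones((len(x), deg + 1))
    for i in range(1, deg + 1):
        X[:, i] = x ** i
    return X

def fit(x, y, deg):
    return np.linalg.lstsq(create_X(x, deg), y, rcond=None)[0]

def predict(x, beta):
    return np.dot(create_X(x, len(beta) - 1), beta)

def rmse(y, yPred):
    return np.sqrt(np.mean((y - yPred) ** 2))

# Split as in Question 1
x_preval, y_preval = x_train[:-30], y_train[:-30]
x_val, y_val = x_train[-30:], y_train[-30:]

degList = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
errTrain = np.zeros(len(degList))
errVal = np.zeros(len(degList))
for i, deg in enumerate(degList):
    beta = fit(x_preval, y_preval, deg)                 # fit on pre-validation data only
    errTrain[i] = rmse(y_preval, predict(x_preval, beta))
    errVal[i] = rmse(y_val, predict(x_val, beta))
```

Fitting on the pre-validation split alone is what makes the validation error an honest estimate for choosing the degree: the validation samples never influence the fitted coefficients.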
Performance of Optimal Model
We demonstrate the performance of the model by comparing the test error when training with only the pre-validation training data versus training with the full training set (pre-validation plus validation data), after the hyper-parameter has already been selected.
Question 3 [20 pts]
Complete the code to compute the desired test errors for the models trained using the pre-validation training set and the full training set.
# Selecting optimal degree
degOpt = degList[np.argmin(errVal)]
print('Optimal Degree = {:1}'.format(degOpt))

# Initializing variable for the error using only the pre-validation training set
errTest_PreVal = []
# Initializing variable for the error using the full training set
errTest_FullTrain = []
# YOUR CODE HERE
raise NotImplementedError()

# Printing results
print('Test Error [Preval Dataset Only] = {:2.3}'.format(errTest_PreVal))
print('Test Error [Full Training Dataset] = {:2.3}'.format(errTest_FullTrain))

# Plotting fitted model
x = np.linspace(0, 1, 100)
y = predict(x, beta)  # Use the beta from the full training set for better visualization
plt.plot(x, y, 'b-', x_train, y_train, 'ks', x_test, y_test, 'rs')
plt.legend(['Prediction', 'Training Points', 'Test Points'])
plt.show()

"""Check that the correct values are present"""
assert errTest_PreVal > errTest_FullTrain  # The full-training-set error should be lower
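A sketch of one possible completion, assuming the split from Question 1 and a degree already chosen by the validation search (degree 3 is used here as a stand-in for degOpt, since the true f is cubic). `numpy.polynomial.polynomial.polyfit` solves the same polynomial least-squares problem as the create_X + lstsq pipeline above:

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Reproducing the setup script's data (seed 42)
rng = np.random.default_rng(seed=42)
f = lambda x: 5 * (x - 0) * (x - 0.5) * (x - 1)
x_train = rng.random(80)
y_train = f(x_train) + 0.2 * rng.normal(size=80)
x_test = rng.random(20)
y_test = f(x_test) + 0.2 * rng.normal(size=20)

def rmse(y, yPred):
    return np.sqrt(np.mean((y - yPred) ** 2))

x_preval, y_preval = x_train[:-30], y_train[:-30]
degOpt = 3  # assumption: the degree selected by the validation search

# Model trained on the pre-validation data only
beta_preval = P.polyfit(x_preval, y_preval, degOpt)
errTest_PreVal = rmse(y_test, P.polyval(x_test, beta_preval))

# Model trained on the full training set (pre-validation + validation)
beta_full = P.polyfit(x_train, y_train, degOpt)
errTest_FullTrain = rmse(y_test, P.polyval(x_test, beta_full))
```

Both models share the same hyper-parameter; only the amount of training data differs, which is exactly the comparison the assignment's final assert checks.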
Question 4 [15 pts]
Is the performance using the full training set better than what was observed when training the model only with the pre-validation training set? Why is that the case? Please enter your response below.
YOUR ANSWER HERE
Question 5 [5 pts]
Do you always expect this to be the case? Please enter your response below.
YOUR ANSWER HERE