Question
SoftSVM optimization. In this question you will implement the SGD optimization method for SoftSVM to find a linear predictor with minimal empirical loss. The first dataset is stored in the file bg.txt. The file contains only the feature vectors, one feature vector per line. The first 100 lines correspond to positive instances (label +1) and the next 100 lines are negative instances (label -1). The data can be loaded into a matrix with the load function in Matlab/Octave.

(a) Implement SGD for SoftSVM. You may skip the averaging over weight vectors for the output and instead simply output the last iterate. During the optimization, keep track of both the empirical (binary) loss and the hinge loss of the current weight vector. Include a printout of your code. (2 marks)

(b) Run the optimization method with various values for the regularization parameter λ ∈ {100, 10, 1, .1, .01, .001} on the data (remember to first add an additional feature with value 1 to each datapoint, so that you are actually training a general linear classifier). For each value, plot the binary loss of the iterates and the hinge loss of the iterates (in separate plots). Include three plots of your choice where you observe distinct behavior (you may run the method several times for each parameter setting and choose). (4 marks)

(c) Discuss the plots. Are the curves monotone? Are they approximately monotone? Why or why not? How does the choice of λ affect the optimization? How would you go about finding a linear predictor of minimal binary loss? (4 marks)

(d) Download the "seeds" data set from the UCI repository: https://archive.ics.uci.edu/ml/datasets/seeds. That data is also stored in a text file and can be loaded the same way. It contains 210 instances with three different labels (the last column in the file corresponds to the label).

(e) Train three binary linear predictors. w1 should separate the first class from the other two (i.e., the first 70 instances are labeled +1 and the next 140 instances -1), w2 should separate the second class from the other two, and w3 should separate the third class from the first two classes (i.e., for training w2, label the middle 70 instances positive and the rest negative, and analogously for w3). Report the binary loss that you achieve with w1, w2 and w3 for each of these tasks.

(f) Turn the three linear separators into a multi-class predictor for the three different classes in the seeds dataset using the following rule: y(x) = argmax_{i ∈ {1,2,3}} ⟨w_i, x⟩.
Step by Step Solution
The solution proceeds in three steps.
Step: 1
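For part (a), a minimal sketch of SGD for SoftSVM. The assignment suggests Matlab/Octave; here the same update is written in Python/NumPy for illustration, assuming the standard Pegasos-style formulation in which the iterate is recovered as w_t = θ_t / (λ t) and the last iterate is returned. The function name softsvm_sgd and all parameter defaults are illustrative choices, not part of the assignment.

```python
import numpy as np

def softsvm_sgd(X, y, lam, T=2000, seed=0):
    """SGD for SoftSVM (Pegasos-style updates, last iterate returned).

    X   : (n, d) feature matrix (bias feature assumed already appended)
    y   : (n,) labels in {+1, -1}
    lam : regularization parameter lambda
    Returns the final weight vector and the per-iteration hinge and
    binary (0/1) empirical losses, as required by part (a).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    hinge_hist, binary_hist = [], []
    for t in range(1, T + 1):
        w = theta / (lam * t)            # current iterate w_t = theta_t / (lambda t)
        i = rng.integers(n)              # sample one training example uniformly
        if y[i] * (X[i] @ w) < 1:        # subgradient step on the hinge term
            theta += y[i] * X[i]
        margins = y * (X @ w)            # track both losses of the current w
        hinge_hist.append(float(np.mean(np.maximum(0.0, 1.0 - margins))))
        binary_hist.append(float(np.mean(margins <= 0)))
    return theta / (lam * T), hinge_hist, binary_hist
```

For part (b) one would call this once per λ ∈ {100, 10, 1, .1, .01, .001} after appending the constant-1 feature, and plot hinge_hist and binary_hist separately.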
Step: 2
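For part (e), a sketch of the one-vs-rest label construction, assuming (as the question states) that the 210 seeds instances appear grouped by class, 70 per class, in order. Each labeling y can then be passed to whatever SoftSVM trainer was written for part (a); the helper binary_loss below is an illustrative name for reporting the required losses.

```python
import numpy as np

# One-vs-rest labelings: class c+1 positive, the other two classes negative.
n = 210
labels = []
for c in range(3):
    y = -np.ones(n)
    y[70 * c : 70 * (c + 1)] = 1.0   # rows assumed grouped 70 per class
    labels.append(y)

def binary_loss(w, X, y):
    """Fraction of points with non-positive margin under predictor w."""
    return float(np.mean(y * (X @ w) <= 0))
```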
Step: 3
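For part (f), the combination rule y(x) = argmax_{i ∈ {1,2,3}} ⟨w_i, x⟩ translates directly into a few lines of NumPy; the function name multiclass_predict is an illustrative choice.

```python
import numpy as np

def multiclass_predict(W, X):
    """Part (f): predict argmax_i <w_i, x> for each row x of X.

    W : (3, d) matrix whose rows are w1, w2, w3
    X : (n, d) feature matrix
    Returns 1-based class labels in {1, 2, 3}.
    """
    scores = X @ W.T                   # (n, 3) inner products <w_i, x>
    return np.argmax(scores, axis=1) + 1
```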