
Question

CSC 411 / CSC 2515 Introduction to Machine Learning
ASSIGNMENT # 1
Due at NOON on: Oct. 19 (CSC 411) / Oct. 20 (CSC 2515)

1 Logistic Regression (40 points)

1.1 (10 points) Bayes' Rule

Suppose you have a D-dimensional data vector $x = (x_1, \ldots, x_D)^T$ and an associated class variable $y \in \{0, 1\}$ which is Bernoulli with parameter $\alpha$ (i.e. $p(y = 1) = \alpha$ and $p(y = 0) = 1 - \alpha$). Assume that the dimensions of $x$ are conditionally independent given $y$, and that the conditional likelihood of each $x_i$ is Gaussian with $\mu_{i0}$ and $\mu_{i1}$ as the means of the two classes and $\sigma_i$ as their shared standard deviation. Use Bayes' rule to show that $p(y = 1 \mid x)$ takes the form of a logistic function:

$$p(y = 1 \mid x) = \sigma(w^T x + b) = \frac{1}{1 + \exp\left(-\sum_{i=1}^{D} w_i x_i - b\right)}$$

Derive expressions for the weights $w = (w_1, \ldots, w_D)^T$ and the bias $b$ in terms of the parameters of the class likelihoods and priors (i.e., $\mu_{i0}$, $\mu_{i1}$, $\sigma_i$ and $\alpha$).

1.2 (15 points) Maximum Likelihood Estimation

Now suppose you are given a training set $\mathcal{D} = \{(x^{(1)}, y^{(1)}), \ldots, (x^{(N)}, y^{(N)})\}$. Consider a binary logistic regression classifier of the same form as before:

$$p(y = 1 \mid x^{(n)}, w, b) = \sigma(w^T x^{(n)} + b) = \frac{1}{1 + \exp\left(-\sum_{i=1}^{D} w_i x_i^{(n)} - b\right)}$$

Derive an expression for $E(w, b)$, the negative log-likelihood of $y^{(1)}, \ldots, y^{(N)}$ given $x^{(1)}, \ldots, x^{(N)}$ and the model parameters, under the i.i.d. assumption. Then derive expressions for the derivatives of $E$ with respect to each of the model parameters.

1.3 (15 points) L2 Regularization

Now assume that a Gaussian prior is placed on each element of $w$, and on $b$, such that $p(w_i) = \mathcal{N}(w_i \mid 0, 1/\lambda)$ and $p(b) = \mathcal{N}(b \mid 0, 1/\lambda)$. Derive an expression that is proportional to $p(w, b \mid \mathcal{D})$, the posterior distribution of $w$ and $b$, based on this prior and the likelihood defined above. The expression you derive must contain all terms that depend on $w$ and $b$. Define $L(w, b)$ to be the negative logarithm of the expression you derive. Show that $L(w, b)$ takes the following form:

$$L(w, b) = E(w, b) + \frac{\lambda}{2} \sum_{i=1}^{D} w_i^2 + \frac{\lambda}{2} b^2 + C(\lambda)$$

where $C(\lambda)$ is a term that depends on $\lambda$ but not on either $w$ or $b$. What are the derivatives of $L$ with respect to each of the model parameters?

2 Digit Classification (60 points)

In this section, you will compare the performance and characteristics of different classifiers, namely k-nearest neighbours, logistic regression, and naive Bayes. You will extend the provided code and experiment with these extensions. Note that you should understand the code first instead of using it as a black box. Both Matlab and Python versions of the code have been provided. (If you choose to work with Python, you should use Python 2.7 with both the Numpy and Matplotlib packages installed.) You are free to work with whichever you wish.

The data you will be working with are hand-written digits, 4s and 9s, represented as 28x28 pixel arrays. There are two training sets: mnist_train, which contains 80 examples of each class, and mnist_train_small, which contains 5 examples of each class. There is also a validation set mnist_valid that you should use for model selection, and a test set mnist_test. Code for visualizing the datasets has been included in plot_digits.

2.1 (10 points) k-Nearest Neighbours

Use the supplied kNN implementation to predict labels for mnist_valid, using mnist_train as the training set. Write a script that runs kNN for different values of $k \in \{1, 3, 5, 7, 9\}$ and plots the classification rate on the validation set (number of correctly predicted cases, divided by total number of data points) as a function of $k$. Comment on the performance of the classifier and argue which value of $k$ you would choose. What is the classification rate for $k^*$, your chosen value of $k$? Also compute the rate for $k^* + 2$ and $k^* - 2$. Does the test performance for these values of $k$ correspond to the validation performance? Why or why not? (In general you shouldn't peek at the test set multiple times, but for the purposes of this question it can be an illustrative exercise.)
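For Part 1.1 above, the following is a brief, non-authoritative sketch of how the derivation can proceed under the stated Gaussian class-conditional assumptions (here $\alpha$ denotes the class prior $p(y=1)$); the algebra expected in an official solution may be organised differently.

```latex
% Sketch only: apply Bayes' rule, then divide numerator and denominator by the numerator.
\begin{align*}
p(y=1 \mid x)
  &= \frac{p(x \mid y=1)\,\alpha}{p(x \mid y=1)\,\alpha + p(x \mid y=0)\,(1-\alpha)}
   = \frac{1}{1 + \exp\!\Big(\log\tfrac{p(x \mid y=0)\,(1-\alpha)}{p(x \mid y=1)\,\alpha}\Big)} .
\end{align*}
% With p(x_i \mid y=c) = N(x_i \mid \mu_{ic}, \sigma_i^2) and conditional independence of the
% dimensions, the log-ratio is linear in x, which gives weights and bias of the form
\begin{align*}
w_i = \frac{\mu_{i1} - \mu_{i0}}{\sigma_i^2}, \qquad
b   = \sum_{i=1}^{D} \frac{\mu_{i0}^2 - \mu_{i1}^2}{2\sigma_i^2} + \log\frac{\alpha}{1-\alpha}.
\end{align*}
```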
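For Parts 1.2 and 2.2, a minimal numpy sketch of the negative log-likelihood (cross entropy) and its gradients is shown below. This is not the provided logistic function from the assignment starter code; the function name, argument layout, and the small eps guard are assumptions made here for illustration.

```python
import numpy as np

def sigmoid(z):
    # Logistic function; clipping keeps exp() from overflowing for large |z|.
    z = np.clip(z, -30, 30)
    return 1.0 / (1.0 + np.exp(-z))

def logistic_nll_and_grads(w, b, X, y):
    """Negative log-likelihood E(w, b) and its gradients for binary logistic regression.

    X : (N, D) data matrix, y : (N,) labels in {0, 1}.
    Returns (E, dE_dw, dE_db). Illustrative sketch, not the course starter code.
    """
    p = sigmoid(X.dot(w) + b)        # p(y=1 | x^(n)) for each example
    eps = 1e-12                      # guard against log(0)
    E = -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    dE_dw = X.T.dot(p - y)           # dE/dw_i = sum_n (p_n - y_n) x_i^(n)
    dE_db = np.sum(p - y)            # dE/db   = sum_n (p_n - y_n)
    return E, dE_dw, dE_db
```

The key identities are $\partial E / \partial w_i = \sum_n (\sigma(w^T x^{(n)} + b) - y^{(n)})\, x_i^{(n)}$ and $\partial E / \partial b = \sum_n (\sigma(w^T x^{(n)} + b) - y^{(n)})$, which is what the code above computes.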
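For Part 1.3, since the zero-mean Gaussian priors contribute only quadratic terms to $L(w, b)$, the gradients (as a hedged sketch, not the official solution) are simply the unregularized ones plus the penalty terms:

```latex
% Sketch: gradients of L(w,b) = E(w,b) + (lambda/2)||w||^2 + (lambda/2) b^2 + C(lambda)
\begin{align*}
\frac{\partial L}{\partial w_i} = \frac{\partial E}{\partial w_i} + \lambda w_i, \qquad
\frac{\partial L}{\partial b}   = \frac{\partial E}{\partial b} + \lambda b .
\end{align*}
```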
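For Part 2.1, the evaluation script might look roughly like the sketch below. The imports load_train, load_valid, and run_knn are placeholders for whatever the provided starter code actually exposes; treat them as assumptions, not the real API.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical helpers -- adjust these imports to match the provided starter code.
from utils import load_train, load_valid   # placeholder data loaders
from run_knn import run_knn                # placeholder for the supplied kNN implementation

train_inputs, train_targets = load_train()
valid_inputs, valid_targets = load_valid()

ks = [1, 3, 5, 7, 9]
rates = []
for k in ks:
    # Predict validation labels with the supplied kNN implementation.
    predictions = run_knn(k, train_inputs, train_targets, valid_inputs)
    # Classification rate = correctly predicted cases / total number of cases.
    rate = np.mean(predictions.ravel() == valid_targets.ravel())
    rates.append(rate)
    print("k = %d: classification rate = %.4f" % (k, rate))

plt.plot(ks, rates, "o-")
plt.xlabel("k")
plt.ylabel("validation classification rate")
plt.show()
```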
2.2 (10 points) Logistic regression

Look through the code in logistic_regression_template and logistic. Complete the implementation of logistic regression by providing the missing part of logistic. Use checkgrad to make sure that your gradients are correct.

Run the code on both mnist_train and mnist_train_small. You will need to experiment with the hyperparameters for the learning rate, the number of iterations (if you have a smaller learning rate, your model will take longer to converge), and the way in which you initialize the weights. If you get NaN/Inf errors, you may try to reduce your learning rate or initialize with smaller weights. Report which hyperparameter settings you found worked the best, and the final cross entropy and classification error on the training, validation and test sets. Note that you should only compute the test error once you have selected your best hyperparameter settings using the validation set.

Next look at how the cross entropy changes as training progresses. Submit 2 plots, one for each of mnist_train and mnist_train_small. In each plot show two curves: one for the training set and one for the validation set. Run your code several times and observe if the results change. If they do, how would you choose the best parameter settings?

2.3 (10 points) Penalized logistic regression

Implement the penalized logistic regression model you derived in 1.3 by modifying logistic to include a regularizer. Call the new function logistic_pen. You should only penalize the weights and not the bias term, as the bias only controls the height of the function, not its complexity. Note that you can omit the $C(\lambda)$ term in your error computation, since its derivative is 0 with respect to the weights and bias. Use checkgrad to verify the gradients of your new logistic_pen function.

Repeat part 2.2, but now with different values of the penalty parameter $\lambda$. Try $\lambda \in \{0.001, 0.01, 0.1, 1.0\}$. At this stage you should not be evaluating on the test set, as you will do so once you have chosen your best $\lambda$. To do the comparison systematically, you should write a script that includes a loop to evaluate different values of $\lambda$ automatically. You should also re-run logistic regression at least 10 times for each value of $\lambda$. So you will need two nested loops: the outer loop is over values of $\lambda$; the inner loop is over multiple re-runs. Average the evaluation metrics (cross entropy and classification error) over the different re-runs. In the end, plot the average cross entropy and classification error against $\lambda$. So for each of mnist_train and mnist_train_small you will have 2 plots: one plot for cross entropy and another plot for classification error. Each plot will have two curves: one for training and one for validation.

How do the cross entropy and classification error change when you increase $\lambda$? Do they go up, down, first up and then down, or down and then up? Explain why you think they behave this way. Which is the best value of $\lambda$, based on your experiments? Report the test error for the best value of $\lambda$. Compare the results with and without penalty. Which one performed better for which data set? Why do you think this is the case?
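As a hedged illustration of what checkgrad verifies in Parts 2.2 and 2.3 (this is not the provided checkgrad routine itself), a centred finite-difference comparison can be written as follows; the function signature here is an assumption.

```python
import numpy as np

def finite_difference_check(f, params, eps=1e-5):
    """Compare analytic gradients to centred finite differences.

    f(params) must return (value, gradient) for a 1-D parameter vector.
    Illustrative only; use the assignment's provided checkgrad utility for the submission.
    """
    _, grad = f(params)
    numeric = np.zeros_like(params)
    for j in range(params.size):
        step = np.zeros_like(params)
        step[j] = eps
        numeric[j] = (f(params + step)[0] - f(params - step)[0]) / (2 * eps)
    return np.max(np.abs(numeric - grad))   # should be tiny if the gradients are correct
```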
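For the systematic comparison in Part 2.3, the two nested loops described above might be organised as in this sketch. train_logistic_pen and its return values are hypothetical placeholders for however your logistic_pen-based training script is structured; only the loop-and-averaging pattern is the point.

```python
import numpy as np

lambdas = [0.001, 0.01, 0.1, 1.0]
num_reruns = 10
avg_metrics = {}

for lam in lambdas:                      # outer loop: penalty parameter lambda
    runs = []
    for _ in range(num_reruns):          # inner loop: re-runs (e.g. different random inits)
        # Placeholder training call: replace with your own routine that returns cross
        # entropy and classification error on the training and validation sets.
        ce_train, err_train, ce_valid, err_valid = train_logistic_pen(lam)
        runs.append((ce_train, err_train, ce_valid, err_valid))
    avg_metrics[lam] = np.mean(runs, axis=0)   # average each metric over the re-runs
    print("lambda = %g: avg metrics = %s" % (lam, avg_metrics[lam]))
```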
2.4 (15 points) Naive Bayes

In this question you will experiment with a binary naive Bayes classifier. In a naive Bayes classifier, the conditional distribution for example $x \in \mathbb{R}^d$ to take on class $c$ (out of $K$ different classes) is defined by

$$p(c \mid x) = \frac{p(x \mid c)\, p(c)}{\sum_{k=1}^{K} p(x \mid k)\, p(k)}$$

where $p(x \mid c) = \prod_{i=1}^{d} p(x_i \mid c)$ according to the naive Bayes assumption. In this question, we model $p(x_i \mid c)$ as a Gaussian for each $i$:

$$p(x_i \mid c) = \mathcal{N}(x_i \mid \mu_{ic}, \sigma_{ic}^2) = \frac{1}{\sqrt{2\pi\sigma_{ic}^2}} \exp\left(-\frac{(x_i - \mu_{ic})^2}{2\sigma_{ic}^2}\right)$$

The prior distribution $p(c)$ and the parameters $\mu_c = (\mu_{1c}, \ldots, \mu_{dc})$, $\sigma_c^2 = (\sigma_{1c}^2, \ldots, \sigma_{dc}^2)$ for all $c$ are learned on a training set using maximum likelihood estimation.

Code for training this binary naive Bayes classifier is included. The main components are:

MATLAB
train_nb.m: trains a naive Bayes classifier given some data.
test_nb.m: tests a trained naive Bayes classifier on some test digits.

Python
nb.py: includes code to train and test naive Bayes classifiers.

You are required to fill in run_nb.m in MATLAB or the main method of nb.py in Python to complete the pipeline of training and testing a naive Bayes classifier and visualizing the learned models. The code you need to fill in should be less than 10 lines.

Report the training and test accuracy using the naive Bayes model, and show the visualization of the mean and variance vectors $\mu_c$ and $\sigma_c^2$ for both classes. Briefly comment on the visualization results.

2.5 (15 points) Compare k-NN, Logistic Regression, and Naive Bayes

Compare the results of k-NN on the digit classification task with those you got using logistic regression and naive Bayes. Briefly comment on the differences between these classifiers.

Write up

Hand in answers to all the questions in the parts above. The goal of your write-up is to document the experiments you've done and your main findings, so be sure to explain the results. The answers to your questions should be in pdf form and turned in along with your code. Package your code and a copy of the write-up pdf document into a zip or tar.gz file called A1-*your-student-id*.[zip|tar.gz]. Only include functions and scripts that you modified. Submit this file on MarkUs. Do not turn in a hard copy of the write-up.
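Relating to Part 2.4 above: the maximum likelihood estimates for a Gaussian naive Bayes model are just per-class, per-pixel means and variances plus class frequencies. The sketch below is an independent illustration of those formulas, not the course's train_nb.m / nb.py implementation; the variance floor is an assumption added here for numerical stability.

```python
import numpy as np

def fit_gaussian_nb(X, y, num_classes=2, var_floor=1e-2):
    """Maximum likelihood estimates for a Gaussian naive Bayes model.

    X : (N, d) real-valued features, y : (N,) integer labels in {0, ..., K-1}.
    Returns class priors, per-class means mu_c and variances sigma2_c.
    Illustrative sketch only -- not the assignment's provided implementation.
    """
    N, d = X.shape
    priors = np.zeros(num_classes)
    mu = np.zeros((num_classes, d))
    sigma2 = np.zeros((num_classes, d))
    for c in range(num_classes):
        Xc = X[y == c]
        priors[c] = float(len(Xc)) / N          # MLE of p(c): class frequency
        mu[c] = Xc.mean(axis=0)                 # MLE of mu_ic: per-pixel sample mean
        sigma2[c] = Xc.var(axis=0) + var_floor  # MLE of sigma^2_ic, plus a small floor (assumption)
    return priors, mu, sigma2

def predict_gaussian_nb(X, priors, mu, sigma2):
    # Log-posterior up to a constant: log p(c) + sum_i log N(x_i | mu_ic, sigma^2_ic).
    log_post = np.log(priors)[None, :] \
        - 0.5 * np.sum(np.log(2 * np.pi * sigma2)[None, :, :]
                       + (X[:, None, :] - mu[None, :, :]) ** 2 / sigma2[None, :, :], axis=2)
    return np.argmax(log_post, axis=1)
```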
