Question
Handwritten digit recognition using a Gaussian generative model.

In class, we mentioned the MNIST data set of handwritten digits. You can obtain it from: http://yann.lecun.com/exdb/mnist/index.html

In this problem, you will build a classifier for this data by modeling each class as a multivariate (784-dimensional) Gaussian.

(a) Upon downloading the data, you should have two training files (one with images, one with labels) and two test files. Unzip them. Load the data into MATLAB (you can use any other platform that you are familiar with, including Python).

(b) Split the training set into two pieces: a training set of size 50000 and a separate validation set of size 10000. Also load in the test data.

(c) Now fit a Gaussian generative model to the training data of 50000 points:
- Determine the class probabilities: what fraction of the training points are digit 0, for instance? Call these values $\pi_0, \ldots, \pi_9$.
- Fit a Gaussian to each digit by finding the mean and the covariance of the corresponding data points. Let the Gaussian for the $j$th digit be $P_j = N(\mu_j, \Sigma_j)$.
Using these two pieces of information, you can classify new images $x$ using Bayes' rule: simply pick the digit $j$ for which $\pi_j P_j(x)$ is largest.

(d) One last step is needed: it is important to smooth the covariance matrices, and the usual way to do this is to add in $cI$, where $c$ is some constant and $I$ is the identity matrix. What value of $c$ is right? Use the validation set to help you choose. That is, choose the value of $c$ for which the resulting classifier makes the fewest mistakes on the validation set. What value of $c$ did you get?
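Parts (c) and (d) can be sketched in Python with NumPy as follows. This is a minimal illustration, not a reference solution: the function names (`fit_gaussian_model`, `log_scores`, `predict`, `choose_c`) are hypothetical, the images are assumed to be already loaded as an `(n, 784)` float array with integer labels, and the candidate values of `c` are placeholders, since a sensible range depends on how the pixels are scaled. Scores are computed in log space, because the density of a 784-dimensional Gaussian underflows if evaluated directly.

```python
import numpy as np

def fit_gaussian_model(X, y, n_classes=10):
    """Estimate class priors pi_j, means mu_j, and unsmoothed covariances Sigma_j."""
    n, d = X.shape
    priors = np.zeros(n_classes)
    means = np.zeros((n_classes, d))
    covs = np.zeros((n_classes, d, d))
    for j in range(n_classes):
        Xj = X[y == j]  # assumes every class appears at least once
        priors[j] = len(Xj) / n
        means[j] = Xj.mean(axis=0)
        covs[j] = np.cov(Xj, rowvar=False, bias=True)
    return priors, means, covs

def log_scores(X, priors, means, covs, c):
    """Return an (n, n_classes) array of log pi_j + log N(x; mu_j, Sigma_j + c I)."""
    n, d = X.shape
    scores = np.empty((n, len(priors)))
    for j in range(len(priors)):
        sigma = covs[j] + c * np.eye(d)  # smoothed covariance
        _, logdet = np.linalg.slogdet(sigma)
        diff = X - means[j]
        # Mahalanobis term diff^T sigma^{-1} diff, one value per row of X
        maha = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
        scores[:, j] = np.log(priors[j]) - 0.5 * (logdet + maha + d * np.log(2 * np.pi))
    return scores

def predict(X, priors, means, covs, c):
    """Bayes' rule: pick the class with the largest log pi_j + log P_j(x)."""
    return np.argmax(log_scores(X, priors, means, covs, c), axis=1)

def choose_c(X_val, y_val, priors, means, covs, candidates):
    """Return the candidate c with the lowest validation error, plus all error rates."""
    errors = [float(np.mean(predict(X_val, priors, means, covs, c) != y_val))
              for c in candidates]
    return candidates[int(np.argmin(errors))], errors

if __name__ == "__main__":
    # Synthetic stand-in for MNIST: three well-separated 2-D classes.
    rng = np.random.default_rng(0)
    centers = np.array([[0.0, 0.0], [6.0, 6.0], [-6.0, 6.0]])
    y = rng.integers(0, 3, size=600)
    X = centers[y] + rng.normal(size=(600, 2))
    priors, means, covs = fit_gaussian_model(X[:400], y[:400], n_classes=3)
    best_c, errors = choose_c(X[400:], y[400:], priors, means, covs, [0.01, 0.1, 1.0])
    print("chosen c:", best_c, "validation errors:", errors)
```

For the actual assignment, the same functions would be called with the 50000-point training split, the 10000-point validation split, and a wider grid of candidate values for `c`; the test set would only be scored once, after `c` has been chosen.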