Question

Posted on Sep 21, 2024

I need to implement an algorithm to solve digit classification in an unsupervised manner, using K-means clustering. I have already written a large part of the code, but I am having a lot of trouble and errors getting it to work properly. Please see the attached problem statement for more details. I can also provide my .m file, as well as the data files, so you can help me with it. It is fine if you want to do it in another way, however. Please help! Thanks!

3. (Computer) In this problem, we try to solve digit classification in an unsupervised manner, using K-means clustering. We assume that we only have the training images, without labels, and that there are 10 digit classes (as before). Hence, there are 10 clusters to learn. Each cluster has the same prior probability and a Gaussian distribution with identity covariance. Implement a K-means algorithm to learn the means of these clusters. A good stopping rule is to stop when the assignments of points to clusters do not change much in an iteration, say by 0.2% (10 changes for a set of 5000 images).

1. First consider 10 random initializations suitably scaled to match the image intensity range. Run a K-means algorithm using these random initializations. Is there any problem that you encounter while running the algorithm? If yes, what is it and how can you tackle it (you need not implement the part of how to tackle it)? If not, submit the final class means as 28×28 images.

2. Now, instead of choosing random initializations for the class means, choose 10 random images from the training data itself and take them to be the initial class means. Run the K-means algorithm and display the final class means as grayscale images. Also submit the image numbers of the random images chosen for initialization.

3. Manually assign labels (0, 1, 2, ..., 9) to the class means obtained in part 2. It is possible that some of the digit labels do not have a representation in the means obtained above; ignore those labels. Also, some labels will have more than one representation; choose the one you feel is the best. Now, using these means, perform a classification of the test data using the Gaussian classifier of HW3. As before, compute and display the error rates per class and the total error rate. (For the digits that do not have a representation, consider the error rate to be 50%.)

4. Repeat part 2 for another set of random images. Are these means different from the ones you obtained above? What can you say about the sensitivity to the initialization from the above experiment?
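
The K-means loop described in the problem, with the 0.2% stopping rule and initialization from randomly chosen training images (part 2), might be sketched as follows. This is a minimal sketch in Python with NumPy rather than the poster's MATLAB, since the question allows another approach. The function names `kmeans` and `init_from_data`, the `tol_frac` and `max_iter` parameters, and the choice to leave an empty cluster's mean unchanged are assumptions made for illustration, not part of the assignment. Because every cluster has the same prior and identity covariance, assigning a point to its most likely cluster is the same as assigning it to the nearest mean in Euclidean distance.

```python
import numpy as np


def init_from_data(X, k, rng):
    """Pick k distinct rows of X as initial means (part 2 style).

    Returns the initial means and the zero-based row indices that were chosen.
    """
    idx = rng.choice(X.shape[0], size=k, replace=False)
    return X[idx].astype(float), idx


def kmeans(X, init_means, tol_frac=0.002, max_iter=100):
    """K-means with a 'few assignments changed' stopping rule.

    X          : (n, d) array, one flattened image per row
    init_means : (k, d) array of initial cluster means
    tol_frac   : stop once at most this fraction of points changed cluster
                 (0.002 corresponds to the 0.2% suggested in the problem)
    max_iter   : safety cap on iterations (an assumption, not in the problem)

    Returns (means, labels, empty), where `empty` holds the indices of
    clusters that received no points in the last iteration.
    """
    X = np.asarray(X, dtype=float)
    means = np.array(init_means, dtype=float)  # copy, so the input is not modified
    n, k = X.shape[0], means.shape[0]
    labels = np.full(n, -1)                    # -1 means "not assigned yet"
    counts = np.zeros(k, dtype=int)

    for _ in range(max_iter):
        # Squared Euclidean distances via ||x||^2 - 2 x.m + ||m||^2.
        d2 = ((X ** 2).sum(axis=1)[:, None]
              - 2.0 * X @ means.T
              + (means ** 2).sum(axis=1)[None, :])
        new_labels = d2.argmin(axis=1)

        changed = np.count_nonzero(new_labels != labels)
        labels = new_labels

        # Update each mean from its assigned points.
        # Assumption: a cluster with no points keeps its previous mean.
        counts = np.bincount(labels, minlength=k)
        for j in range(k):
            if counts[j] > 0:
                means[j] = X[labels == j].mean(axis=0)

        if changed <= tol_frac * n:
            break

    empty = np.flatnonzero(counts == 0)
    return means, labels, empty
```

With the training images flattened into a 5000 x 784 array `X`, a call such as `init, idx = init_from_data(X, 10, np.random.default_rng())` followed by `means, labels, empty = kmeans(X, init)` would give the learned means, and each row of `means` can be reshaped to 28 x 28 for display. Note that `idx` is zero-based, so add 1 if image numbers are counted from 1 as in MATLAB. A non-empty `empty` result is one way the difficulty hinted at in part 1 could show up: means drawn at random may lie far from every image and so never receive any points.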