15.6 Random projection, PCA, and nearest neighbors. (a) Download the MNIST test set of handwritten digits at:
Question:
15.6 Random projection, PCA, and nearest neighbors.
(a) Download the MNIST test set of handwritten digits at:
http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz.
Create a data matrix X 2 RNm from the rst m = 2;000 instances of this dataset (the dimension of each instance should be N = 784).
(b) Find the ten nearest neighbors for each point in X, that is, compute Ni;10 for 1 i m, where Ni;t denotes the set of the t nearest neighbors for the ith datapoint and nearest neighbors are dened with respect to the L2 norm.
Also compute Ni;50 for all i.
(c) Generate ~X = AX, where A 2 RkN, k = 100 and entries of A are sampled independently from the standard normal distribution. Find the ten nearest neighbors for each point in ~X, that is, compute ~ Ni;10 for 1 i m.
(d) Report the quality of approximation by computing score10 = 1 m Pm i=1 jNi;10\ ~ Ni;10j. Similarly, compute score50 = 1 m Pm i=1 jNi;50 \ ~Ni;10j.
(e) Generate two plots that show score10 and score50 as functions of k (i.e., perform steps
(c) and
(d) for k = f1; 10; 50; 100; 250; 500g). Provide a oneor two-sentence explanation of these plots.
(f) Generate similar plots as in
(e) using PCA (with various values of k) to generate ~X and subsequently compute nearest neighbors. Are the nearest neighbor approximations generated via PCA better or worse than those generated via random projections? Explain why.
Step by Step Answer:
Foundations Of Machine Learning
ISBN: 9780262351362
2nd Edition
Authors: Mehryar Mohri, Afshin Rostamizadeh