Question
In projection-based clustering, we note that, if the data is well separated into clusters with means $\mu_1, \ldots, \mu_K$, then the top $K$ eigenvectors of the data covariance matrix, say $(v_1, \ldots, v_K)$, tend to align with $\mathrm{Span}(\mu_1, \ldots, \mu_K)$. It follows that PCA will approximately preserve the distance between cluster means.

This intuition (and choice of projection) implicitly assumes that the Euclidean metric is the right way of measuring distance for our particular data. Where is that assumption coming in? What would you do otherwise, i.e., if it turns out that a different notion of distance $\mathrm{dist}(X_i, X_j)$ were more appropriate, would you prefer some other transformation over PCA?

Following the notation from the previous problem, let $V = [v_1, \ldots, v_K] \in \mathbb{R}^{d \times K}$ be the matrix of top principal directions, with $\|v_i\| = 1$. Consider two possible transformations of the data: (i) $X_i \mapsto V^\top X_i$, and (ii) $X_i \mapsto V V^\top X_i$. Would the clusters learnt from the data transformed by (i) differ from those learnt from the data transformed by (ii)? If we were to run Lloyd's on the transformed data (i) and separately on the transformed data (ii), would one be quicker to execute than the other, i.e., is there a difference in runtime?
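The two transformations in the question can be compared numerically. The following is a minimal sketch, not part of the original solution: the synthetic data, the dimensions `d`, `K`, `n_per`, the helper names `pairwise_sq_dists` and `lloyd`, and the one-point-per-cluster initialization are all assumptions made for illustration. It relies on the fact that the columns of $V$ are orthonormal, so $V^\top V = I$ and $\|V V^\top x - V V^\top y\| = \|V^\top x - V^\top y\|$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, n_per = 10, 3, 50  # hypothetical sizes, chosen only for illustration

# Hypothetical well-separated cluster means in R^d, with unit-variance noise.
means = rng.normal(scale=10.0, size=(K, d))
X = np.vstack([m + rng.normal(size=(n_per, d)) for m in means])  # shape (n, d)

# Top-K principal directions: eigenvectors of the sample covariance.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / len(X)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
V = eigvecs[:, ::-1][:, :K]             # shape (d, K), orthonormal columns

Z1 = X @ V        # (i)  rows are V^T x_i,   shape (n, K)
Z2 = X @ V @ V.T  # (ii) rows are V V^T x_i, shape (n, d)


def pairwise_sq_dists(A):
    """Squared Euclidean distances between all pairs of rows of A."""
    sq = np.sum(A ** 2, axis=1)
    return sq[:, None] + sq[None, :] - 2 * A @ A.T


same_distances = np.allclose(pairwise_sq_dists(Z1), pairwise_sq_dists(Z2))


def lloyd(Z, init_idx, n_iter=20):
    """Plain Lloyd iterations starting from the rows of Z at init_idx."""
    centers = Z[init_idx].copy()
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(len(centers)):
            if np.any(labels == k):  # leave a center unchanged if its cluster is empty
                centers[k] = Z[labels == k].mean(axis=0)
    return labels


# Same initial points for both runs (one from each generated cluster).
init_idx = [0, n_per, 2 * n_per]
labels1 = lloyd(Z1, init_idx)
labels2 = lloyd(Z2, init_idx)
same_clusters = np.array_equal(labels1, labels2)
```

In this sketch the pairwise distances agree, so Lloyd's started from the same points makes the same assignments on both representations. The difference is the cost: each assignment step computes distances in $K$ dimensions for (i) and in $d$ dimensions for (ii), roughly $O(nK \cdot K)$ versus $O(nK \cdot d)$ per iteration, so (i) is cheaper whenever $K < d$.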
Step by Step Solution
There are 3 Steps involved in it
Step: 1
You're right that PCA makes the implicit assumption that the Euclidean distance is the appropriate me...
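One way to see what changing the metric involves is a sketch under a specific assumption that the source does not state: suppose the preferred distance is of Mahalanobis type, $\mathrm{dist}(x, y)^2 = (x - y)^\top M (x - y)$, for a known symmetric positive definite matrix $M$. With the Cholesky factorization $M = L L^\top$, the map $x \mapsto L^\top x$ turns that distance into the ordinary Euclidean one, after which PCA could be applied to the transformed data. The matrix `M` and the helper `mahalanobis_sq` below are hypothetical and only illustrate the identity.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4  # hypothetical dimension

# Hypothetical symmetric positive definite matrix defining the distance.
A = rng.normal(size=(d, d))
M = A @ A.T + d * np.eye(d)

# Cholesky factor: M = L @ L.T with L lower triangular.
L = np.linalg.cholesky(M)

X = rng.normal(size=(6, d))  # a few sample points, one per row
T = X @ L                    # rows are L^T x_i


def mahalanobis_sq(x, y, M):
    """Squared distance (x - y)^T M (x - y)."""
    diff = x - y
    return diff @ M @ diff


# Squared distance under M in the original space ...
lhs = mahalanobis_sq(X[0], X[1], M)
# ... equals the squared Euclidean distance after the transformation.
rhs = np.sum((T[0] - T[1]) ** 2)
distances_match = np.isclose(lhs, rhs)
```

For distances that are not induced by an inner product, no such linear change of coordinates exists in general, and this sketch does not cover that case.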
Step: 2
Step: 3