Question:
For the \(k\)-means algorithm, a careful choice of the initial cluster centers may not only speed up the algorithm's convergence but also guarantee the quality of the final clustering. The \(\boldsymbol{k}\)-means++ algorithm is a variant of \(k\)-means that chooses the initial centers as follows. First, it selects one center uniformly at random from the objects in the data set. Then, iteratively, it selects a new center from among the objects that have not yet been chosen as centers: each such object \(\boldsymbol{p}\) is selected with probability proportional to \(\operatorname{dist}(\boldsymbol{p})^{2}\), where \(\operatorname{dist}(\boldsymbol{p})\) is the distance from \(\boldsymbol{p}\) to the closest center that has already been chosen. The iteration continues until \(k\) centers are selected.
Explain why this method will not only speed up the convergence of the \(k\)-means algorithm, but also guarantee the quality of the final clustering results.
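The seeding procedure described in the question can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `squared_distance` and `kmeanspp_init`, the use of Euclidean distance, and the uniform fallback when every remaining object coincides with a chosen center are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng=None):
    """Pick k initial centers from `points` using k-means++ style seeding.

    Hypothetical helper for illustration; `points` is a list of numeric tuples.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = rng or random.Random()

    # Step 1: choose the first center uniformly at random.
    first = rng.randrange(len(points))
    chosen = [first]

    # dist2[i] holds dist(p)^2: the squared distance from points[i]
    # to the closest center chosen so far (0 for the centers themselves).
    dist2 = [squared_distance(p, points[first]) for p in points]

    # Step 2: repeatedly sample a new center with probability
    # proportional to dist(p)^2, until k centers are selected.
    while len(chosen) < k:
        if sum(dist2) == 0:
            # Assumption: if all remaining objects coincide with a center,
            # fall back to a uniform choice among the unchosen indices.
            remaining = [i for i in range(len(points)) if i not in chosen]
            nxt = rng.choice(remaining)
        else:
            nxt = rng.choices(range(len(points)), weights=dist2, k=1)[0]
        chosen.append(nxt)
        # Update each object's distance to its closest chosen center.
        dist2 = [
            min(d, squared_distance(p, points[nxt]))
            for d, p in zip(dist2, points)
        ]

    return [points[i] for i in chosen]
```

Because an already chosen center has \(\operatorname{dist}(\boldsymbol{p}) = 0\), its sampling weight is zero and it is not selected again, while objects far from every existing center receive the largest weights. That weighting is what tends to spread the initial centers across the data set.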
Step by Step Answer:
Data Mining Concepts And Techniques
ISBN: 9780128117613
4th Edition
Authors: Jiawei Han, Jian Pei, Hanghang Tong