
Question



Question 4 (k-means, 40/100). Using the same terminology adopted in our course, we shall refer to the "k-means" algorithm as the algorithm that initializes the centroids randomly, followed by a "refinement phase" where the clusters are improved further. We shall refer to "k-means++" as the algorithm that only selects the centroids, so as to provide some theoretical guarantees. Consider the following points in the 2D Euclidean space: p1 = (0,0), p2 = (0,1), p3 = (0,2), p4 = (2,0), p5 = (3,0), p6 = (4,1), p7 = (5,0), p8 = (7,0), p9 = (8,0), p10 = (8,1). Let k = 3.
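The two phases described above can be sketched in Python on these ten points. The sketch makes three assumptions. First, k-means++ uses the standard D² rule: each point is weighted by its squared distance to the nearest centroid chosen so far. Second, the refinement phase is the usual Lloyd iteration: assign every point to its nearest centroid, move each centroid to its cluster mean, and repeat until no point changes cluster. Third, `seed_by_rank` replaces the random draw with the deterministic "3rd largest probability" rule used in part a) below. The function names are hypothetical, and this is not the course's reference implementation. One detail is worth noting. At step 3, p2 and p6 both have probability 5/30, so q3 depends on how that tie is broken. The stable sort used here yields p6. Breaking the tie the other way gives p2, and the two choices lead the refinement to different final clusters.

```python
from fractions import Fraction

# The ten points of the exercise, in input order p1, ..., p10.
POINTS = {
    "p1": (0, 0), "p2": (0, 1), "p3": (0, 2), "p4": (2, 0), "p5": (3, 0),
    "p6": (4, 1), "p7": (5, 0), "p8": (7, 0), "p9": (8, 0), "p10": (8, 1),
}
NAMES = list(POINTS)


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def dsq_probabilities(chosen):
    """k-means++ (D^2) distribution: each point is weighted by its squared
    distance to the nearest already-chosen centroid (exact fractions)."""
    d2 = {n: min(sq_dist(POINTS[n], POINTS[c]) for c in chosen) for n in NAMES}
    total = sum(d2.values())
    return {n: Fraction(d2[n], total) for n in NAMES}


def seed_by_rank(first, k, rank=3):
    """Hypothetical deterministic stand-in for the random k-means++ draw:
    at every step take the point with the rank-th largest probability.
    sorted() is stable, so tied points keep input order (p1 before p2, ...)."""
    chosen = [first]
    while len(chosen) < k:
        probs = dsq_probabilities(chosen)
        ranked = sorted(NAMES, key=probs.get, reverse=True)
        chosen.append(ranked[rank - 1])
    return chosen


def lloyd(centroid_names):
    """Refinement phase: alternate nearest-centroid assignment and mean
    updates until no point changes cluster. Returns the clusters as lists of
    point names, in the order of the starting centroids."""
    centers = [POINTS[c] for c in centroid_names]
    assignment = None
    while True:
        # Assignment step; min() keeps the lowest centroid index on exact ties.
        new = {n: min(range(len(centers)),
                      key=lambda j: sq_dist(POINTS[n], centers[j]))
               for n in NAMES}
        if new == assignment:
            break
        assignment = new
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid, an assumption of this sketch).
        for j in range(len(centers)):
            members = [POINTS[n] for n in NAMES if assignment[n] == j]
            if members:
                centers[j] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return [[n for n in NAMES if assignment[n] == j] for j in range(len(centers))]


if __name__ == "__main__":
    seeds = seed_by_rank("p4", k=3)
    print(seeds)                               # ['p4', 'p8', 'p6']
    print(dsq_probabilities(seeds[:2])["p5"])  # 1/30
    print(lloyd(seeds))  # [['p1', 'p2', 'p3', 'p4'], ['p8', 'p9', 'p10'], ['p5', 'p6', 'p7']]
```

Exact `Fraction` values are used for the seeding step so that the ordering and the tie between p2 and p6 are not affected by floating-point rounding. Calling `lloyd(["p4", "p8", "p2"])` shows the run for the other tie-break.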
a) (10/100) Run the k-means++ algorithm to select the initial centroids, assuming that 1) p4 is selected as the first centroid, and 2) each of the remaining two centroids is the point with the 3rd largest probability of being selected by k-means++ at that step (breaking ties arbitrarily). In other words, let q1, q2, q3, ..., qn be the input points sorted in non-increasing order of their probability of being selected at a given step of k-means++. Then q3 is the point selected at that step. Which centroids have been selected at the end of this initialization step?
b) (10/100) What is the probability that p5 is selected as a centroid at step 3 of k-means++?
c) (10/100) Starting from the centroids selected in a), run the refinement phase of the k-means algorithm until it terminates. What are the final clusters?
d) (10/100) Consider the variant of the k-means algorithm where 1) a given point p can be moved from a cluster C with centroid c to a cluster C' with centroid c' even if d(p, c) = d(p, c'), and 2) we stop as soon as we obtain the same clustering in two consecutive iterations. Show that this variant of k-means might never terminate by providing an example with at most 5 points in the 1-dimensional Euclidean space.
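Part d) can be illustrated with one possible construction, sketched in Python under stated assumptions. The instance uses the five 1-D points -2, -1, 0, 1, 2 with k = 2. It also assumes a random initialization that happens to place both centroids at 0. That outcome is possible if centroids are drawn at arbitrary random positions, but not if they are drawn from the input points without repetition. With both centroids at 0, every point is equidistant from them, so the variant may legally alternate between the clusterings {-1, 1} | {-2, 0, 2} and {-2, 2} | {-1, 0, 1}. All four clusters have mean 0, so the centroids never move, every point stays tied, and no two consecutive clusterings coincide. The names `POINTS_1D`, `ALTERNATIVES`, `is_allowed` and `run_variant` are illustrative. The adversarial tie-breaking is hard-coded and only checked for legality, not derived.

```python
# Hypothetical instance for part d): five 1-D points and k = 2.
POINTS_1D = (-2, -1, 0, 1, 2)

# Two different clusterings (C1, C2) in which both clusters have mean 0.
# The variant's tie-breaking is assumed to alternate between them.
ALTERNATIVES = (
    (frozenset({-1, 1}), frozenset({-2, 0, 2})),
    (frozenset({-2, 2}), frozenset({-1, 0, 1})),
)


def mean(cluster):
    """Centroid of a 1-D cluster."""
    return sum(cluster) / len(cluster)


def is_allowed(clustering, centroids):
    """True if every point lies in a cluster whose centroid is one of its
    nearest centroids, i.e. the variant may produce this clustering."""
    for idx, cluster in enumerate(clustering):
        for p in cluster:
            nearest = min(abs(p - c) for c in centroids)
            if abs(p - centroids[idx]) != nearest:
                return False
    return True


def run_variant(max_iters=20):
    """Simulate the variant for at most max_iters iterations.
    Returns (history of clusterings, whether the stopping rule fired)."""
    centroids = (0.0, 0.0)  # assumed initialization: both centroids at 0
    previous, history = None, []
    for it in range(max_iters):
        clustering = ALTERNATIVES[it % 2]  # adversarial choice among ties
        if not is_allowed(clustering, centroids):
            raise ValueError("tie-breaking choice not allowed by the variant")
        if clustering == previous:  # stopping rule 2) of the variant
            return history, True
        history.append(clustering)
        previous = clustering
        centroids = tuple(mean(c) for c in clustering)  # stays (0.0, 0.0)
    return history, False


history, terminated = run_variant()
print(terminated)                              # False
print({mean(c) for cl in history for c in cl})  # {0.0}
```

Raising `max_iters` does not change the outcome. The state (centroids and previous clustering) repeats every two iterations, so the stopping rule never fires.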

