Question

Challenge Problems

C1. Johnson-Lindenstrauss for Clustering

As discussed in class, one of the most popular clustering objectives is k-means clustering. Given a set of $n$ data points $X = \{x_1, \ldots, x_n\}$ in $\mathbb{R}^d$, the goal is to partition $[n]$ into $k$ sets (clusters) $\mathcal{C} = \{C_1, \ldots, C_k\}$ minimizing:

$$\mathrm{cost}(\mathcal{C}, X) = \sum_{j=1}^{k} \sum_{i \in C_j} \|x_i - \mu_j\|_2^2, \qquad (1)$$

where $\mu_j = \frac{1}{|C_j|} \sum_{i \in C_j} x_i$ is the centroid of cluster $C_j$ (i.e., the mean of the points in that cluster).

1. Prove the fact discussed in class that $\mathrm{cost}(\mathcal{C}, X)$ can be equivalently written as:

$$\mathrm{cost}(\mathcal{C}, X) = \sum_{j=1}^{k} \frac{1}{2|C_j|} \sum_{i_1 \in C_j} \sum_{i_2 \in C_j} \|x_{i_1} - x_{i_2}\|_2^2. \qquad (2)$$

Hint: Show that both (1) and (2) can be rewritten as $\sum_{j=1}^{k} \left[ \left( \sum_{i \in C_j} \|x_i\|_2^2 \right) - |C_j| \, \|\mu_j\|_2^2 \right]$.

2. Suppose that $\Pi \in \mathbb{R}^{m \times d}$ is a random projection matrix with each entry chosen independently as $\mathcal{N}(0, 1/m)$. For each $x_i$ in the dataset, let $\tilde{x}_i = \Pi x_i$, and let $\tilde{X} = \{\tilde{x}_1, \ldots, \tilde{x}_n\}$ denote our set of sketched data points in $\mathbb{R}^m$. Conclude from part (1) that if $m = O\left(\log(n/\delta)/\epsilon^2\right)$, then with probability $1 - \delta$, for every possible clustering $\mathcal{C}$,

$$(1 - \epsilon)\,\mathrm{cost}(\mathcal{C}, X) \le \mathrm{cost}(\mathcal{C}, \tilde{X}) \le (1 + \epsilon)\,\mathrm{cost}(\mathcal{C}, X).$$

Hint: To keep your calculations simple, you can use the version of the JL Lemma which says that all squared norms are preserved, i.e., that for all $x_i, x_j$ in our input set,

$$(1 - \epsilon)\|x_i - x_j\|_2^2 \le \|\Pi x_i - \Pi x_j\|_2^2 \le (1 + \epsilon)\|x_i - x_j\|_2^2.$$

This is what we actually proved in class, and it directly implies the version with unsquared norms by taking a square root.

3. Assuming that the guarantee of part (2) holds for some $\epsilon < 1/2$, prove that if $\tilde{\mathcal{C}}^*$ is an optimal clustering for the compressed dataset $\tilde{X}$, i.e., $\mathrm{cost}(\tilde{\mathcal{C}}^*, \tilde{X}) = \min_{\mathcal{C}} \mathrm{cost}(\mathcal{C}, \tilde{X})$, then:

$$\mathrm{cost}(\tilde{\mathcal{C}}^*, X) \le (1 + 4\epsilon) \min_{\mathcal{C}} \mathrm{cost}(\mathcal{C}, X).$$

That is, $\tilde{\mathcal{C}}^*$ is a near-optimal clustering for the original dataset $X$.

Hint: You will have to apply the guarantee of part (2) to two different clusterings in the course of the proof. You may want to recall from Problem Set 1 that for any $x \in (0, 1/2)$, $\frac{1}{1 - x} \le 1 + 2x$.
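
A quick numerical sanity check of part 1 (not a proof): the sketch below evaluates form (1), form (2), and the expression from the hint on a random dataset and checks that all three agree. It assumes NumPy is available; `cost_centroid`, `cost_pairwise`, and `cost_hint` are hypothetical helper names introduced only for this illustration, and the data and cluster labels are random rather than taken from any real instance.

```python
import numpy as np

def cost_centroid(X, labels, k):
    # Form (1): squared distance from each point to its cluster centroid mu_j.
    total = 0.0
    for j in range(k):
        C = X[labels == j]
        if len(C) == 0:
            continue  # empty clusters contribute nothing
        mu = C.mean(axis=0)
        total += np.sum((C - mu) ** 2)
    return total

def cost_pairwise(X, labels, k):
    # Form (2): (1 / (2|C_j|)) times the sum over all ordered pairs (i1, i2) in C_j
    # of ||x_i1 - x_i2||^2 (pairs with i1 == i2 contribute zero).
    total = 0.0
    for j in range(k):
        C = X[labels == j]
        if len(C) == 0:
            continue
        diffs = C[:, None, :] - C[None, :, :]  # shape (|C_j|, |C_j|, d)
        total += np.sum(diffs ** 2) / (2 * len(C))
    return total

def cost_hint(X, labels, k):
    # Hint form: sum_j [ sum_{i in C_j} ||x_i||^2 - |C_j| * ||mu_j||^2 ].
    total = 0.0
    for j in range(k):
        C = X[labels == j]
        if len(C) == 0:
            continue
        mu = C.mean(axis=0)
        total += np.sum(C ** 2) - len(C) * np.dot(mu, mu)
    return total

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))           # 50 random points in R^5
labels = rng.integers(0, 3, size=50)   # random assignment to k = 3 clusters
a = cost_centroid(X, labels, 3)
b = cost_pairwise(X, labels, 3)
c = cost_hint(X, labels, 3)
print(np.isclose(a, b), np.isclose(a, c))  # expected: True True
```

Agreement on one random instance is only evidence, of course; the algebra in the hint is what establishes the identity for every clustering.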

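To see parts 2 and 3 on a toy instance, the sketch below draws $\Pi$ with i.i.d. $\mathcal{N}(0, 1/m)$ entries, measures an empirical $\epsilon$ as the worst relative distortion of squared pairwise distances, enumerates every 2-clustering of a tiny dataset by brute force, and checks both the $(1 \pm \epsilon)$ cost bound and the $1 + 4\epsilon$ near-optimality bound. It is an illustration under assumptions, not a proof: it uses NumPy, fixes $k = 2$, and the sizes n = 10, d = 1000, m = 400 are arbitrary choices; `kmeans_cost` is a hypothetical helper. Brute-force enumeration is feasible only because n is tiny.

```python
import itertools
import numpy as np

def kmeans_cost(X, labels, k):
    # Form (1): squared distance from each point to its cluster centroid.
    total = 0.0
    for j in range(k):
        C = X[labels == j]
        if len(C) > 0:
            total += np.sum((C - C.mean(axis=0)) ** 2)
    return total

rng = np.random.default_rng(1)
n, d, m, k = 10, 1000, 400, 2
X = rng.normal(size=(n, d))
# Pi has i.i.d. N(0, 1/m) entries, i.e. standard deviation 1/sqrt(m).
Pi = rng.normal(scale=1.0 / np.sqrt(m), size=(m, d))
Xt = X @ Pi.T  # row i is the sketched point Pi x_i in R^m

# Empirical epsilon: worst relative distortion of squared pairwise distances.
eps = max(
    abs(np.sum((Xt[i] - Xt[j]) ** 2) / np.sum((X[i] - X[j]) ** 2) - 1)
    for i, j in itertools.combinations(range(n), 2)
)

# Every labeling into k clusters with no empty cluster (exponential in n).
clusterings = [np.array(l) for l in itertools.product(range(k), repeat=n)
               if len(set(l)) == k]
orig = np.array([kmeans_cost(X, l, k) for l in clusterings])
sketch = np.array([kmeans_cost(Xt, l, k) for l in clusterings])

# Part 2: each clustering's cost on the sketch, relative to its cost on X.
ratios = sketch / orig
# Part 3: cost on X of the clustering that is optimal for the sketch,
# relative to the true optimum on X.
best_on_sketch = np.argmin(sketch)
approx = orig[best_on_sketch] / orig.min()
print(f"eps={eps:.3f}, ratio range=[{ratios.min():.3f}, {ratios.max():.3f}], "
      f"approx={approx:.3f}, bound 1+4eps={1 + 4 * eps:.3f}")
```

Because $\epsilon$ is measured here rather than assumed, both bounds hold deterministically whenever the measured $\epsilon$ is below 1/2, exactly as parts 2 and 3 argue; the randomness of $\Pi$ only affects how small $\epsilon$ turns out to be for a given m.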