Part II Inference Algorithms and Methods (20 points) The K-means algorithm, also known as the Lloyd-Max...
Question:
Part II: Inference Algorithms and Methods (20 points)

The K-means algorithm, also known as the Lloyd-Max algorithm in Information Theory, aims at assigning K centroids to a set of data. The centroids are points (or vectors) in $\mathbb{R}^\ell$, for a given dimension $\ell$. The set of data, represented as a cloud of points, has N points in $\mathbb{R}^\ell$. Figure 2 shows a dataset $\{x_n\}_{n=1}^{N}$ of N = 2000 points and K = 10 centroids $\{c_k\}_{k=1}^{K}$. The centroids are also called means because each centroid is the mean of its neighboring points.

In Figure 2, the whole dataset is grouped into a single cluster. In this case, the K centroids serve for vector quantization, i.e. they approximate the shape of the entire cloud such that the mean squared error (MSE) is minimized. The MSE is

$$\mathrm{MSE} = \frac{1}{N} \sum_{n=1}^{N} \| x_n - c(x_n) \|^2, \qquad (3)$$

where $c(x_n)$ is the centroid of the Voronoi cell (local neighborhood or local cluster) to which $x_n$ belongs.

After a random initialization of the K centroids, the K-means algorithm proceeds in two steps:

Update the Voronoi Cells.

$$c(x_n) = c_{k(n)}, \quad k(n) = \arg\min_{k} \| x_n - c_k \| \ \text{(closest centroid to } x_n\text{)}, \quad n = 1 \ldots N. \qquad (4)$$

See (20.3) in Chap. 20 of Sir David MacKay's book. Now, for each point $x_n$, we know that it belongs to the cell $k(n)$ with centroid $c(x_n) = c_{k(n)}$; $c(x_n)$ is the closest centroid to $x_n$ as given by (4). Let $R(k)$ be the number of points in cell k; then $\sum_{k=1}^{K} R(k) = N$.

Update the Centroids.

$$c_k = \frac{\sum_{n \,:\, c(x_n) = c_k} x_n}{R(k)} = \frac{\text{sum of all points in cell } k}{\text{number of points in cell } k}, \quad k = 1 \ldots K. \qquad (5)$$

See (20.5) in Chap. 20 of Sir David MacKay's book.

The algorithm iterates over the two steps until the MSE stops decreasing or until reaching a maximum number of iterations.

In some cases, as shown in Figure 3, the dataset is already organized in clusters. The K-means algorithm is used to find the best positions of the K centroids. Then, given a new point, you can use its distance to the centroids to infer the cluster (i.e. the Voronoi cell) this new point belongs to. In this case, the centroids serve for classification.

(a) The MSE versus the iteration number is shown in Figure 4 for the dataset of N = 2000 blue points of Figure 2. How many iterations are needed for the K-means algorithm to reach convergence? Hint: Just follow the plot of Figure 4.

(b) The initialization of the K centroids can be done by two different methods: 1) select K random points in the space $\mathbb{R}^\ell$, 2) select K random points among the N points of the dataset. Which method is better? Explain.

(c) Assume that the dataset is organized in three clusters before running the K-means algorithm, as in Figure 3. Running K-means with K = 3 centroids should lead to one centroid per cluster, as shown in Figure 3. What happens if we run the K-means algorithm with K = 4 centroids or more?
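The two steps of the question, the cell update (4) and the centroid update (5), can be sketched in plain Python as below. This is only an illustrative sketch and not the reference solution for the exam. The function names, the `seed` and `tol` parameters, the choice to initialize from K random data points (method 2 in part (b)), and the rule that an empty cell keeps its previous centroid are all assumptions introduced here.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_cells(points, centroids):
    """Step (4): for every point, the index k(n) of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(x, centroids[k]))
        for x in points
    ]


def update_centroids(points, labels, centroids):
    """Step (5): each centroid becomes the mean of the points in its cell."""
    new_centroids = []
    for k, old in enumerate(centroids):
        cell = [x for x, label in zip(points, labels) if label == k]
        if not cell:
            # Assumption: an empty cell simply keeps its previous centroid.
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(x[i] for x in cell) / len(cell) for i in range(len(old)))
        )
    return new_centroids


def mse(points, centroids, labels):
    """Equation (3): mean squared distance of each point to its centroid."""
    total = sum(squared_distance(x, centroids[k]) for x, k in zip(points, labels))
    return total / len(points)


def k_means(points, K, max_iter=100, tol=1e-12, seed=0):
    """Alternate steps (4) and (5) until the MSE stops decreasing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, K)  # assumed init: K random data points
    history = []
    for _ in range(max_iter):
        labels = assign_cells(points, centroids)
        centroids = update_centroids(points, labels, centroids)
        history.append(mse(points, centroids, labels))
        if len(history) > 1 and history[-2] - history[-1] <= tol:
            break
    return centroids, labels, history


if __name__ == "__main__":
    # Hypothetical toy data: two Gaussian blobs in the plane.
    rng = random.Random(1)
    data = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(100)]
    data += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(100)]
    centroids, labels, history = k_means(data, K=2)
    print(centroids)
    print(history)
```

The `history` list plays the role of the MSE-versus-iteration curve of Figure 4: because each step can only lower (or leave unchanged) the MSE, the recorded values are non-increasing, and the loop stops once the curve flattens.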
Expert Answer:
(a) The number of iterations needed for the K-means algorithm to reach convergence cannot be determined ...