Question

Posted on Oct 05, 2023

Part II: Inference Algorithms and Methods (20 points)

The K-means algorithm, also known as the Lloyd-Max algorithm in Information Theory, aims at assigning $K$ centroids to a set of data. The centroids are points (or vectors) in $\mathbb{R}^d$, for a given dimension $d$. The set of data, represented as a cloud of points, has $N$ points in $\mathbb{R}^d$. Figure 2 shows a dataset $\{x_n\}_{n=1}^{N}$ of $N = 2000$ points and $K = 10$ centroids $\{c_k\}_{k=1}^{K}$. The centroids are also called means because each centroid is the mean of its neighboring points.

In Figure 2, the whole dataset is grouped into a unique cluster. In this case, the $K$ centroids serve for vector quantization, i.e. they approximate the shape of the entire cloud such that the mean squared error (MSE) is minimized. The MSE is

$$\text{MSE} = \frac{1}{N} \sum_{n=1}^{N} \| x_n - c(x_n) \|^2, \qquad (3)$$

where $c(x_n)$ is the centroid of the Voronoi cell (local neighborhood or local cluster) to which $x_n$ belongs.

After a random initialization of the $K$ centroids, the K-means algorithm proceeds in two steps:

Update the Voronoi Cells.

$$c(x_n) = \arg\min_{c_k} \| x_n - c_k \| = \text{closest centroid to } x_n, \qquad n = 1 \ldots N. \qquad (4)$$

See (20.3) in Chap. 20 of Sir David MacKay's book. Now, for each point $x_n$, we know that it belongs to the cell $k(n)$ with centroid $c(x_n) = c_{k(n)}$; $c(x_n)$ is the closest centroid to $x_n$ as given by (4). Let $R(k)$ be the number of points in cell $k$; we have $\sum_{k=1}^{K} R(k) = N$.

Update the Centroids.

$$c_k = \frac{\sum_{n \,:\, c(x_n) = c_k} x_n}{R(k)} = \frac{\text{sum of all points in cell } k}{\text{number of points in cell } k}, \qquad k = 1 \ldots K. \qquad (5)$$

See (20.5) in Chap. 20 of Sir David MacKay's book.

The algorithm iterates over the two steps until the MSE stops decreasing or until reaching a maximum number of iterations.

In some cases, as shown in Figure 3, the dataset is already organized in clusters. The K-means algorithm is used to find the best positions of the $K$ centroids. Then, given a new point, you can use its distance to the centroids to infer the cluster (i.e. the Voronoi cell) this new point belongs to. In this case, the centroids serve for classification.
(a) The MSE versus the iteration number is shown in Figure 4 for the dataset of $N = 2000$ blue points of Figure 2. How many iterations are needed for the K-means algorithm to reach convergence? Hint: Just follow the plot of Figure 4.

(b) The initialization of the $K$ centroids can be done by two different methods: 1) select $K$ random points in the space $\mathbb{R}^d$; 2) select $K$ random points among the $N$ points of the dataset. Which method is better? Explain.

(c) Assume that the dataset is organized in three clusters, before running the K-means algorithm, as in Figure 3. Running K-means with $K = 3$ centroids should lead to one centroid per cluster, as shown in Figure 3. What happens if we run the K-means algorithm with $K = 4$ centroids or more?
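For readers who want to experiment with the two steps described in the question, here is a minimal sketch in plain Python. It is not the course's reference implementation. The function names, the `tol` and `seed` parameters, the choice to initialize from random dataset points, and the rule that an empty cell keeps its previous centroid are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign_cells(points, centroids):
    """Step (4): for each point, the index k(n) of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(x, centroids[k]))
        for x in points
    ]


def update_centroids(points, labels, centroids):
    """Step (5): each centroid becomes the mean of the points in its cell."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)  # R(k), the number of points in cell k
    for x, k in zip(points, labels):
        counts[k] += 1
        for i, xi in enumerate(x):
            sums[k][i] += xi
    # Assumption: an empty cell keeps its previous centroid.
    return [
        tuple(s / counts[k] for s in sums[k]) if counts[k] else centroids[k]
        for k in range(len(centroids))
    ]


def mse(points, labels, centroids):
    """Equation (3): mean squared distance of each point to its centroid."""
    total = sum(squared_distance(x, centroids[k]) for x, k in zip(points, labels))
    return total / len(points)


def kmeans(points, K, max_iter=100, tol=1e-12, seed=0):
    """Alternate steps (4) and (5) until the MSE stops decreasing."""
    rng = random.Random(seed)
    # Assumption: initialize with K distinct points drawn from the dataset.
    centroids = rng.sample(points, K)
    history = []
    for _ in range(max_iter):
        labels = assign_cells(points, centroids)
        centroids = update_centroids(points, labels, centroids)
        history.append(mse(points, labels, centroids))
        if len(history) > 1 and history[-2] - history[-1] <= tol:
            break
    return centroids, labels, history
```

Calling `kmeans(points, K)` on a list of coordinate tuples returns the final centroids, the cell index of each point, and the MSE recorded at each iteration. Plotting that last list against the iteration number gives a curve of the kind the question refers to as Figure 4.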

Step by Step Solution

Step: 1

(a) The number of iterations needed for the K-means algorithm to reach convergence cannot be determined ...
