Question
1. (25 points) K-means step by step, k=2. Extract kmeandata from cluster_data.mat as your dataset.
a. Initialize the cluster centers (the means) at (-2,-2) and (4,4)
b. Calculate the distance between each data point and each center (you should have an array containing a row for every data point and a distance value for each cluster mean).
c. Assign each data point the label of its nearest cluster center (cluster_ind)
d. Plot the clustered data points using different colors for each cluster
e. Update the means
f. Continue for 4 iterations, creating one plot for each iteration inside a 2x2 subplot.
g. Repeat for a new initialization of cluster centers: (0, -1) and (-1, 4)
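Steps b, c, and e above can be sketched as a single function. This is a minimal illustration in plain Python, not the assignment's reference solution: the function name kmeans_step, the toy points standing in for kmeandata, and the choice to leave an empty cluster's center where it is are all assumptions. Loading cluster_data.mat and the plotting in steps d and f are left out.

```python
import math

def kmeans_step(points, centers):
    """One k-means iteration: distances, labels, and updated centers."""
    # Step b: one row per data point, one distance per cluster center.
    distances = [[math.dist(p, c) for c in centers] for p in points]
    # Step c: label each point with the index of its nearest center.
    cluster_ind = [row.index(min(row)) for row in distances]
    # Step e: move each center to the mean of the points assigned to it.
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, cluster_ind) if lab == j]
        if members:
            new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
        else:
            # Assumption: a cluster with no points keeps its previous center.
            new_centers.append(tuple(old))
    return distances, cluster_ind, new_centers

# Toy points standing in for kmeandata (not the real dataset).
toy_points = [(-2.5, -1.5), (-1.5, -2.5), (3.5, 4.5), (4.5, 3.5)]
centers = [(-2.0, -2.0), (4.0, 4.0)]  # initialization from step a
for _ in range(4):  # step f: four iterations
    distances, cluster_ind, centers = kmeans_step(toy_points, centers)
```

For step g, the same loop would be rerun with centers set to [(0.0, -1.0), (-1.0, 4.0)].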
2. K-means step by step, k=3. Follow the same instructions as above, using two different cluster initializations: Centers = (2,-2), (-2,-2), (2,2) and Centers = (0,-3), (0,0), (0,3).
3. K-means step by step, k=4. Follow the same instructions as above (only one cluster initialization): Centers = (0,-3), (0,-1), (0,1), (0,2).
4. Which value of k will produce the best result? How can you tell?
5. Using the same k-means algorithm as above, cluster the data, but plot only the final results. The algorithm has converged when the total distance the centers move between two consecutive iterations is less than 0.001 (or you can simply run the algorithm for a large number of iterations, such as 200). Make a figure with 2x2 subplots. In each subplot, plot the final k-means clustering for a different value of k. Subplot 1: k=4; Subplot 2: k=8; Subplot 3: k=12; Subplot 4: k=20.
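The convergence criterion in question 5 could be implemented along the following lines. This is a sketch under assumptions: "total distance change between centers" is read here as the sum of the distances each center moves between consecutive iterations, and the names run_kmeans, tol, and max_iter, as well as the toy data, are illustrative rather than taken from the assignment.

```python
import math

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

def update(points, labels, centers):
    """Mean of each cluster; an empty cluster keeps its old center (assumption)."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
        else:
            new_centers.append(tuple(old))
    return new_centers

def run_kmeans(points, centers, tol=1e-3, max_iter=200):
    """Iterate until the summed center movement drops below tol, or max_iter is hit."""
    centers = [tuple(c) for c in centers]
    for iteration in range(1, max_iter + 1):
        labels = assign(points, centers)
        new_centers = update(points, labels, centers)
        # Total distance moved by all centers in this iteration.
        shift = sum(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # labels come from the last assignment step that was run.
    return centers, labels, iteration

# Toy data, not the real dataset.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
final_centers, final_labels, n_iter = run_kmeans(toy_points, [(1.0, 1.0), (9.0, 9.0)])
```

Only final_centers and final_labels would be needed for each of the four subplots.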
6. Using the same k-means algorithm, run k-means for k = 1 through 25. You will need to use a for loop. For each value of k, store the total distance of all data points from their cluster centers.
a. Plot the total distance for each value of k. Which k do you think explains the data best?
b. You may notice that your distance graph does not follow a smooth trend. If it doesn't, can you explain why there are "jumps" in the distance values?
c. Run your code for generating the total distances, but this time use a loop to repeat the process a few times (~5 or more). Store the minimum total distance for each value of k and use this to make a new version of your plot from part (a).
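One way to organize question 6, including the repeated runs in part (c), is sketched below. The problem does not say how to initialize the centers for each k, so this sketch assumes they are drawn at random from the data points. The names total_distance and best_distances, the fixed seed, and the toy data are all hypothetical.

```python
import math
import random

def total_distance(points, centers, max_iter=200, tol=1e-3):
    """Run k-means from the given centers; return the summed point-to-center distance."""
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Assumption: an empty cluster keeps its old center.
            new_centers.append(
                tuple(sum(x) / len(members) for x in zip(*members)) if members else old
            )
        shift = sum(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return sum(min(math.dist(p, c) for c in centers) for p in points)

def best_distances(points, k_values, restarts=5, seed=0):
    """For each k, keep the smallest total distance over several random starts."""
    rng = random.Random(seed)
    best = []
    for k in k_values:
        # Assumption: initial centers are k distinct data points chosen at random.
        runs = [total_distance(points, rng.sample(points, k)) for _ in range(restarts)]
        best.append(min(runs))
    return best

# Toy data, not the real dataset; the assignment uses k = 1 through 25.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
best = best_distances(toy_points, range(1, 7))
```

Setting restarts=1 corresponds to part (a). Because each run starts from different random centers and can settle in a different local optimum, single-run totals may jump from one k to the next, which is the effect part (b) asks about.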