Question
Please help asap
2. Clustering and feature selection

(a) Figure 2 shows the scatterplot of an unlabelled data set Z with 8 objects. The objects are numbered as shown in the figure. [18]

[Figure 2: Scatterplot of the data set for problem 2 (a). Both axes run from 0 to 10.]

Apply the single linkage clustering algorithm to this data. Give the steps in the following format:

Step #  Clusters                 Number of clusters  Distance (1)  Jump
1       1, 2, 3, 4, 5, 6, 7, 8   8                   0.0000        0.0000
2

[17]

Subsequently, recommend the number of clusters for this data set, and show which points are included in each cluster.

(b) Apply the k-means algorithm to cluster this data set into 2 clusters. Start with initial means at points 1 and 8. Give your solution in the format shown below. (Note 1: You need to calculate the cluster means, but you don't have to calculate the exact distances between points and means if you can judge by eye which mean is the closest. Calculate distances only if you are in doubt.) Subsequently, mark the clusters on the graph. Include the figure in your report. (Note 2: The algorithm converges quite quickly.)

Iteration X
Old means: (x,x) (x,x)
Clusters: (y,...), (...)
New means: (x,x) (x,x)

[15]

(c) Consider a data set with 6 features. Function f(S) is used to assess the quality of subset S. Table 1 shows the values of f(S) for this part. For example, the set containing features 2, 4 and 6 (246 in the blue column) is worth 75 units. Assume that larger f is better. Apply Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) to find a subset of 3 features. Explain your solution. Give a comment on the results from the two methods. Is f [...]

Table 1: Feature selection table for problem 2 (d).
[Single-feature values f(1) to f(6) and f(123456): not legible.]
S    f(S)   S    f(S)   S     f(S)   S      f(S)
12   43     123  79     1234  37     12345  8
13   57     124  25     1235  76     12346  78
14   70     125  13     1236  69     12356  52
15   2      126  50     1245  68     12456  63
16   16     134  8      1246  3      13456  95
23   75     135  82     1256  59     23456  25
24   100    136  89     1345  12
25   10     145  8      1346  61
26   18     146  24     1356  28
34   39     156  20     1456  12
35   38     234  16     2345  73
36   97     235  100    2346  78
45   2      236  85     2356  11
46   5      245  77     2456  45
56   69     246  75     3456  52
            256  69
            345  46
            346  76
            356  94
            456  90
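For part (a), the sketch below is one way to generate the rows of the Step / Clusters / Number of clusters / Distance / Jump table. It is a minimal, naive single-linkage implementation that assumes Euclidean distance and defines "Jump" as the increase in merge distance over the previous step (an assumption about the course's definition). The coordinates of Figure 2 are not available here, so the example points are hypothetical, as are the helper names `single_linkage_steps` and `recommend_n_clusters`. The recommendation uses the largest-jump heuristic, a common rule of thumb rather than something the task prescribes.

```python
import math
from itertools import combinations


def single_linkage_steps(points):
    """Naive agglomerative single-linkage clustering.

    points: dict mapping object number -> (x, y).
    Returns rows (step, clusters, n_clusters, distance, jump), starting from
    all singletons with distance 0 and jump 0, as in the requested format.
    """
    clusters = [frozenset([k]) for k in sorted(points)]
    rows = [(1, sorted(sorted(c) for c in clusters), len(clusters), 0.0, 0.0)]
    prev = 0.0
    while len(clusters) > 1:
        # Single linkage: the distance between two clusters is the smallest
        # distance between any point of one and any point of the other.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = min(math.dist(points[p], points[q])
                    for p in clusters[a] for q in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Merge the closest pair; every other cluster is kept unchanged.
        merged = clusters[a] | clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
        # Jump (assumed definition): increase in merge distance over the previous step.
        rows.append((len(rows) + 1, sorted(sorted(c) for c in clusters),
                     len(clusters), d, d - prev))
        prev = d
    return rows


def recommend_n_clusters(rows):
    """Largest-jump heuristic (assumed): keep the partition from just
    before the merge with the biggest jump in distance."""
    i = max(range(1, len(rows)), key=lambda r: rows[r][4])
    return rows[i - 1][2], rows[i - 1][1]


if __name__ == "__main__":
    # Hypothetical coordinates: the real ones must be read off Figure 2.
    pts = {1: (1, 1), 2: (1, 2), 3: (2, 1), 4: (2, 2),
           5: (8, 8), 6: (8, 9), 7: (9, 8), 8: (9, 9)}
    rows = single_linkage_steps(pts)
    for step, clusters, n, dist, jump in rows:
        print(f"{step:<3} {clusters} {n} {dist:.4f} {jump:.4f}")
    print("Recommended:", recommend_n_clusters(rows))
```

Replacing the hypothetical coordinates with the points from Figure 2 fills in the table; the pairwise search is cubic per step, which is irrelevant for 8 objects.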
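For part (b), here is a minimal sketch of batch k-means (Lloyd's algorithm) that records each iteration in the Iteration / Old means / Clusters / New means format. It assumes Euclidean distance, sends ties to the mean listed first, and stops when the means no longer move. The function name `kmeans_history` and the example coordinates are hypothetical; the real points come from Figure 2.

```python
import math


def kmeans_history(points, init_ids, max_iter=100):
    """Batch k-means (Lloyd's algorithm) on numbered 2-D points.

    points: dict mapping object number -> (x, y).
    init_ids: object numbers used as the starting means, e.g. (1, 8).
    Returns (iteration, old_means, clusters, new_means) tuples.
    """
    means = [tuple(map(float, points[i])) for i in init_ids]
    history = []
    for it in range(1, max_iter + 1):
        # Assignment step: each object joins its nearest mean
        # (ties go to the mean listed first).
        clusters = [[] for _ in means]
        for k in sorted(points):
            nearest = min(range(len(means)),
                          key=lambda m: math.dist(points[k], means[m]))
            clusters[nearest].append(k)
        # Update step: each mean moves to the centroid of its cluster;
        # an empty cluster keeps its old mean.
        new_means = []
        for old, members in zip(means, clusters):
            if members:
                n = len(members)
                new_means.append((sum(points[k][0] for k in members) / n,
                                  sum(points[k][1] for k in members) / n))
            else:
                new_means.append(old)
        history.append((it, means, clusters, new_means))
        if new_means == means:  # converged: the means did not move
            break
        means = new_means
    return history


if __name__ == "__main__":
    # Hypothetical coordinates; substitute the points from Figure 2.
    pts = {1: (1, 1), 2: (1, 2), 3: (2, 1), 4: (2, 2),
           5: (8, 8), 6: (8, 9), 7: (9, 8), 8: (9, 9)}
    for it, old, clusters, new in kmeans_history(pts, (1, 8)):
        print(f"Iteration {it}")
        print("Old means:", old)
        print("Clusters:", clusters)
        print("New means:", new)
```

The last printed iteration is the one where old and new means coincide, which is the convergence the task's Note 2 refers to.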
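For part (c), SFS and SBS can be sketched as greedy searches that add (SFS) or remove (SBS) one feature at a time, always keeping the choice with the largest f. The dictionary below holds the pair, triple, quadruple and quintuple values transcribed from Table 1; the single-feature values are not legible, so SFS can only be run after adding them from the original table. The helper names `subset_key`, `sfs` and `sbs` are hypothetical.

```python
# f(S) values transcribed from Table 1, keyed by the subset's digits.
# Single-feature values and f(123456) are not legible here and are left out;
# SFS cannot start until the single-feature values are added.
F = {
    "12": 43, "13": 57, "14": 70, "15": 2, "16": 16, "23": 75, "24": 100,
    "25": 10, "26": 18, "34": 39, "35": 38, "36": 97, "45": 2, "46": 5,
    "56": 69,
    "123": 79, "124": 25, "125": 13, "126": 50, "134": 8, "135": 82,
    "136": 89, "145": 8, "146": 24, "156": 20, "234": 16, "235": 100,
    "236": 85, "245": 77, "246": 75, "256": 69, "345": 46, "346": 76,
    "356": 94, "456": 90,
    "1234": 37, "1235": 76, "1236": 69, "1245": 68, "1246": 3, "1256": 59,
    "1345": 12, "1346": 61, "1356": 28, "1456": 12, "2345": 73, "2346": 78,
    "2356": 11, "2456": 45, "3456": 52,
    "12345": 8, "12346": 78, "12356": 52, "12456": 63, "13456": 95,
    "23456": 25,
}


def subset_key(subset):
    """Write a feature subset as the digit string used in Table 1."""
    return "".join(str(i) for i in sorted(subset))


def sfs(f, features, target):
    """Sequential Forward Selection: greedily add the feature that gives
    the largest f until `target` features are selected."""
    selected = []
    while len(selected) < target:
        best = max((x for x in features if x not in selected),
                   key=lambda x: f[subset_key(selected + [x])])
        selected.append(best)
    return sorted(selected)


def sbs(f, features, target):
    """Sequential Backward Selection: start from all features and greedily
    drop the one whose removal leaves the largest f."""
    selected = list(features)
    while len(selected) > target:
        drop = max(selected,
                   key=lambda x: f[subset_key([y for y in selected if y != x])])
        selected.remove(drop)
    return sorted(selected)


if __name__ == "__main__":
    print("SBS:", sbs(F, range(1, 7), 3))  # prints: SBS: [1, 3, 6]
    best = max((s for s in F if len(s) == 3), key=F.get)
    print("Best triple in the table:", best, F[best])  # prints: Best triple in the table: 235 100
```

On the transcribed values, SBS first drops feature 2 (f(13456) = 95), then feature 5 (f(1346) = 61), then feature 4, ending at {1, 3, 6} with f = 89. The best triple in the table is 235 with f = 100, so here the greedy backward search does not reach the best 3-feature subset; whether SFS does better depends on the missing single-feature values.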