
Question


Please help asap

2. Clustering and feature selection

(a) Figure 2 shows the scatterplot of an unlabelled data set Z with 8 objects. The objects are numbered as shown in the figure. [18]

[Figure 2: Scatterplot of the data set for problem 2 (a).]

Apply the single linkage clustering algorithm to this data. Give the steps in the following format:

Step #   Clusters                  Number of clusters   Distance (1)   Jump
1        1, 2, 3, 4, 5, 6, 7, 8    8                    0.0000         0.0000
2

[17]

Subsequently, recommend the number of clusters for this data set, and show which points are included in each cluster.

(b) Apply the k-means algorithm to cluster this data set into 2 clusters. Start with initial means at points 1 and 8. Give your solution in the format shown below. (Note 1: You need to calculate the cluster means. But you don't have to calculate the exact distances between points and means if you can judge by eye which mean is the closest. Calculate distances only if you are in doubt.) Subsequently, mark the clusters on the graph. Include the figure in your report. (Note 2: The algorithm converges quite quickly.)

Iteration X
Old means: (x,x) (x,x)
Clusters: (y,...), (...)
New means: (x,x) (x,x)

[15]

(c) Consider a data set with 6 features. Function f(S) is used to assess the quality of subset S. Table 1 shows the values of f(S) for this part. For example, the set containing features 2, 4 and 6 (246 in the blue column) is worth 75 units. Assume that larger f is better. Apply Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) to find a subset of 3 features. Explain your solution. Give a comment on the results from the two methods.

Table 1: Feature selection table for problem 2 (d)

First column (single features and full set, garbled in extraction): S f 123456 1 32 94 3 TWIN 56 4 5 75 47 9 6

S     f      S     f      S      f      S       f
12    43     123   79     1234   37     12345   8
13    57     124   25     1235   76     12346   78
14    70     125   13     1236   69     12356   52
15    2      126   50     1245   68     12456   63
16    16     134   8      1246   3      13456   95
23    75     135   82     1256   59     23456   25
24    100    136   89     1345   12
25    10     145   8      1346   61
26    18     146   24     1356   28
34    39     156   20     1456   12
35    38     234   16     2345   73
36    97     235   100    2346   78
45    2      236   85     2356   11
46    5      245   77     2456   45
56    69     246   75     3456   52
             256   69
             345   46
             346   76
             356   94
             456   90
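For part (c), the two greedy searches can be sketched in Python. This is a minimal illustration rather than the expected written solution. It uses only the Table 1 values for subsets of size 2 to 5, which are the legible ones. The names `TABLE`, `sbs` and `sfs_from` are hypothetical. Because the single-feature column is not legible here, the forward pass takes its starting feature as an argument (an assumption) instead of choosing it from the table.

```python
# f(S) for subsets of size 2 to 5, transcribed from Table 1.
RAW = """
12 43  13 57  14 70  15 2   16 16  23 75  24 100  25 10  26 18  34 39
35 38  36 97  45 2   46 5   56 69
123 79  124 25  125 13  126 50  134 8   135 82  136 89  145 8   146 24
156 20  234 16  235 100 236 85  245 77  246 75  256 69  345 46  346 76
356 94  456 90
1234 37  1235 76  1236 69  1245 68  1246 3   1256 59  1345 12  1346 61
1356 28  1456 12  2345 73  2346 78  2356 11  2456 45  3456 52
12345 8  12346 78  12356 52  12456 63  13456 95  23456 25
"""

tokens = RAW.split()
# Key each score by the set of feature labels, e.g. frozenset({'2','4','6'}).
TABLE = {frozenset(s): int(f) for s, f in zip(tokens[0::2], tokens[1::2])}


def label(subset):
    """Render a feature set the way the table writes it, e.g. '246'."""
    return "".join(sorted(subset))


def sbs(features, k):
    """Sequential Backward Selection: start from all features and
    repeatedly drop the one whose removal leaves the best-scoring subset."""
    current = frozenset(features)
    path = []
    while len(current) > k:
        current = max((current - {f} for f in current), key=TABLE.__getitem__)
        path.append((label(current), TABLE[current]))
    return path


def sfs_from(seed, features, k):
    """Sequential Forward Selection continued from a given starting feature
    (the seed is an assumption; normally it is the best single feature)."""
    current = frozenset(seed)
    path = []
    while len(current) < k:
        candidates = [current | {f} for f in features if f not in current]
        current = max(candidates, key=TABLE.__getitem__)
        path.append((label(current), TABLE[current]))
    return path


print(sbs("123456", 3))            # [('13456', 95), ('1346', 61), ('136', 89)]
print(sfs_from("2", "123456", 3))  # [('24', 100), ('245', 77)]
```

Neither path reaches the best 3-feature entry in the table (235, worth 100), which shows that greedy forward and backward searches are not guaranteed to find the optimal subset.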
