Question

1 Approved Answer

Posted on Sep 09, 2024

Please complete the following in R

[Image: the first rows of the iris data set, with columns Sepal.Length, Sepal.Width, Petal.Length, Petal.Width and Species; every row shown is setosa.]

As a data scientist, you are interested in the minimal number of features among Sepal.Length, Sepal.Width, Petal.Length and Petal.Width that are able to partition the observations into 3 clusters, since the data set contains only the subspecies setosa, versicolor and virginica. For example, it may be true that Petal.Length and Petal.Width do just as well as all of Sepal.Length, Sepal.Width, Petal.Length and Petal.Width for this task, since the four features may be highly correlated, and using 4 highly correlated features may not improve the classification performance much over using just the least correlated ones.

You need to do the following:

Q1. Study the pairwise correlations for the 4 features Sepal.Length, Sepal.Width, Petal.Length and Petal.Width, pick the two features that are the least correlated, apply K-means with K = 3 to these features, and report how many observations are correctly classified into a cluster, i.e., as being the same subspecies. Based on the selected features, what is the estimated number of clusters given by the gap statistic via the command clusGap?

Q2. Apply K-means to all 4 features with K = 3, and report how many observations are correctly classified into a cluster, i.e., as being the same subspecies. Compare the results with those in Q1. Based on the 4 features, what is the estimated number of clusters given by the gap statistic via the command clusGap?

Q3. Apply hierarchical clustering with average linkage to the features, obtain the dendrogram, and cut the clustering tree into 3 clusters by the command cutree(hclustobj, k = 3), where hclustobj is the object obtained by applying the hclust command. Compare the clustering results with those obtained in Q2. Note that before applying hierarchical clustering, it is recommended to scale the data. This can be done by the command scale(x, center = TRUE, scale = TRUE), where each column of x contains observations for a feature.

Further, when you execute clusGap, please use K.max = 10, B = 200 (and other default parameters).

Q4. From the iris data set, pick all observations for the subspecies setosa and versicolor. For each of the 2 subspecies, use set.seed(123) to randomly select 40 observations to train a KNN classifier with 3 neighboring observations, apply the obtained KNN classifier to the remaining 20 observations, and report the classification results. Note that when there are only two subspecies, setosa and versicolor, an observation either is setosa or not.
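One possible way to approach Q1 and Q2 is sketched below. It is not a definitive solution and rests on several assumptions the question does not settle. "Least correlated" is read as the pair with the smallest absolute Pearson correlation. An observation counts as "correctly classified" when it belongs to the majority species of its cluster. The seed 123 and the nstart values are arbitrary choices. The helpers count_correct and gap_k are hypothetical names introduced only for this illustration.

```r
library(cluster)  # provides clusGap() and maxSE()

X <- iris[, 1:4]  # the four numeric features

# Pairwise Pearson correlations
cc <- cor(X)
print(round(cc, 3))

# Assumption: "least correlated" = smallest absolute correlation
ac <- abs(cc)
ac[upper.tri(ac, diag = TRUE)] <- NA  # keep each pair only once
idx  <- which(ac == min(ac, na.rm = TRUE), arr.ind = TRUE)[1, ]
pair <- c(colnames(cc)[idx["col"]], rownames(cc)[idx["row"]])
print(pair)

# Hypothetical helper: K-means with K = 3, cross-tabulated against species;
# counts observations that fall in the majority species of their cluster
count_correct <- function(features) {
  km  <- kmeans(features, centers = 3, nstart = 25)
  tab <- table(cluster = km$cluster, species = iris$Species)
  print(tab)
  sum(apply(tab, 1, max))
}

# Hypothetical helper: number of clusters suggested by the gap statistic,
# using the "firstSEmax" rule that print() applies to clusGap objects by default
gap_k <- function(features) {
  gap <- clusGap(as.matrix(features), FUNcluster = kmeans, nstart = 20,
                 K.max = 10, B = 200, verbose = FALSE)
  maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "firstSEmax")
}

set.seed(123)  # assumption: the question fixes no seed for these steps
correct_two  <- count_correct(X[, pair])  # Q1: the two least correlated features
k_two        <- gap_k(X[, pair])
correct_four <- count_correct(X)          # Q2: all four features
k_four       <- gap_k(X)
```

Because K-means and the gap statistic both involve random starts, the counts and the estimated number of clusters can change with the seed, so it is worth printing the full cross-tabulation rather than relying on the single count alone.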
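The hierarchical clustering step in Q3 might look roughly like the following sketch. It assumes that "the features" means all four numeric columns, that Euclidean distance is intended, and that the comparison is made against an unscaled K-means fit with an arbitrary seed; none of these choices is stated in the question.

```r
X  <- iris[, 1:4]
Xs <- scale(X, center = TRUE, scale = TRUE)  # standardize each feature

# Average-linkage hierarchical clustering on Euclidean distances (assumed metric)
hclustobj <- hclust(dist(Xs), method = "average")
plot(hclustobj, labels = FALSE, main = "Average linkage, scaled iris features")

# Cut the tree into 3 clusters and compare with the species labels
hc_cl <- cutree(hclustobj, k = 3)
print(table(hclust = hc_cl, species = iris$Species))

# For comparison: K-means with K = 3 on the four unscaled features
set.seed(123)  # assumption: arbitrary seed
km <- kmeans(X, centers = 3, nstart = 25)
print(table(hclust = hc_cl, kmeans = km$cluster))
```

Cluster labels from cutree and kmeans are arbitrary integers, so the two results are best compared through the cross-tabulation rather than by checking whether the label numbers agree.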
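For Q4, a minimal sketch using knn() from the class package could look like this. It assumes that set.seed(123) is called once before sampling both subspecies and that all four features are used as predictors; the question leaves both points open.

```r
library(class)  # provides knn()

# Keep only setosa and versicolor, and drop the unused factor level
two <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))

set.seed(123)  # assumption: seed set once, before sampling both subspecies
# For each subspecies, sample 40 row positions for training;
# the other 10 per subspecies form the 20-observation test set
train_idx <- unlist(lapply(split(seq_len(nrow(two)), two$Species),
                           function(i) sample(i, 40)))
train <- two[train_idx, ]
test  <- two[-train_idx, ]

# 3-nearest-neighbour classification on the four features
pred <- knn(train = train[, 1:4], test = test[, 1:4],
            cl = train$Species, k = 3)

# Confusion matrix and overall accuracy on the held-out observations
conf_mat <- table(predicted = pred, actual = test$Species)
print(conf_mat)
accuracy <- mean(pred == test$Species)
print(accuracy)
```

The confusion matrix reports the classification results directly: its diagonal holds the correctly classified test observations and its off-diagonal cells hold the errors.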