
Question


For this question you will need the data set below, and you will need to install the package ape in R. The data come from a four-component Gaussian mixture on the plane. We want to see how well we can recover the four clusters using hierarchical clustering methods as well as K-means. The basic commands to plot the dendrogram using single or complete linkage are:

    library(ape)
    d = dist(data)                # data = the data from the file; use dist(scale(data)) if the data are first scaled
    clust = hclust(d, "single")   # or hclust(d, "complete")
    plot(clust, main = "put title here", hang = -1, cex = .8,
         xlab = "", ylab = "", sub = "", axes = FALSE)

(a) Make a scatter plot of the data and identify the four clusters.

(b) On one page, put the four dendrogram plots corresponding to: single linkage and scaled data, complete linkage and scaled data, single linkage and unscaled data, and complete linkage and unscaled data. Use the title of each dendrogram to say which plot it is. What four clusters do you get with each dendrogram?

(c) Perturb the data matrix by adding zero-mean Gaussian noise to each of the columns. To column j add noise with variance 0.1*(sample variance of column j). On one page, put the four dendrogram plots corresponding to: single linkage and scaled data without noise, complete linkage and scaled data without noise, single linkage and scaled noisy data, and complete linkage and scaled noisy data. Use the title of each dendrogram to say which plot it is. What four clusters do you get with each dendrogram?

To use K-means the basic command is:

    cl = kmeans(data, centers = number of clusters)

and all the information you need is in the object cl.

(a) Run K-means with four clusters three times and identify the clusters you get each time. You may get different answers. Why?

(b) Repeat (a) but using scaled data.

(c) Make a plot of the ratio of between-sum-of-squares to total-sum-of-squares (that is, cl$betweenss/cl$totss) as a function of the number of clusters. Interpret the results.
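The noise step in (c) and the three K-means parts can be sketched roughly as follows. This is only one possible approach, not a worked solution: the object names (dat, noisy, runs, ks, ratio), the seed, the range 1 to 8 for the number of clusters, and the use of nstart = 20 are all assumptions made for illustration. The values are typed in from the data set listed with this question so that the block runs on its own.

```r
# Data typed in from the data set given with the question (labels A1..D2).
X <- c(0.1374, 0.3006, 0.0462, 0.8649, -0.3043, -0.3685,
       1.2501, 3.9105, 3.8671, 2.9201, 3.8985, 3.1837,
       10.1454, 10.0565, 10.2200, 10.0508, 11.3937, 9.4167,
       -10.0000, -9.200)
Y <- c(-0.9271, -0.5703, -0.5467, -0.2168, -0.0842, -0.1093,
       3.5413, 3.3893, 3.7512, 4.7783, 4.2231, 1.7167,
       -1.1645, 0.4510, -0.9178, 0.0334, 0.0177, 1.1136,
       5.0000, 4.567)
dat <- cbind(X = X, Y = Y)
rownames(dat) <- c(paste0("A", 1:6), paste0("B", 1:6), paste0("C", 1:6), "D1", "D2")

# Hierarchical (c): add zero-mean Gaussian noise to column j with
# variance 0.1 * (sample variance of column j).
set.seed(1)  # assumption: fixed only so the sketch is reproducible
noisy <- dat
for (j in seq_len(ncol(dat))) {
  noisy[, j] <- dat[, j] + rnorm(nrow(dat), mean = 0, sd = sqrt(0.1 * var(dat[, j])))
}
clust_noisy <- hclust(dist(scale(noisy)), "single")   # or "complete"

# K-means (a): three runs with four clusters. kmeans() picks its initial
# centers at random, so the runs are not guaranteed to agree.
runs <- lapply(1:3, function(i) kmeans(dat, centers = 4))
for (cl in runs) print(split(rownames(dat), cl$cluster))

# K-means (b): the same three runs on scaled data.
runs_scaled <- lapply(1:3, function(i) kmeans(scale(dat), centers = 4))
for (cl in runs_scaled) print(split(rownames(dat), cl$cluster))

# K-means (c): betweenss / totss as a function of the number of clusters.
# nstart = 20 (an assumption) keeps the best of 20 random starts for each k.
ks <- 1:8
ratio <- sapply(ks, function(k) {
  cl <- kmeans(dat, centers = k, nstart = 20)
  cl$betweenss / cl$totss
})
plot(ks, ratio, type = "b", xlab = "Number of clusters", ylab = "betweenss / totss")
```

Printing split(rownames(dat), cl$cluster) lists the labels that fall in each cluster, which makes it easy to compare runs by eye. The cluster numbers themselves are arbitrary, so two runs can give the same grouping under different numbering.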

dataset:

G      X          Y
A1     0.1374    -0.9271
A2     0.3006    -0.5703
A3     0.0462    -0.5467
A4     0.8649    -0.2168
A5    -0.3043    -0.0842
A6    -0.3685    -0.1093
B1     1.2501     3.5413
B2     3.9105     3.3893
B3     3.8671     3.7512
B4     2.9201     4.7783
B5     3.8985     4.2231
B6     3.1837     1.7167
C1    10.1454    -1.1645
C2    10.0565     0.4510
C3    10.2200    -0.9178
C4    10.0508     0.0334
C5    11.3937     0.0177
C6     9.4167     1.1136
D1   -10.0000     5.0000
D2    -9.200      4.567
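As a starting point for the scatter plot and the first page of dendrograms, the table can be read into R and passed to the commands given in the question, roughly as sketched below. The names dat, fits, groups and letter are assumptions made for illustration. The use of cutree() to cut each tree into four groups is one possible way to read off the clusters, not something the question prescribes. Only functions from base R are used here, so the sketch does not call anything from ape.

```r
# Read the table; the first column (G) becomes the row names.
dat <- read.table(header = TRUE, row.names = 1, text = "
G X Y
A1 0.1374 -0.9271
A2 0.3006 -0.5703
A3 0.0462 -0.5467
A4 0.8649 -0.2168
A5 -0.3043 -0.0842
A6 -0.3685 -0.1093
B1 1.2501 3.5413
B2 3.9105 3.3893
B3 3.8671 3.7512
B4 2.9201 4.7783
B5 3.8985 4.2231
B6 3.1837 1.7167
C1 10.1454 -1.1645
C2 10.0565 0.4510
C3 10.2200 -0.9178
C4 10.0508 0.0334
C5 11.3937 0.0177
C6 9.4167 1.1136
D1 -10.0000 5.0000
D2 -9.200 4.567
")

# Scatter plot with each point drawn as its label.
plot(dat$X, dat$Y, type = "n", xlab = "X", ylab = "Y",
     main = "Scatter plot of the data")
text(dat$X, dat$Y, labels = rownames(dat), cex = 0.8)

# Four dendrograms on one page: scaled/unscaled by single/complete linkage.
op <- par(mfrow = c(2, 2))
fits <- list()
for (scaled in c(TRUE, FALSE)) {
  for (method in c("single", "complete")) {
    m <- if (scaled) scale(dat) else as.matrix(dat)
    title <- paste(method, "linkage,", if (scaled) "scaled" else "unscaled", "data")
    fits[[title]] <- hclust(dist(m), method)
    plot(fits[[title]], main = title, hang = -1, cex = 0.8,
         xlab = "", ylab = "", sub = "", axes = FALSE)
  }
}
par(op)

# Cut each tree into four groups and tabulate them against the label letter.
groups <- lapply(fits, cutree, k = 4)
letter <- substr(rownames(dat), 1, 1)
lapply(groups, function(g) table(letter, g))
```

Each table has one row per label letter (A to D) and one column per cut group, so a dendrogram that recovers the labelled groups shows a single non-zero entry in every row and column.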
