Question

Posted on Jun 11, 2024

For this question you will need the data set below, and you will need to install the package ape in R. The data come from a four-component Gaussian mixture on the plane. We want to see how well we can recover the four clusters using hierarchical clustering methods as well as K-means. The basic commands to plot the dendrogram using single or complete linkage are:

library(ape)
d = dist(data)  # data: the data read from the file; use dist(scale(data)) if the data are first scaled
clust = hclust(d, "single")  # or hclust(d, "complete")
plot(clust, main = "put title here", hang = -1, cex = .8, xlab = "", ylab = "", sub = "", axes = FALSE)

(a) Make a scatter plot of the data and identify the four clusters.
(b) On one page, put the four dendrograms corresponding to: single linkage and scaled data, complete linkage and scaled data, single linkage and unscaled data, and complete linkage and unscaled data. Use the title of each dendrogram to say which plot it is. What four clusters do you get with each dendrogram?
(c) Perturb the data matrix by adding zero-mean Gaussian noise to each of the columns. To column j, add noise with variance 0.1*(sample variance of column j). On one page, put the four dendrograms corresponding to: single linkage and scaled data without noise, complete linkage and scaled data without noise, single linkage and scaled noisy data, and complete linkage and scaled noisy data. Use the title of each dendrogram to say which plot it is. What four clusters do you get with each dendrogram?

To use K-means, the basic command is:

cl = kmeans(data, centers = number of clusters)

and all the information you need is in the object cl.

(a) Run K-means with four clusters three times and identify the clusters you get each time. You may get different answers; why?
(b) Repeat (a) using scaled data.
(c) Plot the ratio of the between-cluster sum of squares to the total sum of squares (that is, cl$betweenss/cl$totss) as a function of the number of clusters. Interpret the results.
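The noise step in (c) and the K-means parts can be sketched as follows. This is only an illustration under stated assumptions: the matrix x is hypothetical stand-in data (four simulated Gaussian blobs, not the assignment's data set), the helper name add_noise and the fixed seed are made up for the example, and nstart = 10 is a choice made here, not something the question asks for. The one point to note is that rnorm takes a standard deviation, so a variance of 0.1*(sample variance) becomes sd = sqrt(0.1 * var).

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Hypothetical stand-in data, NOT the assignment's data set:
# four Gaussian blobs on the plane, five points each.
centers <- rbind(c(0, 0), c(3, 4), c(10, 0), c(-10, 5))
x <- centers[rep(1:4, each = 5), ] + matrix(rnorm(40, sd = 0.5), ncol = 2)
colnames(x) <- c("X", "Y")

# Hierarchical part (c): add zero-mean Gaussian noise to column j with
# variance 0.1 * var(column j), i.e. standard deviation sqrt(0.1 * var).
add_noise <- function(m) {
  m <- as.matrix(m)
  for (j in seq_len(ncol(m))) {
    m[, j] <- m[, j] + rnorm(nrow(m), mean = 0, sd = sqrt(0.1 * var(m[, j])))
  }
  m
}
x_noisy <- add_noise(x)

# K-means (a): three runs with four clusters. When centers is a number,
# kmeans picks random rows as starting centers, so runs can differ.
runs <- lapply(1:3, function(i) kmeans(x, centers = 4))
assignments <- lapply(runs, function(cl) cl$cluster)

# K-means (b): same call on scaled data.
runs_scaled <- lapply(1:3, function(i) kmeans(scale(x), centers = 4))

# K-means (c): between-SS / total-SS for k = 2..8 (the ratio is 0 at k = 1).
ks <- 2:8
ratio <- sapply(ks, function(k) {
  cl <- kmeans(x, centers = k, nstart = 10)  # nstart is a choice made here
  cl$betweenss / cl$totss
})
plot(ks, ratio, type = "b", xlab = "number of clusters",
     ylab = "betweenss / totss")
```

Comparing the entries of assignments across the three runs shows whether the random starts led to different partitions; replacing x with the real data matrix gives the plots the question asks for.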

dataset:

G    X          Y
A1   0.1374     -0.9271
A2   0.3006     -0.5703
A3   0.0462     -0.5467
A4   0.8649     -0.2168
A5   -0.3043    -0.0842
A6   -0.3685    -0.1093
B1   1.2501     3.5413
B2   3.9105     3.3893
B3   3.8671     3.7512
B4   2.9201     4.7783
B5   3.8985     4.2231
B6   3.1837     1.7167
C1   10.1454    -1.1645
C2   10.0565    0.4510
C3   10.2200    -0.9178
C4   10.0508    0.0334
C5   11.3937    0.0177
C6   9.4167     1.1136
D1   -10.0000   5.0000
D2   -9.200     4.567
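A minimal sketch of the hierarchical parts (a) and (b) on the data as listed above. The names txt, dat and groups are chosen here for illustration, and cutree(clust, k = 4) is one way, assumed here, to read four clusters off each tree. Only dist, hclust and cutree from base R's stats package are used, so the sketch runs without loading ape.

```r
# Transcribe the data set into a data frame (values copied as listed above);
# the first column G becomes the row names.
txt <- "
G X Y
A1 0.1374 -0.9271
A2 0.3006 -0.5703
A3 0.0462 -0.5467
A4 0.8649 -0.2168
A5 -0.3043 -0.0842
A6 -0.3685 -0.1093
B1 1.2501 3.5413
B2 3.9105 3.3893
B3 3.8671 3.7512
B4 2.9201 4.7783
B5 3.8985 4.2231
B6 3.1837 1.7167
C1 10.1454 -1.1645
C2 10.0565 0.4510
C3 10.2200 -0.9178
C4 10.0508 0.0334
C5 11.3937 0.0177
C6 9.4167 1.1136
D1 -10.0000 5.0000
D2 -9.200 4.567
"
dat <- read.table(text = txt, header = TRUE, row.names = 1)

# (a) Scatter plot, with each point drawn as its label.
plot(dat$X, dat$Y, type = "n", xlab = "X", ylab = "Y",
     main = "Scatter plot of the data")
text(dat$X, dat$Y, labels = rownames(dat), cex = 0.8)

# (b) Four dendrograms on one page; cutree gives four groups per tree.
op <- par(mfrow = c(2, 2))
groups <- list()
for (scaled in c(TRUE, FALSE)) {
  for (method in c("single", "complete")) {
    x <- if (scaled) scale(dat) else as.matrix(dat)
    title <- paste(method, "linkage,",
                   if (scaled) "scaled" else "unscaled", "data")
    clust <- hclust(dist(x), method)
    plot(clust, main = title, hang = -1, cex = .8,
         xlab = "", ylab = "", sub = "", axes = FALSE)
    groups[[title]] <- cutree(clust, k = 4)  # named vector: label -> group
  }
}
par(op)
```

Calling split(names(g), g) on any element g of groups lists which labels fall into each of the four groups for that dendrogram.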
