In projection-based clustering, we note that, if the data is well separated into clusters with means...
Question:
In projection-based clustering, we note that, if the data is well separated into clusters with means $\mu_1, \ldots, \mu_K$, then the top $K$ eigenvectors of the data covariance matrix, say $(v_1, \ldots, v_K)$, tend to align with $\mathrm{Span}(\mu_1, \ldots, \mu_K)$. It follows that PCA will approximately preserve the distance between cluster means. This intuition (and choice of projection) implicitly assumes that the Euclidean metric is the right way of measuring distance for our particular data. Where is that assumption coming in? What would you do otherwise, i.e., if it turns out that a different notion of distance $\mathrm{dist}(X_i, X_j)$ were more appropriate, would you prefer some other transformation over PCA?

Following the notation from the previous problem, let $V = [v_1, \ldots, v_K] \in \mathbb{R}^{d \times K}$ be the matrix of top principal directions, with $\|v_i\| = 1$. Consider two possible transformations of the data: (i) $\tilde{X}_i \leftarrow V^\top X_i$, and (ii) $\hat{X}_i \leftarrow V V^\top X_i$. Would the clusters learnt from the data $\tilde{X}_i$ differ from those learnt from the data $\hat{X}_i$? If we were to run Lloyd's on the transformed data (i) and separately on the transformed data (ii), would one be quicker to execute than the other, i.e., is there a difference in runtime?
Expert Answer:
You're right that PCA makes the implicit assumption that the Euclidean distance is the appropriate me...
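The last part of the question can be checked numerically. The sketch below is only an illustration on synthetic data, not the expert's answer. The sizes `d`, `K`, `n_per`, the Gaussian clusters, and the helper names `top_k_directions` and `pairwise_sq_dists` are all assumptions made for the example. It relies on one fact: the eigenvectors of a symmetric covariance matrix can be chosen orthonormal, so $V^\top V = I$ and therefore $\|V V^\top (x - y)\|^2 = (x - y)^\top V V^\top (x - y) = \|V^\top (x - y)\|^2$. The two transformed data sets then have identical pairwise Euclidean distances and differ only in dimension ($K$ versus $d$).

```python
import numpy as np

def top_k_directions(X, K):
    """Top-K eigenvectors of the sample covariance of the rows of X (d x K)."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / X.shape[0]
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, ::-1][:, :K]

def pairwise_sq_dists(Z):
    """Squared Euclidean distances between all pairs of rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    return np.maximum(D, 0.0)  # clip tiny negatives from round-off

# Synthetic, well-separated clusters (sizes are illustrative assumptions)
rng = np.random.default_rng(0)
d, K, n_per = 50, 3, 100
means = rng.normal(scale=10.0, size=(K, d))
X = np.vstack([m + rng.normal(size=(n_per, d)) for m in means])

V = top_k_directions(X, K)

X_i = X @ V          # transformation (i): rows are V^T x, shape (n, K)
X_ii = X @ V @ V.T   # transformation (ii): rows are V V^T x, shape (n, d)

max_gap = np.max(np.abs(pairwise_sq_dists(X_i) - pairwise_sq_dists(X_ii)))
print(X_i.shape, X_ii.shape, max_gap)
```

Because the distances agree and centroids are averages (which commute with the linear map $V$), Lloyd's started from the same initial points should produce the same assignments on both data sets. Each Lloyd's iteration costs on the order of $n \cdot k \cdot \text{dim}$ operations for $n$ points and $k$ centroids, so running it on (i), which lives in $K$ dimensions, is cheaper than on (ii), which lives in $d$ dimensions, whenever $K < d$.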