Question

Please do whatever you can. Don't worry about answering the whole question.

The data analysed in this question constitutes 6 measurements of 90 motor cyclists, and 3 motor cyclist groups:

group 1 = junior motor cyclists
group 2 = non-professional motor cyclists
group 3 = professional motor cyclists

The following variables were measured on each cyclist's head/face dimensions:

Y1 = HWWIDE = head width at widest dimension
Y2 = HDCIRCUM = head circumference
Y3 = EYELEVELFB = front-to-back measurement at eye level
Y4 = EYETOPHD = eye-to-top-of-head measurement
Y5 = EARTOPHD = ear-to-top-of-head measurement
Y6 = WIDTHJAW = jaw width

The SAS outputs and results are given in Appendix 2, Appendix 3 and Appendix 4.

Q5. part 1: Refer to Appendix 2.
(1+1+2+4+1+1+3+1+0.5+0.5) = 15 marks

The results of a traditional PROC DISCRIM on the motor cyclist data is reported in Appendix 2. You can round up all SAS numbers in the Appendices to 2 decimal places, for ease of writing. Using Appendix 2 answer the following questions.

a) What are the number of classes, number of observations and number of variables in the analysis? (1 mark)
b) How is the pooled covariance matrix used in the DISCRIM analysis? (1 mark)
c) What do the generalized squared distances inform us about? (2 marks)
d) Write out the linear discriminant functions. (4 marks)
e) Which group is the furthest from the non-professional motor cyclists? Justify your answer based on Appendix 1. (1 mark)
f) Which group is the closest to the professional motor cyclists? Justify your answer based on Appendix 1. (1 mark)
g) How is the Mahalanobis distance used in discriminant analysis (relate this to centroids)? Give its mathematical formula. (3 marks)
h) How many motor cyclists are incorrectly classified according to DISCRIM analysis? (1 mark)
i) What is the apparent error rate (AER) value? (0.5 marks)
j) Which group has the highest proportion of mis-classified cyclists? What is that proportion? (0.5 marks)

Q5. part 2: Refer to Appendix 3.
(1+1+1+2) = 5 marks

The results of a PROC STEPDISC are reported on the same motor cyclist data in Appendix 3, use this and your own knowledge to answer the following questions.

a) Is a forwards or backwards Discriminant analysis procedure being run in Appendix 3? Justify your answer. (1 mark)
b) What set of potential discriminator variables is chosen? Interpret the final model using the information in Appendix 3. (1 mark)
c) What are the roles of the R squared, F statistic and the Tolerance in the stepwise procedure? (2 marks)
d) What do we mean that a variable is redundant in a stepwise discriminant analysis? (1 mark)

Q5. part 3: Refer to Appendix 4.
(2+2+2+2+2+2+2+0.5+0.5) = 15 marks

Another variant of DISCRIMINANT ANALYSIS in SAS is Canonical discriminant analysis (PROC CANDISC) which is a dimension-reduction technique related to principal component analysis (PCA). The methodology that is used in deriving the canonical coefficients parallels that of a one-way multivariate analysis of variance (MANOVA). The CANDISC procedure derives what are called canonical variables (Can1, Can2, ...), which are linear combinations of the inputted variables. These canonical variables summarise the between-class variation in much the same way that principal components (PCs) summarise total variation.

The CANDISC procedure performs the following:

1. a canonical discriminant analysis,
2. computes squared Mahalanobis distances between class means, and
3. performs both univariate and multivariate one-way analyses of variance.

Two output data sets are produced by PROC CANDISC: one that contains the canonical coefficients, the other that contains the scored canonical variables.

Performing canonical discriminant analysis is in fact equivalent to performing the following steps:

1. Transform the variables so that the pooled within-class covariance matrix is an identity matrix.
2. Compute class means on the transformed variables.
3. Perform a principal component analysis on the means, weighting each mean by the number of observations in the class.
4. The eigenvalues are equal to the ratio of between-class variation to within-class variation in the direction of each principal component.
5. Back-transform the principal components into the space of the original variables to obtain the canonical variables.

The results of a Canonical discriminant analysis (PROC CANDISC) are reported in Appendix 4. You can round up SAS numbers in Appendix 4 to 2 decimal places, for ease of writing. Note also that the plot at the end of Q5. part 3 below displays a plot of the first two canonical variables.

Using the plot below, the description above and the results of the analysis in Appendix 4 answer the following questions.

a) How is the CANDISC procedure similar to PCA? (2 marks)
b) What is the MANOVA model and what is its null hypothesis? (2 marks)
c) Write out the formulation for Can1 and Can2. (2 marks)
d) Interpret in your own words the constructs CAN1 and CAN2. (2 marks)
e) Are all the motor cyclist groups mean vectors equal? Justify your answer according to the appropriate test statistic and associated p value. (2 marks)
f) Report the values of the centroids for each of the motor cyclist groups in the 2-dimensional canonical space - refer to the plot of the first two canonical variables below. (2 marks)
g) Locate the motor cyclist group centroids found in part f) and put on a rough sketch of the plot of the first two canonical discriminant variables (CAN1 and CAN2) - see the plot below. (2 marks)
h) Which motor cyclist groups does the first Canonical Variable (CAN1) best discriminate between? Justify your answer. (0.5 marks)
i) Which motor cyclists groups does the second Canonical Variable (CAN2) best discriminate between? Justify your answer. (0.5 marks)

[Figure: Plot of the first two canonical discriminant functions. Axes: CAN1 and CAN2. Legend (Group): Junior Motor Cyclists, Professional Cyclists, Non-Prof Cyclists.]

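
The five equivalent steps listed in Part 3 can be sketched numerically. The following is a minimal pure-Python illustration on hypothetical two-variable toy data (not the motor cyclist measurements, which are not reproduced in the question); the helper names `sscp`, `quad`, `whiten` and `back_transform` are assumptions made for this sketch and do not correspond to anything in SAS. Because the whitening here uses the pooled covariance matrix (the within-class SSCP matrix divided by its degrees of freedom), the eigenvalues from step 3 equal the between/within ratio of step 4 multiplied by the within-class degrees of freedom.

```python
import math

# Hypothetical toy data (NOT the motor cyclist measurements): 3 classes, 2 variables.
groups = {
    "A": [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)],
    "B": [(4.0, 4.0), (5.0, 3.0), (5.0, 5.0), (6.0, 4.0)],
    "C": [(7.0, 1.0), (8.0, 0.0), (8.0, 2.0), (9.0, 1.0)],
}

def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def sscp(devs, weights):
    """Weighted sum of outer products of 2-D deviations, stored as (s11, s12, s22)."""
    s11 = sum(w * d[0] * d[0] for d, w in zip(devs, weights))
    s12 = sum(w * d[0] * d[1] for d, w in zip(devs, weights))
    s22 = sum(w * d[1] * d[1] for d, w in zip(devs, weights))
    return s11, s12, s22

def quad(a, s):
    """Quadratic form a' S a for a symmetric 2x2 matrix S = (s11, s12, s22)."""
    return s[0] * a[0] ** 2 + 2 * s[1] * a[0] * a[1] + s[2] * a[1] ** 2

names = list(groups)
sizes = [len(groups[g]) for g in names]
means = [mean(groups[g]) for g in names]
grand = mean([p for g in names for p in groups[g]])
df_within = sum(sizes) - len(names)

# Within-class SSCP matrix E and between-class SSCP matrix H (original variables).
within_devs = [(x - m[0], y - m[1]) for g, m in zip(names, means) for x, y in groups[g]]
E = sscp(within_devs, [1.0] * len(within_devs))
H = sscp([(m[0] - grand[0], m[1] - grand[1]) for m in means], sizes)
W = tuple(e / df_within for e in E)  # pooled within-class covariance matrix

# Step 1: whiten with the Cholesky factor L of W (W = L L'), so that the
# pooled within-class covariance of the transformed variables is the identity.
l11 = math.sqrt(W[0])
l21 = W[1] / l11
l22 = math.sqrt(W[2] - l21 ** 2)

def whiten(d):
    z1 = d[0] / l11
    return (z1, (d[1] - l21 * z1) / l22)

# Step 2: class means (centred on the grand mean) in the whitened space.
z_means = [whiten((m[0] - grand[0], m[1] - grand[1])) for m in means]

# Step 3: PCA of the class means, each weighted by its class size
# (closed-form eigen-decomposition of a symmetric 2x2 matrix).
M = sscp(z_means, sizes)
half_trace = (M[0] + M[2]) / 2
root = math.sqrt(half_trace ** 2 - (M[0] * M[2] - M[1] ** 2))
eigenvalues = [half_trace + root, half_trace - root]

def unit_eigenvector(lam):
    # Assumes the off-diagonal entry M[1] is non-zero, which holds for the toy data.
    v = (M[1], lam - M[0])
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)

# Step 5: back-transform each component v to original-variable coefficients
# by solving L' a = v.
def back_transform(v):
    a2 = v[1] / l22
    return ((v[0] - l21 * a2) / l11, a2)

coefs = [back_transform(unit_eigenvector(lam)) for lam in eigenvalues]

# Step 4: between/within variation ratio along each canonical direction.
ratios = [quad(a, H) / quad(a, E) for a in coefs]
for k, (a, r) in enumerate(zip(coefs, ratios), start=1):
    print(f"Can{k}: coefficients = ({a[0]:.4f}, {a[1]:.4f}), between/within ratio = {r:.4f}")
```

Each back-transformed coefficient vector has unit pooled within-class variance, which is one common normalisation for canonical coefficients; other software may scale them differently.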
APPENDIX 2: Motor Cyclists: Discriminant Analysis

The DISCRIM Procedure

Total Sample Size   90    DF Total             89
Variables            6    DF Within Classes    87
Classes              3    DF Between Classes    2

Number of Observations Read   90
Number of Observations Used   90

Class Level Information

groupname               Variable Name           Frequency   Weight    Proportion   Prior Probability
Junior Motor Cyclists   Junior Motor Cyclists   30          30.0000   0.333333     0.333333
Non-Prof Cyclists       Non-Prof Cyclists       30          30.0000   0.333333     0.333333
Professional Cyclists   Professional Cyclists   30          30.0000   0.333333     0.333333

Pooled Covariance Matrix Information

Covariance Matrix Rank                                    [missing]
Natural Log of the Determinant of the Covariance Matrix   -3.69402

Generalized Squared Distance to groupname

From groupname          Junior Motor Cyclists   Non-Prof Cyclists   Professional Cyclists
Junior Motor Cyclists   0                       7.30412             9.55277
Non-Prof Cyclists       7.30412                 0                   0.83625
Professional Cyclists   9.55277                 0.83625             0

Multivariate Statistics and F Approximations (S=2, M=1.5, N=40)

Statistic                Value        F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.30712303   10.99     12       164      <.0001
Pillai's Trace           [missing]
Hotelling-Lawley Trace   [missing]
Roy's Greatest Root      [missing]

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.

Linear Discriminant Function for groupname

Columns: Junior Motor Cyclists, Non-Prof Cyclists, Professional Cyclists
Rows: Constant, HWWIDE, HDCIRCUM, EYELEVELFB, EYETOPHD, EARTOPHD, WIDTHJAW
[coefficient values missing]

Number of Observations and Percent Classified into groupname
[counts, percents, totals and priors missing]

Error Count Estimates for groupname
[Rate and Priors values missing]

APPENDIX 3: Motor Cyclists: Stepwise Discriminant Analysis

The STEPDISC Procedure

Stepwise Selection Summary

Step   Entered    Partial R-Square   F Value   Pr > F
1      EYETOPHD   0.5721             58.16     <.0001

[Number In, Removed, Wilks' Lambda, Pr < Lambda, Average Squared Canonical Correlation and Pr > ASCC columns missing; later rows name HWWIDE, WIDTHJAW and EARTOPHD, with values missing]

Y1 = HWWIDE = head width at widest dimension
Y2 = HDCIRCUM = head circumference
Y3 = EYELEVELFB = front-to-back measurement at eye level
Y4 = EYETOPHD = eye-to-top-of-head measurement
Y5 = EARTOPHD = ear-to-top-of-head measurement
Y6 = WIDTHJAW = jaw width

APPENDIX 4: Motor Cyclists: Canonical Discriminant Analysis

Multivariate Statistics and F Approximations (S=2, M=1.5, N=40)

Statistic                Value        F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.30712303   10.99     12       164      <.0001
Pillai's Trace           0.76115937   8.50      12       166      <.0001
Hotelling-Lawley Trace   2.03369496   13.78     12       124.51
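
As a consistency check on the multivariate statistics reported above, the F values and degrees of freedom for Wilks' Lambda and Pillai's Trace can be recomputed from the reported statistic values. The sketch below assumes the output is based on the standard Rao F approximation for Wilks' Lambda and the usual F approximation for Pillai's Trace; the variable names are chosen for this illustration only. The inputs are 6 variables, 2 between-class degrees of freedom and 87 within-class degrees of freedom, which give M = (|6 - 2| - 1)/2 = 1.5 and N = (87 - 6 - 1)/2 = 40, matching the output header.

```python
import math

# Values as reported in the appendix output.
wilks_lambda = 0.30712303
pillai_trace = 0.76115937
p = 6            # number of variables
q = 2            # between-class degrees of freedom (3 classes - 1)
df_error = 87    # within-class degrees of freedom
s, m, n = 2, 1.5, 40  # S, M, N from the output header

# Rao's F approximation for Wilks' Lambda
# (assumed here to be the formula behind the reported F value).
t = math.sqrt((p ** 2 * q ** 2 - 4) / (p ** 2 + q ** 2 - 5))
r = df_error - (p - q + 1) / 2
u = (p * q - 2) / 4
wilks_df1 = p * q
wilks_df2 = r * t - 2 * u
lam_root = wilks_lambda ** (1 / t)
wilks_f = (1 - lam_root) / lam_root * wilks_df2 / wilks_df1

# Standard F approximation for Pillai's Trace.
pillai_df1 = s * (2 * m + s + 1)
pillai_df2 = s * (2 * n + s + 1)
pillai_f = (2 * n + s + 1) / (2 * m + s + 1) * pillai_trace / (s - pillai_trace)

print(f"Wilks:  F = {wilks_f:.2f} on ({wilks_df1:g}, {wilks_df2:g}) df")
print(f"Pillai: F = {pillai_f:.2f} on ({pillai_df1:g}, {pillai_df2:g}) df")
```

Under these assumptions the recomputed values agree with the reported F Value, Num DF and Den DF columns for both statistics. The Hotelling-Lawley Trace uses a different approximation with a non-integer denominator degrees of freedom and is not covered by this sketch.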
