Question
1. In order to derive required drug dosages, Milner and Rougier (2014) recorded a number of variables on a cohort of 544 Kenyan donkeys. The data are available on Brightspace as Donkeys.csv.
Variable   Measurement scale
Girth      cm
Height     cm
Length     cm
Weight     kg
Age        < 2, 2-5, 5-10, 10-15, 15-20, > 20 (years)
BCS        Body condition score: 1 (emaciated) to 3 (healthy) to 5 (obese), in steps of 0.5
Sex        Female, gelding, stallion
(a) Use suitable plots to illustrate each of the variables in the donkey data set. Comment on the distributions of the variables, and on the outlying donkey. Remove the outlying donkey from the dataset.
(b) Principal components analysis (PCA) is to be used to reduce the dimension of the donkey data. Which variables should be used in such an analysis?
(c) Would you advise performing principal components analysis on the correlation matrix or the covariance matrix of the appropriate donkey variables? Explain your reasoning.
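One fact that bears on (c): the correlation matrix is the covariance matrix of the standardised variables, so it does not depend on the units of measurement, whereas covariance entries do (here weight is in kg while girth, height and length are in cm). The sketch below is in Python purely for illustration (the assignment itself uses R). The function names are hypothetical and the rows are made-up numbers, not values from Donkeys.csv.

```python
import math


def covariance_matrix(X):
    """Sample covariance matrix (divisor n - 1) of a list of equal-length numeric rows."""
    n, p = len(X), len(X[0])
    m = [sum(r[j] for r in X) / n for j in range(p)]
    return [[sum((r[a] - m[a]) * (r[b] - m[b]) for r in X) / (n - 1) for b in range(p)]
            for a in range(p)]


def correlation_matrix(X):
    """Correlation matrix, i.e. the covariance matrix of the standardised variables."""
    S = covariance_matrix(X)
    p = len(S)
    return [[S[a][b] / math.sqrt(S[a][a] * S[b][b]) for b in range(p)] for a in range(p)]


# Hypothetical rows (girth in cm, weight in kg): made-up numbers, NOT from Donkeys.csv
rows_kg = [[110.0, 140.0], [118.0, 165.0], [105.0, 120.0], [122.0, 180.0], [113.0, 150.0]]
rows_g = [[girth, weight * 1000.0] for girth, weight in rows_kg]  # weight re-expressed in grams

# The weight variance grows by a factor of 10**6 when kg become grams ...
print(covariance_matrix(rows_kg)[1][1], covariance_matrix(rows_g)[1][1])
# ... while the girth-weight correlation is unchanged (up to rounding)
print(correlation_matrix(rows_kg)[0][1], correlation_matrix(rows_g)[0][1])
```

Because eigenvectors of a covariance matrix are pulled towards whichever variables have the largest variances, a change of units alone can change a covariance-based PCA; a correlation-based PCA is invariant to such rescaling.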
(d) Write your own function that could be used to apply principal components analysis to a multivariate data set. You should not use any inbuilt PCA functions that are available in R, but should derive the method from first principles and write your own code to implement the method accordingly. Your function should output objects that would be of interest to someone using your function.
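The first-principles route asked for in (d) can be sketched as: centre (and optionally standardise) the variables, form the sample covariance matrix of the result, eigen-decompose it, and sort the eigenpairs by decreasing eigenvalue; the eigenvalues are the component variances, the eigenvectors are the loadings, and projecting the centred data onto them gives the scores. Below is a minimal Python sketch (Python swapped in for illustration; the assignment expects R), assuming the data arrive as a list of numeric rows with no missing values and no constant columns. The eigen-decomposition uses the cyclic Jacobi rotation method, chosen here only because it needs no libraries. The names `my_pca` and `jacobi_eigen` and the returned keys are hypothetical, and the toy data are made up.

```python
import math


def jacobi_eigen(A, tol=1e-22, max_sweeps=100):
    """Eigenvalues and eigenvectors of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(A)
    a = [[float(x) for x in row] for row in A]  # working copy, rotated towards diagonal form
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # accumulated rotations
    total_sq = sum(x * x for row in a for x in row) or 1.0
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * total_sq:  # off-diagonal mass negligible: converged
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Choose the rotation angle that zeroes the (p, q) entry
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # a <- a J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- J^T a (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # v <- v J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                a[p][q] = a[q][p] = 0.0
    return [a[i][i] for i in range(n)], v  # column k of v belongs to eigenvalue k


def my_pca(X, scale=True):
    """PCA of an n x p data matrix (list of rows): correlation matrix if scale, else covariance."""
    n, p = len(X), len(X[0])
    centre = [sum(row[j] for row in X) / n for j in range(p)]
    sd = [math.sqrt(sum((row[j] - centre[j]) ** 2 for row in X) / (n - 1)) for j in range(p)]
    div = sd if scale else [1.0] * p
    Z = [[(row[j] - centre[j]) / div[j] for j in range(p)] for row in X]
    # Sample covariance of Z (this is the correlation matrix of X when scale=True)
    S = [[sum(z[a] * z[b] for z in Z) / (n - 1) for b in range(p)] for a in range(p)]
    values, vectors = jacobi_eigen(S)
    order = sorted(range(p), key=lambda k: values[k], reverse=True)
    eigenvalues = [values[k] for k in order]
    loadings = [[vectors[j][k] for k in order] for j in range(p)]  # column k = PC k + 1
    for k in range(p):  # eigenvector signs are arbitrary: make the largest entry positive
        big = max(range(p), key=lambda j: abs(loadings[j][k]))
        if loadings[big][k] < 0:
            for j in range(p):
                loadings[j][k] = -loadings[j][k]
    scores = [[sum(z[j] * loadings[j][k] for j in range(p)) for k in range(p)] for z in Z]
    total = sum(eigenvalues)
    proportion = [e / total for e in eigenvalues]
    cumulative = [sum(proportion[: k + 1]) for k in range(p)]
    return {
        "eigenvalues": eigenvalues,  # variances of the principal components
        "proportion": proportion,    # share of total variance per component
        "cumulative": cumulative,    # running total of the proportions
        "loadings": loadings,        # p x p, columns are the eigenvectors
        "scores": scores,            # n x p component scores
        "centre": centre,
        "scale": div,
    }


# Made-up toy data (NOT the donkey data): 10 observations of 2 correlated variables
toy = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
       [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]]
res = my_pca(toy, scale=False)  # covariance-matrix PCA
print("eigenvalues:", [round(e, 4) for e in res["eigenvalues"]])  # -> [1.284, 0.0491]
print("cumulative proportion:", [round(c, 4) for c in res["cumulative"]])
```

Returning eigenvalues, variance proportions, loadings and scores together covers what the later parts (scree plot, loading interpretation, score plot) would need. A hand-written function like this can be cross-checked against a built-in routine afterwards, as long as the built-in is not used inside it.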
(e) Set the seed in R to your student number. Randomly sample 5 values between 1 and 500, and remove the corresponding donkeys from the data set. All subsequent analyses in question 1 should be conducted on this version of the data set. Using the output from applying your own PCA function to the appropriate variables, how many principal components are required to summarise the donkey data? Use suitable plot(s) to motivate your decision.
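The two mechanical steps in (e) can be sketched as follows: draw 5 distinct row numbers from 1 to 500 with a seeded generator, then summarise the eigenvalues through the proportion and cumulative proportion of variance, a scree plot, and two common retention rules (reach a target cumulative proportion; Kaiser's rule of keeping above-average eigenvalues, i.e. eigenvalue > 1 for a correlation matrix). This is a Python sketch for illustration only: Python's `random` module will not reproduce the indices that R's `set.seed()` and `sample()` give for the same seed, the row numbers are 1-based as in R (subtract 1 for Python list positions), the student number is hypothetical, the eigenvalues are made up, and the 90% target is an assumed choice. The scree plot is drawn as text to avoid plotting libraries.

```python
import random


def sample_removals(seed, k=5, low=1, high=500):
    """Draw k distinct row numbers from low..high (inclusive) with a seeded generator."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(low, high + 1), k))


def variance_summary(eigenvalues, target=0.9):
    """Proportion/cumulative variance plus two common rules for how many PCs to keep."""
    total = sum(eigenvalues)
    proportion = [e / total for e in eigenvalues]
    cumulative, running = [], 0.0
    for q in proportion:
        running += q
        cumulative.append(running)
    # Rule 1: smallest number of PCs whose cumulative proportion reaches the target
    n_target = next((k + 1 for k, c in enumerate(cumulative) if c >= target), len(eigenvalues))
    # Rule 2 (Kaiser): keep PCs with above-average variance (eigenvalue > 1 for a correlation matrix)
    n_kaiser = sum(1 for e in eigenvalues if e > total / len(eigenvalues))
    return proportion, cumulative, n_target, n_kaiser


def text_scree(eigenvalues, width=40):
    """Crude text-only scree plot: one bar per component, scaled to the largest eigenvalue."""
    top = max(eigenvalues)
    for k, e in enumerate(eigenvalues, start=1):
        print(f"PC{k:<2} {e:7.3f} " + "#" * round(width * e / top))


removed = sample_removals(seed=12345678)  # hypothetical student number
print("rows to drop:", removed)

eigs = [3.2, 0.5, 0.2, 0.1]  # made-up eigenvalues for illustration only
proportion, cumulative, n_target, n_kaiser = variance_summary(eigs)
text_scree(eigs)
print("cumulative:", [round(c, 3) for c in cumulative])
print(n_target, "PC(s) reach 90%;", n_kaiser, "PC(s) by Kaiser's rule")
```

The two rules can disagree (here 2 versus 1), which is one reason a scree plot, looking for the "elbow", is usually shown alongside them.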
(f) Interpret the first column of the loadings matrix resulting from the application of your PCA function to the appropriate variables from the modified donkeys data (from 1(e)).
(g) Plot the first principal component scores of the donkeys resulting from the application of your PCA function to the appropriate variables from the modified donkeys data (from 1(e)). Why is such a plot useful in PCA? Comment on the principal component scores in the context of the available data.
(h) The jackknife is one method that could be used to validate the principal components solution. Describe in your own words how the method works. Write your own code to implement the method, and use it to validate the results obtained from applying your PCA function to the appropriate variables from the modified donkeys data (from 1(e)).
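In outline, the jackknife leaves out each of the n observations in turn, refits the PCA on the remaining n - 1, and uses the spread of the n leave-one-out estimates theta_(i) around their mean theta_bar to estimate bias, (n - 1)(theta_bar - theta_hat), and standard error, sqrt((n - 1)/n * sum_i (theta_(i) - theta_bar)^2), where theta_hat is the full-data estimate. A minimal Python sketch under stated assumptions (Python swapped in for illustration; the assignment uses R): it tracks only the eigenvalues of the correlation matrix and the first loading vector, aligns each leave-one-out eigenvector's arbitrary sign with the full-data one, and uses made-up toy data. The names `pca_corr` and `jackknife_pca` are hypothetical, and the Jacobi eigen-solver is repeated so the block runs on its own.

```python
import math


def jacobi_eigen(A, tol=1e-22, max_sweeps=100):
    """Eigenvalues and eigenvectors of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(A)
    a = [[float(x) for x in row] for row in A]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total_sq = sum(x * x for row in a for x in row) or 1.0
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) <= tol * total_sq:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])  # angle that zeroes a[p][q]
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # a <- a J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- J^T a
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # v <- v J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                a[p][q] = a[q][p] = 0.0
    return [a[i][i] for i in range(n)], v


def pca_corr(X):
    """Descending eigenvalues and first eigenvector of the correlation matrix of X."""
    n, p = len(X), len(X[0])
    m = [sum(r[j] for r in X) / n for j in range(p)]
    # Cross-products of deviations; the 1/(n - 1) factor cancels in the correlation
    S = [[sum((r[a] - m[a]) * (r[b] - m[b]) for r in X) for b in range(p)] for a in range(p)]
    R = [[S[a][b] / math.sqrt(S[a][a] * S[b][b]) for b in range(p)] for a in range(p)]
    values, vectors = jacobi_eigen(R)
    order = sorted(range(p), key=lambda k: values[k], reverse=True)
    return [values[k] for k in order], [vectors[j][order[0]] for j in range(p)]


def jackknife_pca(X):
    """Leave-one-out jackknife (estimate, bias, SE) for eigenvalues and PC1 loadings."""
    n = len(X)
    full_vals, full_pc1 = pca_corr(X)
    rep_vals, rep_pc1 = [], []
    for i in range(n):
        vals, pc1 = pca_corr(X[:i] + X[i + 1:])  # refit without observation i
        if sum(a * b for a, b in zip(pc1, full_pc1)) < 0:  # eigenvector sign is arbitrary
            pc1 = [-x for x in pc1]
        rep_vals.append(vals)
        rep_pc1.append(pc1)

    def summarise(full, reps):
        rows = []
        for j, est in enumerate(full):
            reps_j = [r[j] for r in reps]
            mean_j = sum(reps_j) / n
            bias = (n - 1) * (mean_j - est)
            se = math.sqrt((n - 1) / n * sum((t - mean_j) ** 2 for t in reps_j))
            rows.append((est, bias, se))
        return rows

    return summarise(full_vals, rep_vals), summarise(full_pc1, rep_pc1)


# Made-up toy data (NOT the donkey data)
toy = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
       [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]]
eig_summary, pc1_summary = jackknife_pca(toy)
for k, (est, bias, se) in enumerate(eig_summary, start=1):
    print(f"eigenvalue {k}: estimate {est:.4f}, jackknife bias {bias:.4f}, SE {se:.4f}")
```

Small jackknife biases and standard errors relative to the estimates suggest a stable solution. With only two standardised variables, the first loading vector is always (1/sqrt(2), 1/sqrt(2)) up to sign, so its jackknife SE is zero; the loading check only becomes informative with three or more variables.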