Question


Posted on May 19, 2024


QUESTION 1. Iris flower petal characteristics

In this week's lab we revisit the model from last week and assess its fit. We again consider the variables in the table below.

Name   Type        Description
$y$    response    length of flower sepal
$x$    predictor   length of flower petal

The data is from the "iris" data set built into R (see accompanying R code file). Recall the hypothesised population model

$y = \beta_0 + \beta_1 x + \varepsilon$

and the fitted model

$\hat{y} = 4.3066 + 0.40892\,x$.

To assess compliance with the modelling assumptions, we check the residuals

$e_i = y_i - \hat{y}_i, \quad i \in \{1, 2, \ldots, 150\},$

as a proxy for checking the noise terms $\varepsilon_i$ (we don't have these to check directly). We begin with a visual analysis of the residuals.
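As a starting point, the fitted model and its residuals can be reproduced in a few lines of R. This is only a minimal sketch: it assumes the response is the `Sepal.Length` column and the predictor is the `Petal.Length` column of the built-in `iris` data frame, and the object names `fit`, `b`, `e` and `yhat` are illustrative, not taken from the accompanying R code file.

```r
# Fit the simple linear regression recalled above on R's built-in iris data.
# Assumption: response = Sepal.Length, predictor = Petal.Length.
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Estimated intercept and slope of the fitted line.
b <- coef(fit)
print(round(b, 5))

# Fitted values and residuals (observed minus fitted), one per observation.
yhat <- fitted(fit)
e <- residuals(fit)
print(length(e))
```

The printed coefficients should agree with the fitted model quoted in the question, and the residual vector `e` is the object that the later diagnostic steps work on.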

(a) Produce appropriate diagnostic plots and determine if the assumptions of normality, constant variance and independence appear to have been satisfied.

We can also check normality of the residuals with a hypothesis test.

(b) Using significance level $\alpha = 0.05$, test if the residuals are normally distributed. Write down the hypotheses, the test statistic and p-value, the test decision (with reason) and a conclusion using a minimum of mathematical language.

The independence assumption can also be investigated with the Durbin-Watson statistic, which assesses the degree of autocorrelation in a data sample. Note that significant autocorrelation would violate the independence assumption, but lack of autocorrelation does not necessarily imply independence.

(c) Providing a reason for your answer, determine if there is any statistical evidence that the residuals are not independent.

(d) The coefficient of determination for the model is $R^2 = 0.76$. Other than via the fitted regression model, how else is this related to the variables in the model?

We are now going to identify data points with large influence on the estimated regression equation. We do this using Cook's D statistic.

(e) Calculating a relevant statistic, identify any potentially influential points.

When we identify potentially influential points, we exclude them and rebuild the model. If the estimated beta coefficients in the reduced data set model have changed significantly from the full data set model, we retain the reduced data set model; otherwise we return to the full data set model. Additionally, if excluding the points improves the behaviour of the residuals with regard to the assumptions, we retain the reduced data set model; otherwise we return to the full data set model.

Create a new data set excluding the 9 points identified above and re-run the regression on this reduced data set.

(f) Calculate the proportional changes in the estimated beta coefficients between the reduced and full data set models.
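The checks in parts (a) to (e) might be sketched in base R as follows. Several choices here are assumptions rather than anything stated in the question: the Shapiro-Wilk test is used as the normality test, the Durbin-Watson statistic is computed by hand from its definition (no p-value), and the cutoff `4 / n` for Cook's D is a common rule of thumb that may not flag the same points as the lab's intended criterion.

```r
# Assumption: response = Sepal.Length, predictor = Petal.Length (built-in iris data).
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
e <- residuals(fit)
n <- length(e)

# (a) Residuals-vs-fitted and normal Q-Q plots from base R's plot method for lm.
par(mfrow = c(1, 2))
plot(fit, which = 1:2)

# (b) Normality test on the residuals (assumed choice: Shapiro-Wilk).
# H0: residuals are normally distributed; reject H0 if p-value < 0.05.
sw <- shapiro.test(e)
print(sw)

# (c) Durbin-Watson statistic from its definition:
# sum of squared successive differences over the sum of squared residuals.
# Values near 2 suggest little first-order autocorrelation.
dw <- sum(diff(e)^2) / sum(e^2)
print(dw)

# (d) In simple linear regression, R^2 equals the squared sample
# correlation between the response and the predictor.
r2 <- summary(fit)$r.squared
r_xy <- cor(iris$Sepal.Length, iris$Petal.Length)
print(c(r_squared = r2, cor_squared = r_xy^2))

# (e) Cook's distance per observation; flag points above an assumed cutoff.
d <- cooks.distance(fit)
cutoff <- 4 / n  # rule-of-thumb threshold, not given in the question
influential <- which(d > cutoff)
print(influential)
```

The hand-computed `dw` only gives the statistic itself. A formal test with a p-value needs an add-on package; `lmtest::dwtest(fit)` is one option if that package is installed.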

The proportional changes in the estimated beta coefficients are quite small, so the full data set model should be preferred on this basis. To finish our analysis, we will check the assumptions for the model fitted to the reduced data set.

(g) Produce appropriate diagnostic plots and determine if the assumptions of normality, constant variance and independence appear to have been satisfied for the model on the reduced data set.

(h) Using significance level $\alpha = 0.05$, test if the residuals of the reduced data set model are normally distributed. Write down the hypotheses, the test statistic and p-value, the test decision (with reason) and a conclusion using a minimum of mathematical language.

(i) Providing a reason for your answer, determine if there is any statistical evidence that the residuals of the reduced data set model are not independent.
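The reduced data set workflow in parts (f) to (i) could be sketched along these lines. The same assumptions apply as before: the `4 / n` Cook's D cutoff is a rule of thumb and may not select exactly the 9 points the question refers to, Shapiro-Wilk stands in for the unspecified normality test, and names such as `iris_red` and `prop_change` are hypothetical.

```r
# Full model. Assumption: response = Sepal.Length, predictor = Petal.Length.
full <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Keep only observations whose Cook's D is at or below the assumed cutoff.
# A logical mask is used so that nothing breaks if no points are flagged.
d <- cooks.distance(full)
keep <- d <= 4 / nrow(iris)
iris_red <- iris[keep, ]

# Refit on the reduced data set.
reduced <- lm(Sepal.Length ~ Petal.Length, data = iris_red)

# (f) Proportional change in each coefficient, relative to the full model.
prop_change <- (coef(reduced) - coef(full)) / coef(full)
print(prop_change)

# (g) Diagnostic plots for the reduced model.
par(mfrow = c(1, 2))
plot(reduced, which = 1:2)

# (h) Normality test on the reduced-model residuals (assumed: Shapiro-Wilk).
e_red <- residuals(reduced)
sw_red <- shapiro.test(e_red)
print(sw_red)

# (i) Durbin-Watson statistic computed from its definition.
dw_red <- sum(diff(e_red)^2) / sum(e_red^2)
print(dw_red)
```

Comparing `prop_change`, `sw_red` and `dw_red` with their full-model counterparts is what decides, under the rule described in the question, whether the reduced or the full data set model is retained.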