The question is in the picture, Thank you.
In Chapter 2 we have seen how a 00 plot or a probability plot could be used to assess visually whether the quakes dataset could be modelled using an Exponential(0) or not. (For this homework, you can also assume the quakes dataset is drawn from an Exponential(9) distribution.) Although useful we would like to complete this graphical method with a statistical hypothesis test which would lead to a more objective and principled decision in order to answer the question whether the Exponential(9) distribution is appropriate or not. Numerous tests have been proposed in the literature (in particular in order to test normality) and we focus here on the Anderson-Darling test. We assume an observed random sample x1 , , xn from a probability density fX(y; 6) (and corresponding cumulative distribution function F X(y; 9)), i.e.x1, , xn are \"D samples of the random variable X. The Anderson-Darling (AD) test statistic is given by + (y) FX()';9))2 TAD(x1,...,xn) =nm mfx0dya where [301): #{ie {1,...n,n} 2x,- Sy}, is the empirical distribution function of the sample x1, . . . , x". This test can be used for any hypothesised F X(y; 6) (e.g. a normal, exponential etc. distribution). 1. (10 marks) State, in words and at most two sentences, the null and alternative hypotheses corresponding to the question above. 2. (10 marks) Briefly explain why the statistic suggested by Anderson and Darling may be useful to answer the question we are trying to solve? You may find it useful to refer to a probability plot and you should comment briefly on the roles played by the three terms (a) (y) Fx(y;9))2 (blFx(y;9)(1-Fx0';9)) (cum; 9). 3. (10 marks) For an observed statistic to)\" state the definition of the p-value for this test. You should state precisely the probability distribution of any random variable you may use and can assume 9* to be known for the moment. While the expression above leads to an intuitive interpretation of what the statistic can achieve, a more useful expression in practice is given by TAD ( X1 , ... , Xn ) = -n - 2i - 1 M [log (Fx (x(); 0) ) + log(1 - Fx (X(n+1-1); 0))], n where *(1), *(2), ..., *(n) is the order statistic of the sample, as defined in Chapter 1. You are not asked to establish this formula. Most often * is unknown and must be estimated from the observed sample and tobs can then be computed. It is unlikely that one can come up with a formula for the p-value of this test and in the remainder of the assignment you are asked to write some simple R code to implement a simulation method in order to approximate this quantity. 4. (20 marks) Write a function compute. ad. test (xs ) which takes a vector of observations xs and returns the corresponding value of the Anderson-Darling statistic for the maximum likelihood estimator of 0. You should test your function on the quakes dataset. [Hint: the ad. test function in the nortest R library (which is not installed by default), may be a source of inspiration for your code and may be used to check that your own code produces plausible values (you will not get marks for using it but some of you may find it useful/reassuring). You can see the code of the function by simply typing ad . test . Note that ad. test is designed to test normality and that it renormalizes the data, which you should not do here.] To complete the statistical procedure we require approximative the p-value. Again even when (* is known the distribution of T (X1, X2, ... , Xn) under the null hypothesis is not tractable and it is unlikely that it will be when 0* is estimated with the maximum likelihood estimator Omle . Note that this numerical method works in both scenarios. 5. (15 marks) Write pseudo-code describing an algorithm, based on simulation and similar to the procedure described in Section 4.3 of the lecture notes, to compute the p-value for an observed statistics tobs. 6. (15 marks) Write the R code corresponding to your pseudo-code to compute the p-value corresponding to quakes . Plot the histogram of the simulated statistics and draw a vertical line for the position of the observed test statistic. Conclusion? 7. (20 marks) Now adapt your code to write a R function which computes the type I error probability for this test for a given critical value c . The function should take the critical value c , a dataset xs and a number of simulations B as input and return the type I error probability. Use this function to find the critical value for a test at significance level a by trial and error for a = .15, .1, and .05