Question

Posted on Oct 13, 2024

University of California, Los Angeles
Department of Statistics

Statistics 100C
Instructor: Nicolas Christou

Homework 3

EXERCISE 1
Answer the following questions:

a. Suppose $y_1, y_2, \ldots, y_n$ are independent random variables and $y_i = \mu + \epsilon_i$ for $i = 1, 2, \ldots, n$. Assume that $E(\epsilon_i) = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$, and $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$. Find the least squares estimate of $\mu$. Give the variance of this estimate.

b. Consider the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$. Assume that $E(\epsilon_i) = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$, and $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$. In addition, it is given that $\sum_{i=1}^{n} x_i = 0$. What are the least squares estimates of $\beta_0$ and $\beta_1$?

c. We have shown that $\hat{y}_i$ can be expressed as $\hat{y}_i = h_{ii} y_i + \sum_{j \neq i} h_{ij} y_j$. Use this expression to find $\mathrm{var}(\hat{y}_i)$.

d. Find an expression of $\mathrm{corr}(e_i, e_j)$ in terms of $h_{ii}$, $h_{jj}$, $h_{ij}$.

EXERCISE 2
Answer the following questions:

a. Consider the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$. Assume that $E(\epsilon_i) = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$, and $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$. Suppose we rescale the $x$ values to $x_i^*$ [rescaling formula illegible in source], and we want to fit the model $y_i = \beta_0^* + \beta_1^* x_i^* + \epsilon_i$. Find the least squares estimates of $\beta_0^*$ and $\beta_1^*$.

b. Refer to the model $y_i = \beta_0^* + \beta_1^* x_i^* + \epsilon_i$ of part (a). Find the SSE of this model and compare it to the SSE of the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$. What is your conclusion?

c. Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, with $E(\epsilon_i) = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$, and $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$. Show that $E(SYY) = (n-1)\sigma^2 + \beta_1^2\, SXX$, where $SYY = \sum_{i=1}^{n} (y_i - \bar{y})^2$ and $SXX = \sum_{i=1}^{n} (x_i - \bar{x})^2$.

d. Refer to the model of part (c). Find $\mathrm{cov}(\epsilon_i, e_i)$.

EXERCISE 3
Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, with $E(\epsilon_i) = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$, and $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$. Also, assume that $\epsilon_i \sim N(0, \sigma)$. Suppose we want to test simultaneously

$H_0$: $\beta_1 = \beta_1^*$ and $\beta_0 = \beta_0^*$
$H_a$: The hypothesis $H_0$ is not true.

Answer the following questions:

a. In the expression
$$\frac{Q^2}{\sigma^2} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2,$$
if we add and subtract $\hat{\beta}_0$ and add and subtract $\hat{\beta}_1 x_i$, show that
$$\frac{Q^2}{\sigma^2} = \frac{1}{\sigma^2} \left[ \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 + n(\hat{\beta}_0 - \beta_0)^2 + (\hat{\beta}_1 - \beta_1)^2 \sum_{i=1}^{n} x_i^2 + 2n(\hat{\beta}_0 - \beta_0)(\hat{\beta}_1 - \beta_1)\bar{x} \right]. \quad (1)$$

b. Let $D = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$. Show that the random variables $\hat{\beta}_1$ and $D$ are uncorrelated, and explain why $\hat{\beta}_1$ and $D$ must therefore be independent.

c. Show that the sum of the last three terms of (1) in part (a) is equal to
$$\frac{(D - \beta_0 - \beta_1 \bar{x})^2}{\mathrm{var}(D)} + \frac{(\hat{\beta}_1 - \beta_1)^2}{\mathrm{var}(\hat{\beta}_1)}.$$

d. If $H_0$ is true, what are the degrees of freedom of the random variables $\frac{(\hat{\beta}_1 - \beta_1)^2}{\mathrm{var}(\hat{\beta}_1)}$ and $\frac{(D - \beta_0 - \beta_1 \bar{x})^2}{\mathrm{var}(D)}$?

EXERCISE 4
Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, with $E(\epsilon_i) = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$, and $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$. Also, assume that $\epsilon_i \sim N(0, \sigma)$. Answer the following questions:

a. Find $E(Y_i^2)$.

b. Find the distribution of $\bar{Y}$.

c. Find $E(\bar{Y}^2)$.

d. Find $\mathrm{cov}\left(\sum_{i=1}^{n} \epsilon_i, \hat{\beta}_1\right)$.

e. Suppose $E(Y_i) = \beta_0 + \beta_1 x_i$, but that the $Y_i$'s are not necessarily independent or normally distributed and do not necessarily have equal variances. Are $\hat{\beta}_0$ and $\hat{\beta}_1$ unbiased estimators of $\beta_0$ and $\beta_1$?

EXERCISE 5
Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, with $E(\epsilon_i) = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$, and $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$. Also, assume that $\epsilon_i \sim N(0, \sigma)$. In this numerical example, $y$ represents the concentration of lead in ppm and $x$ represents the concentration of zinc in ppm of soil at a particular area of interest. The sample size was $n = 15$. These data gave the following results:

$\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{y}) = 70767.08$.
$\sum_{i=1}^{n} (y_i - \bar{y})^2 = 73327.6$.
$\sum_{i=1}^{n} (x_i - \bar{x})^2 = 1560112$.
$\sum_{i=1}^{n} x_i^2 = 5072016$.
$\bar{y} = 161.4$.

Answer the following questions:

a. Find $\hat{\beta}_1$.

b. Find $\hat{\beta}_0$.

c. Compute $s_e^2$.

d. Compute the value of the $F$ statistic in testing the hypothesis
$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 \neq 0$.

e. Compute $\mathrm{var}(\hat{\beta}_0)$.

EXERCISE 6
Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $i = 1, \ldots, n$, with $E(\epsilon_i) = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$, $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$, and $\epsilon_i \sim N(0, \sigma)$. Answer the following questions:

a. Find $\mathrm{Cov}(\hat{y}_i, y_i)$.

b. Find $\mathrm{Cov}\left(\sum_{i=1}^{n} \epsilon_i, \sum_{i=1}^{n} e_i\right)$.

EXERCISE 7
Three variables $N$, $D$, and $Y$ all have zero sample means and unit sample variances. A fourth variable is $C = N + D$. In the regression of $C$ on $Y$, the slope is 0.8. In the regression of $C$ on $N$, the slope is 0.5. In the regression of $D$ on $Y$, the slope is 0.4. What is the error sum of squares in the regression of $C$ on $D$? There are 21 observations.

EXERCISE 8
Answer the following questions:

a. Consider the simple regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $i = 1, \ldots, n$, with $E(\epsilon_i) = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$, $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$, and $\epsilon_i \sim N(0, \sigma)$. Show that the correlation coefficient between $\hat{\beta}_0$ and $\hat{\beta}_1$ is
$$-\frac{n\bar{x}}{\sqrt{n \sum_{i=1}^{n} x_i^2}}.$$

b. Refer to the model of part (a). Given that $\bar{x} = 0$, derive an $F$ statistic for testing the hypothesis $H_0$: $\beta_1 = 0$ against the alternative $H_a$: $\beta_1 \neq 0$.

EXERCISE 9
Access the following data in R:

a <- read.table("http://www.stat.ucla.edu/~nchristo/statistics100c/soil_complete.txt", header=TRUE)

Answer the following questions:

a. Run the regression of cadmium on zinc. Attach the R output.

b. Compute the leverage values.

c. Suppose the 10th observation is deleted. Give the formula that computes the new $\hat{\beta}_1$ and $\hat{\beta}_0$. Use R to compute them and attach the code.

EXERCISE 10
Breast cancer mortality data: The data contain breast cancer mortality ($y$) from 1950 to 1960 and the adult white female populations ($x$) in 1960 for 301 counties in North Carolina, South Carolina, and Georgia. Access the data:

a <- read.table("http://www.stat.ucla.edu/~nchristo/statistics100c/cancer.txt", sep=",", header=TRUE)

Answer the following questions:

1. Construct a scatterplot of $y$ on $x$.

2. Run the regression through the origin of $y$ on $x$.

3. Check the assumptions.

4. Now run the regression of $y$ on sqrt($x$).

5. Check the assumptions of the model of question 4.
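
For Exercise 9, parts b and c, the following is a minimal R sketch of one way to compute leverage values and delete-one estimates. It runs on simulated data rather than the soil data, so the variables `x` and `y` (stand-ins for zinc and cadmium), the seed, and the simulated coefficients are all assumptions made for illustration. The delete-one update it uses is the standard identity $\hat{\beta}_{(i)} = \hat{\beta} - (X'X)^{-1} x_i e_i / (1 - h_{ii})$, which may or may not be the form the course intends.

```r
# Simulated stand-in data (assumption: NOT the soil data from the exercise)
set.seed(1)
n <- 20
x <- runif(n, 100, 1000)            # hypothetical predictor (plays the role of zinc)
y <- 1 + 0.01 * x + rnorm(n)        # hypothetical response (plays the role of cadmium)

fit <- lm(y ~ x)
e   <- resid(fit)

# Leverage in simple regression: h_ii = 1/n + (x_i - xbar)^2 / SXX
sxx <- sum((x - mean(x))^2)
h   <- 1 / n + (x - mean(x))^2 / sxx

# The closed form should agree with R's built-in hatvalues()
stopifnot(isTRUE(all.equal(unname(hatvalues(fit)), h)))

# Delete-one update for observation i, without refitting:
# beta_(i) = beta - (X'X)^{-1} x_i * e_i / (1 - h_ii)
i <- 10
X <- cbind(1, x)
beta_del <- coef(fit) - solve(t(X) %*% X, X[i, ]) * e[i] / (1 - h[i])

# Cross-check against an explicit refit that drops observation i
refit <- lm(y[-i] ~ x[-i])
stopifnot(isTRUE(all.equal(unname(as.vector(beta_del)), unname(coef(refit)))))

print(beta_del)
```

To apply the same steps to the exercise data, `x` and `y` would be replaced by the zinc and cadmium columns of the data frame `a`, assuming those are the column names in the file.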