Question

Posted on Oct 19, 2024

Stats 120B Homework 6: Due Thursday, Mar. 3

1. Define the following random variables:

       $X_1, X_2, X_3, X_4 \overset{\text{iid}}{\sim} N(0, 1)$
       $W \sim N(5, 9)$
       $V_1, V_2 \overset{\text{iid}}{\sim} \chi^2(4)$

   Assume $X_1, \ldots, X_4, W, V_1, V_2$ are all independent of each other. Also define

       $\bar{X} = \frac{1}{4}(X_1 + X_2 + X_3 + X_4)$.

   Each of the following random variables, $Y$, has either a normal distribution, $\chi^2$ distribution, t-distribution, or F-distribution. For each random variable, state the name of the distribution and the value(s) of its parameter(s) and explain.

   (a) $Y = V_1 / V_2$
   (b) $Y = X_1^2 / X_2^2$
   (c) Y = (d) Y = W 5 3X1 32X 2 V1 + V2
   (e) $Y = 4\bar{X} + 2W + 1$

2. For each of the following statements, state whether the statement is true or false, and justify your answer:

   (a) The significance level of a hypothesis test is equal to the probability that the null hypothesis is true.
   (b) If the significance level of a hypothesis test is decreased, the power would be expected to increase.
   (c) If a null hypothesis is rejected at the significance level of $\alpha$, the probability that the null hypothesis is true equals $\alpha$.
   (d) A type I error occurs when the test statistic falls in the rejection region of the test.
   (e) If the p-value is 0.03, the corresponding test will fail to reject the null hypothesis at the significance level of 0.02.
   (f) If the null hypothesis is true, the probability of a type II error is zero.
   (g) The p-value of a hypothesis test is the probability that the null hypothesis is correct.
   (h) If a test rejects the null hypothesis at a significance level of 0.06, then the p-value is less than or equal to 0.06.

3. Assume $X_1, \ldots, X_{15} \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$. We observe a random sample where $\bar{x} = 7.8$ and $s^2 = 16.0$. Consider testing the null hypothesis $H_0: \mu = 8$.

   (a) Calculate the observed value of the test statistic for a one sample t-test.
   (b) Calculate the p-value if $H_a: \mu < 8$.
   (c) Calculate the p-value if $H_a: \mu \neq 8$.
   (d) Calculate the p-value if $H_a: \mu > 8$.

4. Suppose $X_1, \ldots, X_{10}$ form a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$.

   (a) What is the distribution of the pivotal quantity

           $U = \frac{9 s^2}{\sigma^2}$,

       where $s^2 = \sum_{i=1}^{10} (X_i - \bar{X})^2 / 9$ is the sample variance?
   (b) Suppose we would like to test $H_0: \sigma^2 = 1$ versus $H_a: \sigma^2 > 1$. We will reject $H_0$ for large values of $s^2$ (compared to 1). Find the rejection region corresponding to a significance level of $\alpha = 0.05$.
   (c) Consider again the test of $H_0: \sigma^2 = 1$ versus $H_a: \sigma^2 > 1$. Suppose we observe $s^2 = 2.1$. Find the p-value for this test. Using a significance level of 0.05, what is your conclusion?
   (d) Given your calculations in parts (b) and (c), suggest a test statistic for the test of $H_0: \sigma^2 = \sigma_0^2$ versus $H_a: \sigma^2 > \sigma_0^2$. That is, give a quantity involving the sample data and the null value $\sigma_0^2$ for which we know its distribution under the assumption of the null hypothesis. What is the probability distribution of this test statistic under the null hypothesis?

For the following questions, attend the discussion section, answer the questions and turn in your R output, including any plots. R codes are posted on the course website.

5. This problem will make use of two data sets which you should load into R using the following commands:

       Glucose1 = read.table("http://people.reed.edu/~jones/141/Glucose1.dat", header=TRUE)
       Glucose2 = read.table("http://people.reed.edu/~jones/141/Glucose2.dat", header=TRUE)

   Both data sets contain data on repeated administrations of glucose tolerance tests to a sample of women who made repeated visits to a Boston City Hospital between 1955 and 1960. Glucose1 has data for each of 53 non-pregnant women, measured yearly. Glucose2 has data for 52 women for each of three pregnancies during the same period. In each data set, the variables test1, test2, etc., refer to the change in blood glucose levels measured first after fasting, then again one hour after administration of a dose of 100 grams of glucose (in mg/100ml). The glucose tolerance test is used to diagnose diabetes.
   (a) We are going to compare the test6 results for the non-pregnant women in the Glucose1 data set to the test3 results for the pregnant women in the Glucose2 data set using a 95% confidence interval for the difference in means.

       i. Create overlaid histograms with smoothed density lines of the two groups. Include a legend on your plot. How do the two samples compare to each other (e.g., center, shape, spread)?
       ii. Calculate the sample variance for each group using the var function in R. Does the equal variance assumption appear to hold?
       iii. Create a normal quantile-quantile plot for each sample. Does the normality assumption appear to hold for each sample?
       iv. Calculate a 95% confidence interval for the difference in mean glucose tolerance test results (pregnant - non-pregnant). Calculate the interval without using the t.test function in R, though you may check your answer using this function.
       v. Interpret your interval from part (iv) in terms of the problem.
       vi. Does there appear to be a significant difference in the mean glucose tolerance test results between pregnant and non-pregnant women? Use your confidence interval to justify your answer.

   (b) Now, let's compare the glucose tolerance test results between the first two pregnancies for the pregnant women in the Glucose2 data set (test1 and test2). Note that the two "samples" are no longer independent, so our procedure from part (a) won't work. Instead, you will be led through the steps to compute a 95% confidence interval for the mean difference in the change in blood glucose levels for "paired data."

       i. Create a new variable in R for the differences between test1 and test2 for each of the 52 pregnant women using the following R command:

              D = Glucose2$test2 - Glucose2$test1

       ii. Calculate the sample mean and standard deviation of the differences.
       iii. Create a normal quantile-quantile plot of the differences. Does the sample appear to come from a normal distribution?
       iv. Use our formula for a 95% confidence interval for a population mean (with unknown population variance) to calculate a 95% confidence interval for the mean difference in glucose tolerance test results between a pregnant woman's first and second pregnancies.
       v. Interpret your interval in terms of the problem.
       vi. Does there appear to be a significant difference in the mean glucose tolerance test results between pregnant women's first and second pregnancies? Use your confidence interval to justify your answer.

   (c) We are going to compare the test5 results for the non-pregnant women in the Glucose1 data set to the test2 results for the pregnant women in the Glucose2 data set.

       i. Create a normal quantile-quantile plot for each of the two samples. Does each appear to have an approximate normal distribution? Explain.
       ii. Regardless of your assessment of normality in part (i), carry out the F-test of:

              $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \neq \sigma_2^2$

           where $\sigma_1^2$ is the true variance in glucose tolerance test results for non-pregnant women, and $\sigma_2^2$ is the true variance in glucose tolerance test results for pregnant women. Report the value of the test statistic, the p-value, and your conclusion using a significance level of $\alpha = 0.01$. (Perform your calculations without using the built-in R function var.test, though you may use this function to check your answer.)
       iii. Now conduct a 2-sample t-test of:

              $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 < \mu_2$

           where $\mu_1$ is the true mean glucose tolerance test result for non-pregnant women, and $\mu_2$ is the true mean glucose tolerance test result for pregnant women. Report the value of the test statistic, the p-value, and your conclusion using a significance level of $\alpha = 0.01$. (Perform your calculations without using the built-in R function t.test, though you may use this function to check your answer.)
       iv. What three assumptions must hold for your 2-sample t-test in part (iii) to be valid? For each of the three assumptions, assess whether or not the assumption seems to hold for these data.
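The summary-statistic calculations in Problems 3 and 4 can be sketched in R as follows. This is a minimal sketch, not an official solution: it assumes the readings n = 15, sample mean 7.8, sample variance 16.0 and null mean 8 for Problem 3, and n = 10, null variance 1, significance level 0.05 and observed sample variance 2.1 for Problem 4. All variable names are illustrative.

```r
# Problem 3: one-sample t-test from summary statistics (assumed values).
n     <- 15
xbar  <- 7.8
s2    <- 16.0
mu0   <- 8

t_obs <- (xbar - mu0) / sqrt(s2 / n)   # about -0.194
df_t  <- n - 1

p_less      <- pt(t_obs, df = df_t)                      # Ha: mu < 8
p_greater   <- pt(t_obs, df = df_t, lower.tail = FALSE)  # Ha: mu > 8
p_two_sided <- 2 * pt(-abs(t_obs), df = df_t)            # Ha: mu != 8

# Problem 4: upper-tailed chi-square test for a variance.
# Under H0, U = (n - 1) * s^2 / sigma0^2 follows a chi-square(n - 1) distribution.
n4        <- 10
sigma0_sq <- 1
alpha     <- 0.05

chisq_crit <- qchisq(1 - alpha, df = n4 - 1)      # about 16.92
s2_crit    <- chisq_crit * sigma0_sq / (n4 - 1)   # reject H0 when s^2 exceeds this

s2_obs      <- 2.1
u_obs       <- (n4 - 1) * s2_obs / sigma0_sq
p_value_var <- pchisq(u_obs, df = n4 - 1, lower.tail = FALSE)

print(c(t_obs = t_obs, p_less = p_less,
        p_two_sided = p_two_sided, p_greater = p_greater))
print(c(chisq_crit = chisq_crit, s2_crit = s2_crit,
        u_obs = u_obs, p_value = p_value_var))
```

Replacing `sigma0_sq` with any other null value gives the general statistic asked for in Problem 4(d).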
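The "by hand" calculations in Problem 5 (paired interval, F-test, pooled two-sample interval and t-test) might look like the sketch below. To stay runnable without network access it uses simulated stand-in vectors, not the real Glucose1/Glucose2 columns; the means and standard deviations used to simulate are arbitrary. It also assumes the pooled-variance (equal variance) procedure is the intended one, which the source does not state outright. The helper names `paired_ci`, `pooled_se`, and `f_test` are hypothetical.

```r
set.seed(120)
# Simulated stand-ins (NOT real data) for the columns named in the problem.
nonpreg     <- rnorm(53, mean = 100, sd = 30)           # e.g. Glucose1$test6
preg        <- rnorm(52, mean = 115, sd = 30)           # e.g. Glucose2$test3
preg_first  <- rnorm(52, mean = 110, sd = 30)           # e.g. Glucose2$test1
preg_second <- preg_first + rnorm(52, mean = 5, sd = 15) # e.g. Glucose2$test2

# Paired data: one-sample t interval on the differences.
paired_ci <- function(d, level = 0.95) {
  n <- length(d)
  mean(d) + c(-1, 1) * qt(1 - (1 - level) / 2, df = n - 1) * sd(d) / sqrt(n)
}
D <- preg_second - preg_first
ci_paired <- paired_ci(D)

# Pooled standard error of mean(x) - mean(y), assuming equal variances.
pooled_se <- function(x, y) {
  n1 <- length(x); n2 <- length(y)
  sp_sq <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
  sqrt(sp_sq * (1 / n1 + 1 / n2))
}
df_pooled <- length(preg) + length(nonpreg) - 2

# 95% interval for mean(pregnant) - mean(non-pregnant).
ci_diff <- (mean(preg) - mean(nonpreg)) +
  c(-1, 1) * qt(0.975, df = df_pooled) * pooled_se(preg, nonpreg)

# Two-sided F-test of equal variances (x = non-pregnant, y = pregnant).
f_test <- function(x, y) {
  f <- var(x) / var(y)
  lower <- pf(f, length(x) - 1, length(y) - 1)
  list(statistic = f, p_value = 2 * min(lower, 1 - lower))
}
f_res <- f_test(nonpreg, preg)

# One-sided pooled t-test of H0: mu1 = mu2 vs Ha: mu1 < mu2 (1 = non-pregnant).
t_obs  <- (mean(nonpreg) - mean(preg)) / pooled_se(nonpreg, preg)
p_less <- pt(t_obs, df = df_pooled)

# Checks against the built-in functions, as the problem allows.
check_ci <- as.numeric(t.test(preg, nonpreg, var.equal = TRUE)$conf.int)
check_f  <- var.test(nonpreg, preg)
check_t  <- t.test(nonpreg, preg, alternative = "less", var.equal = TRUE)
```

With the real data, the simulated vectors would be replaced by the corresponding columns. If those columns contain missing values they would need handling first (for example with `na.omit`), which this sketch does not address.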