Posted on Oct 13, 2024

Practice Exam 1

1. (19 total points) PROBABILITY CALCULATIONS

Let X be a continuous random variable with probability density function $f(x) = 3x^2$, for $0 \le x \le 1$.

Here is a graph of the probability density function $f(x)$:

[Figure: graph of $f(x) = 3x^2$ for $0 \le x \le 1$; horizontal axis x from 0.0 to 1.0, vertical axis f(x) from 0.0 to 3.0.]

(a) (4 points) Determine the probability that X is less than or equal to 0.7: $P(X \le 0.7)$. Please (i) indicate on the graph the probability that you are calculating and (ii) show your work below for full credit for the calculation.

(b) (4 points) Show that the mean of X is $E(X) = 3/4 = 0.75$.

Recall we just showed that the mean of X is $\mu = E(X) = 3/4 = 0.75$. The variance of X is $\sigma^2 = \operatorname{Var}(X) = E[(X - \mu)^2] = 3/80 = 0.0375$.

(c) (4 points) For random samples of size n = 49 from this probability distribution, calculate the mean and standard deviation of the sampling distribution of the sample mean ($\bar{X}$).

(d) (4 points) Approximate the probability that a sample mean for a random sample of size n = 49 from this probability distribution will be less than or equal to 0.7: $P(\bar{X} \le 0.7)$. Justify any assumptions you need to make.

(e) (3 points) Which probability is smaller, the one in part (a) or the one in part (d)? Circle one choice and explain why this makes sense.

NOTE: It is possible to answer this question even if you were unable to calculate a probability in part (a) or part (d).

2. (11 total points) ESTIMATING A MEAN FROM A SMALL SAMPLE

A friend hears that you are taking Stat 234 and asks for help with a chemistry lab report. She has made four (n = 4) independent measurements of the specific gravity of a compound. The results are

3.82   3.93   3.67   3.78

The chemistry lab manual says that repeated measurements will vary according to a normal distribution. The mean ($\mu$) of the distribution of measurements of this type is the true specific gravity of the compound.
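The setting just described (a small normal sample with unknown mean and unknown standard deviation) is the standard case for a one-sample t-interval, $\bar{x} \pm t^{*}_{n-1}\, s/\sqrt{n}$. Below is a minimal Python sketch under that normal-measurement assumption. The t critical values for 3 degrees of freedom are hardcoded from a standard t table (the Python standard library has no t quantile function), and the helper name `t_interval` is hypothetical.

```python
import math
import statistics

# The four specific-gravity measurements.
gravity = [3.82, 3.93, 3.67, 3.78]

# Two-sided t critical values for df = n - 1 = 3, copied from a standard
# t table (assumption: hardcoded, not computed here).
T_CRIT_DF3 = {0.95: 3.182, 0.99: 5.841}


def t_interval(data, t_crit):
    """Return (lower, upper) for mean +/- t_crit * s / sqrt(n)."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample SD, divides by n - 1
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin


for level, t_crit in T_CRIT_DF3.items():
    lo, hi = t_interval(gravity, t_crit)
    print(f"{level:.0%} CI: ({lo:.3f}, {hi:.3f})")
```

The 99% interval comes out wider because $t^{*}$ grows with the confidence level while $s/\sqrt{n}$ stays the same.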
The following output from R may be helpful for working with these data:

> gravity
[1] 3.82 3.93 3.67 3.78
> mean(gravity)
[1] 3.80
> sd(gravity)
[1] 0.1074

(a) (4 points) The chemistry lab manual asks for a 95% confidence interval for the true specific gravity. Your friend does not know how to do this. Do it for her.

(b) (2 points) State the assumption(s) you used to construct the confidence interval in (a).

(c) (3 points) Explain to your friend what "95% confidence" means in simple language. Do not use the word "confidence" in your answer.

(d) (2 points) Circle your choice for each pair of bracketed words below in order to make the following statement true.

A 99% confidence interval for the true specific gravity will be [ wider / narrower ] than a 95% confidence interval, and the 99% confidence interval [ will / will not ] include the value 3.90 inside the interval.

3. (22 total points)

During a weight-loss study, each of 9 subjects was given either the active drug m-chlorophenylpiperazine (mCPP) for two weeks and then a placebo for another two weeks, or else the placebo for the first two weeks and then mCPP for the second two weeks. As part of the study, the subjects were asked to rate how hungry they were at the end of each two-week period. The hunger rating data are shown in the table below.

Hunger rating
Subject          1    2    3    4    5    6    7    8    9
mCPP (x)        79   48   52   15   61  107   77   54    5
Placebo (y)     78   54  142   25  101   99   94  107   64

At first glance, this may look like a situation with two samples, but in fact there is just one sample here. The observational units are individual persons, and there are n = 9 of them in the sample. For each unit (person), two variables are collected: the hunger rating after the mCPP period (x) and the hunger rating after the placebo period (y).
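A minimal Python sketch of that one-sample view: it forms each subject's difference $d = x - y$ from the hunger-rating table, then computes the paired statistic $\bar{d}/(s_d/\sqrt{n})$ and, for contrast, the pooled two-sample statistic $(\bar{x} - \bar{y})/(s_p\sqrt{1/n + 1/n})$ that ignores the pairing. With equal group sizes the pooled variance reduces to $s_p^2 = (s_x^2 + s_y^2)/2$. Only the standard-library `statistics` module is used; the variable names are illustrative.

```python
import math
import statistics

# Hunger ratings for subjects 1-9, transcribed from the table.
mcpp = [79, 48, 52, 15, 61, 107, 77, 54, 5]        # x
placebo = [78, 54, 142, 25, 101, 99, 94, 107, 64]  # y

# Paired view: one sample of n = 9 within-subject differences d = x - y.
d = [x - y for x, y in zip(mcpp, placebo)]
n = len(d)

d_bar = statistics.mean(d)
s_d = statistics.stdev(d)
se_paired = s_d / math.sqrt(n)
t_correct = d_bar / se_paired  # compare with t on n - 1 = 8 df

# Two-sample view (inappropriate here): pooled SD of the two columns.
# With equal sample sizes, the pooled variance is the average of the two variances.
s_p = math.sqrt((statistics.variance(mcpp) + statistics.variance(placebo)) / 2)
se_pooled = s_p * math.sqrt(1 / n + 1 / n)
t_wrong = (statistics.mean(mcpp) - statistics.mean(placebo)) / se_pooled  # 16 df

print("d =", d)
print(f"paired: SE = {se_paired:.2f}, t = {t_correct:.2f}")
print(f"pooled: SE = {se_pooled:.2f}, t = {t_wrong:.2f}")
```

The paired standard error is noticeably smaller than the pooled one because differencing within each subject removes the between-subject variation. Comparing $|t|$ with the two-sided 0.05 critical values from a t table (about 2.306 for 8 df and 2.120 for 16 df) reproduces the reject / fail-to-reject conclusions that parts (c) and (d) ask you to show.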
A proper analysis will consider the difference in hunger rating as the response from this sample of n = 9 subjects:

Subject                          1    2    3    4    5    6    7    8    9
Difference d = x - y             1   -6  -90  -10  -40    8  -17  -53  -59
(mCPP - Placebo)

The null hypothesis is $H_0: \mu_d = 0$ (no difference in hunger rating), and the proper test statistic is

$t_{correct} = \dfrac{\bar{d} - 0}{s_d/\sqrt{n}}$.

A researcher (who has not taken a statistics class) incorrectly decides that since she has two lists of numbers, she should use a two-sample t-procedure. She proposes the test statistic

$t_{wrong} = \dfrac{(\bar{x} - \bar{y}) - 0}{s_p\sqrt{1/n + 1/n}}$,

where $s_p$ is the pooled sample standard deviation ($s_p^2$ is the pooled sample variance).

The summary output below will help with any needed calculations:

Variable                  Sample size   Sample mean           Sample SD          Sample variance
mCPP (x)                  n = 9         $\bar{x} = 55.33$     $s_x = 31.54$      $s_x^2 = 994.75$
Placebo (y)               n = 9         $\bar{y} = 84.89$     $s_y = 34.13$      $s_y^2 = 1164.61$
Difference (d = x - y)    n = 9         $\bar{d} = -29.56$    $s_d = 32.82$      $s_d^2 = 1077.28$

The sample sizes are small, but normal probability plots (not shown) of the variables x, y and d show that all three are consistent with data coming from a normal population.

(a) (4 points) Determine the denominators for the two t-statistics. Show your work.
For the correct statistic: $s_d/\sqrt{n} =$
For the wrong statistic: $s_p\sqrt{1/n + 1/n} =$

(b) (3 points) In part (a), you should have found that the denominator for the wrong statistic is much larger than the denominator for the correct statistic. Does this make sense? What quantity does this difference represent? Explain your reasoning.

(c) (4 points) Using the correct statistic, show that we would reject the null hypothesis when testing $H_0: \mu_d = 0$ vs. $H_A: \mu_d \ne 0$ using a significance level of $\alpha = 0.05$.
$t_{correct} = \dfrac{\bar{d} - 0}{s_d/\sqrt{n}} =$

(d) (4 points) Using the wrong statistic, show that we would fail to reject the null hypothesis when testing $H_0: \mu_x - \mu_y = 0$ vs. $H_A: \mu_x - \mu_y \ne 0$ using a significance level of $\alpha = 0.05$.
$t_{wrong} = \dfrac{(\bar{x} - \bar{y}) - 0}{s_p\sqrt{1/n + 1/n}} =$

(e) (3 points) Given the calculations and comparisons above, what do you think is the problem (for weight-loss research) with using the incorrect method? Explain.

(f) (4 points) Of course, a researcher could simply allocate the subjects into two separate groups, treatment and control. Subjects in the treatment group take mCPP and those in the control group take placebo. Then the two-sample t-statistic would indeed be valid. Give two reasons why the original "paired" design is better than a two-group design. One reason should be supported by the calculations/comparisons performed above. The other should be supported by your knowledge of good study design practices.

4. (15 total points) A HYPOTHESIS TEST FOR A PROPORTION

A certain kind of computer chip has a failure rate of p = 15%. Random samples of n = 100 chips are regularly taken from the factory production line for quality control. The goal is to determine whether the production process is operating according to historical behavior, or whether the computer chips now have a higher failure rate than before.

(a) (10 points) In one random sample, K = 21 chips are found to be defective. Is this sample consistent with the history, or has the failure rate changed? Calculate a P-value to support your decision and complete a hypothesis test at the $\alpha = 0.10$ level of significance. Be sure to
(1) State the null and alternative hypotheses.
(2) State the statistical method/formula you have chosen and why it is valid to use here.
(3) Show the work for any calculations.
(4) State your conclusion in context.
Answers that are disorganized (difficult to follow) will not receive full credit.

(b) (5 points) Suppose the factory will reject $H_0$ and adjust the production line if, in a random sample of n = 100 chips, 25 or more are found to be defective. What is the power of this test if the true failure rate is 20%?

5.
(10 total points) CHI-SQUARED TEST

It has been suspected that prolonged use of a cellular telephone increases the chance of developing brain cancer due to the microwave-frequency signal that is transmitted by the cell phone. According to this theory, if a cell phone is repeatedly held near one side of the head, then brain tumors are more likely to develop on that side of the head. To investigate this, researchers studied a group of patients who had used cell phones for at least six months prior to developing brain tumors. The patients were asked whether they routinely held the cell phone to a certain ear and, if so, which ear. The 88 responses (from those who preferred one side over the other) are shown in the following table.

                        Phone holding side
Brain tumor side        Left      Right
Left                      14         28
Right                     19         27
Total                     33         55

(a) (2 points) Set up the hypotheses to test whether phone holding side and brain tumor side are associated.

(b) (8 points) Find the $\chi^2$-statistic for testing the hypotheses in part (a). What are the degrees of freedom? Is $H_0$ rejected at $\alpha = 0.05$?

6. (8 total points) PROOF

When a random sample of size n is drawn with replacement from a Bernoulli population with actual proportion of "successes" ($p$), we know that the sample proportion of "successes" ($\hat{p}$) is a statistic with the following mean and variance:

(1) $E(\hat{p}) = p$   and   (2) $\operatorname{Var}(\hat{p}) = \dfrac{p(1-p)}{n}$.

(a) (6 points) Use the two facts (1) and (2) above to find $E[\hat{p}(1-\hat{p})]$. [NOTE: Organize your work. For full credit, explain any probability/statistics steps.]

(b) (2 points) Suggest a statistic that will be an unbiased estimate of the population variance $p(1-p)$. Show that your statistic is indeed unbiased.

7. (15 total points) REGRESSION

The following data are the heights and weights of 15 women in a town. We use R to fit the regression model

$\text{WEIGHT}_i = \beta_0 + \beta_1\,\text{HEIGHT}_i + \varepsilon_i$, where $\varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$,

and obtain the output below. Part of the R output is hidden.
> mod1 = lm(WEIGHT ~ HEIGHT)
> summary(mod1)

Residuals:
    Min      1Q  Median      3Q     Max
-5.3436 -0.9035 -0.3269  0.6358  8.2706

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   ------     12.070  -2.845   0.0138 *
HEIGHT        58.525      7.296  ------   ------
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.118 on 13 degrees of freedom
Multiple R-squared:  0.8319,    Adjusted R-squared:  0.819
F-statistic: 64.35 on 1 and 13 DF,  p-value: 2.17e-06

Subject   HEIGHT (m)   WEIGHT (kg)
1         1.47         54.21
2         1.50         53.12
3         1.52         54.48
4         1.55         55.84
5         1.57         52.20
6         1.60         67.57
7         1.63         59.93
8         1.65         61.09
9         1.68         60.11
10        1.70         64.47
11        1.73         66.28
12        1.75         68.10
13        1.78         69.92
14        1.80         72.19
15        1.83         74.46
Mean      1.651        62.265
SD        0.114        7.330

Based on the incomplete R output above, answer the following questions.

(a) (3 points) Find the estimate of the intercept $\beta_0$.

(b) (2 points) Write down the model to predict women's WEIGHT from their HEIGHT.

(c) (2 points) If a woman in this town is 1.75 m tall, what is her predicted weight?

(d) (4 points) Find a 95% confidence interval for the coefficient $\beta_1$ of HEIGHT in the regression model. Is $\beta_1$ significantly different from 0 at the $\alpha = 0.05$ level?

(e) (4 points) The coefficient of determination $R^2$ is 0.8319 (shown as the Multiple R-squared in the R output). What does this mean? Give TWO meanings of $R^2$.
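For Problem 7, a minimal Python sketch, assuming the transcribed data and the numbers printed in the partial R output (it is a check, not a replacement for `lm`). It recovers the hidden intercept as (t value) × (standard error), cross-checks both coefficients with ordinary least squares on the raw data, fills in the hidden t value for HEIGHT, predicts the weight at 1.75 m, and forms the 95% interval for $\beta_1$. The critical value $t^{*}_{13} \approx 2.160$ is hardcoded from a t table (an assumption; it is not computed).

```python
# Heights (m) and weights (kg) of the 15 women, transcribed from the table.
height = [1.47, 1.50, 1.52, 1.55, 1.57, 1.60, 1.63, 1.65,
          1.68, 1.70, 1.73, 1.75, 1.78, 1.80, 1.83]
weight = [54.21, 53.12, 54.48, 55.84, 52.20, 67.57, 59.93, 61.09,
          60.11, 64.47, 66.28, 68.10, 69.92, 72.19, 74.46]

# (a) Hidden intercept from the printed t value and SE: estimate = t * SE.
b0_from_output = -2.845 * 12.070

# Cross-check with ordinary least squares on the raw data.
n = len(height)
x_bar = sum(height) / n
y_bar = sum(weight) / n
sxx = sum((x - x_bar) ** 2 for x in height)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(height, weight))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Hidden t value for HEIGHT: estimate / standard error (from the output).
se_b1 = 7.296
t_height = 58.525 / se_b1

# (c) Predicted weight for a height of 1.75 m.
y_hat = b0 + b1 * 1.75

# (d) 95% CI for beta1; t*(0.975, df = 13) = 2.160 from a t table (assumption).
T_CRIT_13 = 2.160
ci = (58.525 - T_CRIT_13 * se_b1, 58.525 + T_CRIT_13 * se_b1)

print(f"b0 from output = {b0_from_output:.3f}, OLS b0 = {b0:.3f}, OLS b1 = {b1:.3f}")
print(f"t for HEIGHT = {t_height:.2f}, predicted weight at 1.75 m = {y_hat:.2f} kg")
print(f"95% CI for beta1: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The least-squares slope agrees with the printed 58.525, and the least-squares intercept agrees with $-2.845 \times 12.070$ up to rounding in the output. Squaring the recovered t value for HEIGHT reproduces the F-statistic of 64.35, as it must in simple linear regression.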
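For Problem 1, a minimal Python sketch that reproduces the stated facts for $f(x) = 3x^2$ on $[0, 1]$, taking part (a) as $P(X \le 0.7)$ to match part (d). The CDF is $F(x) = x^3$, $E(X) = \int_0^1 3x^3\,dx = 3/4$ and $E(X^2) = \int_0^1 3x^4\,dx = 3/5$, so $\operatorname{Var}(X) = 3/5 - 9/16 = 3/80$. Part (d) uses the central limit theorem, treating $\bar{X}$ as approximately normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$; that normality is the assumption being illustrated, and `statistics.NormalDist` supplies the normal CDF.

```python
import math
from fractions import Fraction
from statistics import NormalDist


def cdf(x):
    """CDF of the density f(x) = 3x^2 on [0, 1], i.e. F(x) = x^3 (clamped)."""
    x = min(max(x, 0.0), 1.0)
    return x ** 3


# Exact moments from the integrals above.
mean = Fraction(3, 4)                # E(X)
var = Fraction(3, 5) - mean ** 2     # E(X^2) - E(X)^2 = 3/80

# (a) A single observation: P(X <= 0.7) = 0.7^3.
p_single = cdf(0.7)

# (c) Sampling distribution of the sample mean for n = 49.
n = 49
sd_xbar = math.sqrt(float(var) / n)  # sigma / sqrt(n)

# (d) CLT approximation: Xbar is approximately Normal(mu, sigma / sqrt(n)).
p_mean = NormalDist(mu=float(mean), sigma=sd_xbar).cdf(0.7)

print(f"P(X <= 0.7) = {p_single:.3f}")
print(f"E(X) = {mean}, Var(X) = {var}, SD(Xbar) = {sd_xbar:.4f}")
print(f"P(Xbar <= 0.7) is approximately {p_mean:.4f}")
```

The CLT probability is far smaller than the single-observation probability because $\bar{X}$ is much less spread out than X (its standard deviation is $\sigma/7$).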
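For Problem 4, a minimal Python sketch, assuming the one-sided alternative $H_A: p > 0.15$ implied by the goal of detecting a higher failure rate. Part (a) uses the large-sample z statistic $z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}$, which is reasonable here because $np_0 = 15$ and $n(1-p_0) = 85$ are both at least 10. For part (b) it computes the power of the rule "reject if $K \ge 25$" exactly from the binomial distribution (via `math.comb`) and, for comparison, with a continuity-corrected normal approximation; the helper name `binom_upper_tail` is hypothetical.

```python
import math
from statistics import NormalDist

n, p0 = 100, 0.15
k = 21

# (a) One-proportion z test of H0: p = 0.15 vs HA: p > 0.15.
p_hat = k / n
se0 = math.sqrt(p0 * (1 - p0) / n)   # standard error computed under H0
z = (p_hat - p0) / se0
p_value = 1 - NormalDist().cdf(z)    # upper-tail P-value
reject = p_value < 0.10


# (b) Power of the rule "reject if K >= 25" when the true rate is 0.20.
def binom_upper_tail(c, n, p):
    """P(K >= c) for K ~ Binomial(n, p), summed exactly over c..n."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(c, n + 1))


p_true = 0.20
power_exact = binom_upper_tail(25, n, p_true)

# Normal approximation with continuity correction, for comparison.
mu, sd = n * p_true, math.sqrt(n * p_true * (1 - p_true))
power_normal = 1 - NormalDist(mu, sd).cdf(24.5)

print(f"z = {z:.3f}, P-value = {p_value:.4f}, reject H0 at 0.10: {reject}")
print(f"power: exact = {power_exact:.4f}, normal approx = {power_normal:.4f}")
```

Dropping the continuity correction (using 25 instead of 24.5) lowers the approximate power to about 0.106, so it is worth stating which version you use.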
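For Problem 5, a minimal Python sketch of the Pearson chi-squared test of independence for the 2×2 table, with rows as brain tumor side and columns as phone holding side (the statistic does not depend on that orientation). Expected counts are (row total × column total) / grand total, and the degrees of freedom are (rows - 1)(columns - 1). For one degree of freedom, $\chi^2_1 = Z^2$, so the upper-tail P-value is $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$ and only the standard library is needed.

```python
import math

# Observed counts: rows = brain tumor side (Left, Right),
# columns = phone holding side (Left, Right).
observed = [[14, 28],
            [19, 27]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Pearson statistic: sum of (O - E)^2 / E with E = row * col / grand.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid for df = 1 only: P(chi2_1 > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.3f}, df = {df}, P-value = {p_value:.3f}")
print("reject H0 at 0.05" if p_value < 0.05 else "fail to reject H0 at 0.05")
```

R's `chisq.test` applies Yates' continuity correction to 2×2 tables by default, which makes the statistic smaller; the conclusion at $\alpha = 0.05$ is the same either way.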
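For Problem 6, one way to organize the argument is as a worked derivation that uses only facts (1) and (2), linearity of expectation, and the identity $E(Y^2) = \operatorname{Var}(Y) + [E(Y)]^2$. The last line rescales $\hat{p}(1-\hat{p})$ to remove its bias, which gives one candidate statistic for part (b) (defined for $n \ge 2$).

```latex
\begin{aligned}
E[\hat{p}(1-\hat{p})]
  &= E(\hat{p}) - E(\hat{p}^{2})
     && \text{linearity of expectation} \\
  &= E(\hat{p}) - \left[\operatorname{Var}(\hat{p}) + \{E(\hat{p})\}^{2}\right]
     && E(Y^{2}) = \operatorname{Var}(Y) + [E(Y)]^{2} \\
  &= p - \frac{p(1-p)}{n} - p^{2}
     && \text{facts (1) and (2)} \\
  &= p(1-p)\left(1 - \frac{1}{n}\right) = \frac{n-1}{n}\,p(1-p), \\[6pt]
E\!\left[\frac{n}{n-1}\,\hat{p}(1-\hat{p})\right]
  &= \frac{n}{n-1}\cdot\frac{n-1}{n}\,p(1-p) = p(1-p)
     && (n \ge 2).
\end{aligned}
```

So $\hat{p}(1-\hat{p})$ underestimates $p(1-p)$ on average by the factor $(n-1)/n$, and multiplying by $n/(n-1)$ removes that bias.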