Question

STAT 431/511: HOMEWORK 4

DUE: NOVEMBER 8, BY 5:00 PM

(The homework must be turned in during class or in BBB's mailbox on the 4th floor of JMHH. Please list the names of your collaborators on the first page of your solutions. Turn in your code.)

Problem 1. Suppose we are given two independent random samples of sizes $n_1$ and $n_2$ from Bernoulli populations with parameters $p_1$ and $p_2$. Let $\hat{p}_1$ and $\hat{p}_2$ be the corresponding sample proportions. Consider the problem of testing the hypotheses $H_0 : p_1 - p_2 = 0$ versus $H_1 : p_1 - p_2 \neq 0$.

(a) (1 point) Read Section 9.2.1 from the book, and write down the test statistic for the above hypothesis.

(b) Now, suppose a random sample of 220 female and 210 male coffee drinkers was selected and interviewed. The result was that 71 women and 58 men indicated a preference for decaffeinated coffee.

(i) (2 points) Should we conclude, at a 5% significance level, that the proportion of female coffee drinkers who prefer decaffeinated coffee differs from the proportion of male coffee drinkers who prefer decaffeinated coffee?

(ii) (2 points) Construct a 95% confidence interval for the difference in proportions between male and female coffee drinkers who prefer decaffeinated coffee.

Problem 2. Consider the problem of testing $H_0 : \mu_1 = \mu_2$ versus $H_1 : \mu_1 > \mu_2$, when the samples are independently drawn from two normal populations $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, where $\sigma_1^2$ and $\sigma_2^2$ are assumed to be known. Let $n_1$ and $n_2$ be the sample sizes and $\bar{x}$ and $\bar{y}$ be the sample means, respectively. The $\alpha$-level test rejects $H_0$ if

$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} > z_\alpha.$$

(a) (4 points) Show that the power of the $\alpha$-level test as a function of $\mu_1 - \mu_2$ is given by

$$\pi(\mu_1 - \mu_2) = \Phi\left(-z_\alpha + \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}\right).$$

(b) (3 points) For detecting a specified difference $\mu_1 - \mu_2 = \delta > 0$, show that for a fixed total sample size $n_1 + n_2 = N$, the power is maximized when

$$n_1 = \frac{\sigma_1}{\sigma_1 + \sigma_2} N, \qquad n_2 = \frac{\sigma_2}{\sigma_1 + \sigma_2} N.$$

(Here we are ignoring integer restrictions on $n_1$ and $n_2$.)
(c) (3 points) Show that the smallest integer total sample size required to guarantee power at least $1 - \beta$, when $\mu_1 - \mu_2 = \delta > 0$, is given by

$$N = \left[\frac{(z_\alpha + z_\beta)(\sigma_1 + \sigma_2)}{\delta}\right]^2.$$

Problem 3. (5 points) To study the effectiveness of a certain commercial liquid protein diet, the Food and Drug Administration sampled nine individuals who were entering a two-week weight loss program. Their weights immediately before and 6 months after completing the program are recorded below:

Person   Weight before   Weight after
1        197             185
2        212             220
3        188             180
4        226             217
5        170             185
6        194             197
7        233             219
8        166             170
9        205             202

Based on the data, would you conclude that the diet is effective for weight loss? Use a 5% significance level. What assumptions have you made?

Problem 4. (5 points) A study was instigated to see if southern California earthquakes of at least moderate size (having values of at least 4.4 on the Richter scale) are more likely to occur on certain days of the week than on others. The following data were obtained for 1100 earthquakes:

Day                     Sun   Mon   Tues   Weds   Thurs   Fri   Sat
Number of earthquakes   156   144   170    158    172     148   152

(a) Test the hypothesis that an earthquake is equally likely to occur on any of the seven days of the week. Use a 5% significance level.

(b) What is the p-value of the data?

Problem 5. The file payments_full.txt contains corporate payments data from 2010 for a certain division of a West Coast utility company. The first column provides the invoice number and the second column provides the invoice amount (which is negative in a few cases due to credit corrections). In the third, fourth, and fifth columns, we have extracted the first, second, and first two digits of the invoice amounts. The file payments.txt contains a random subsample of 500 invoices from the full dataset.

(a) (2 points) According to Benford's law, the first digits should follow a distribution where digit $1 \le i \le 9$ appears with frequency $\log_{10}\left(1 + \frac{1}{i}\right)$.
Using the payments.txt file, perform a $\chi^2$-test to determine whether the data conform to Benford's law. Also provide a bar plot for the frequencies of first digits in the dataset. (Hint: Look up the barplot command in R.)

(b) (3 points) An extension of Benford's law states that second digits should follow a distribution where digit $0 \le i \le 9$ appears with frequency $\sum_{j=1}^{9} \log_{10}\left(1 + \frac{1}{10j + i}\right)$. (For instance, the digit 2 should appear as a second digit

$$\log_{10}\left(1 + \frac{1}{12}\right) + \log_{10}\left(1 + \frac{1}{22}\right) + \cdots + \log_{10}\left(1 + \frac{1}{92}\right) \approx 0.109$$

of the time.) Perform a $\chi^2$-test to determine whether the payments.txt data conform to Benford's law for second digits. Also provide a bar plot for the frequencies of second digits in the dataset.

(c) (2 points) Now repeat parts (a) and (b) for the full dataset, payments_full.txt. Compare your p-values and interpret the differences. Does this have anything to do with practical vs. statistical significance?

(d) (2 points) In fact, Benford's law states that the frequencies of the first two digits should also follow a logarithmic decay: digits $ij$ occur with frequency $\log_{10}\left(1 + \frac{1}{10i + j}\right)$, for $1 \le i \le 9$ and $0 \le j \le 9$. Provide a bar plot showing the frequencies of first two digits in the dataset payments.txt. Do you see any glaring anomalies? (You need not perform a $\chi^2$-test for this part.)

(e) (1 point) Based on your analysis in parts (a)-(d), would you conclude that the firm fudged its financial data?

Problem 6.

(a) (3 points) A random sample of 187 voters is chosen, and the voters are asked to evaluate the performance of the first 100 days of the US president. Use the resulting data to test the hypothesis that the evaluation of an individual does not depend on whether that individual is a man or woman. Use a 10% level of significance.

                      Women   Men
Positive evaluation   54      47
Negative evaluation   20      32
Not sure              23      11

(b) (1 point) Now repeat the exercise in (a), after doubling all the count data.

(c) (1 point) Compare the p-values in (a) and (b).
If your answers are different, provide an intuitive explanation for the difference.

Problem 7. Use the anscombe data in R. Attach this dataset using the attach(anscombe) command.

(a) (2 points) Fit a regression model to the data sets: (1) y1 ~ x1, (2) y2 ~ x2, (3) y3 ~ x3, (4) y4 ~ x4 using the command lm. Verify that all the fitted models have the exact same coefficients (up to numerical tolerance).

(b) (2 points) Plot the 4 data sets (x1,y1), (x2,y2), (x3,y3), (x4,y4) using the plot command. Add the number of the dataset to each plot as the main title on each plot. Using the command abline, add the regression line to each plot.

(c) (2 points) Using the command cor, compute the sample correlation for each data set. What are the SSE, SST and $R^2$ values for each data set?

(d) (2 points) Fit the same models in (a) but with the x and y reversed. Using the command summary, does anything about the results stay the same when you reverse x and y?

(e) (2 points) Using the command summary, verify that all 4 models have exactly (up to numerical accuracy) the same t-statistics for testing the hypotheses $H_0 : \beta_0 = 0$ and $H_0 : \beta_1 = 0$.
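
The Benford frequencies stated in Problem 5 can be computed directly from the formulas given there. The following is a minimal R sketch, not a solution to the problem: it only evaluates the expected frequencies and does not read payments.txt. The variable names (first_digit_freq, second_digit_freq, first_two_freq) are illustrative assumptions, not names from the assignment.

```r
# Expected first-digit frequencies: log10(1 + 1/i) for i = 1, ..., 9
first_digit_freq <- log10(1 + 1 / (1:9))
names(first_digit_freq) <- 1:9

# Expected second-digit frequencies:
# sum over j = 1, ..., 9 of log10(1 + 1/(10j + i)) for i = 0, ..., 9
second_digit_freq <- sapply(0:9, function(i) {
  sum(log10(1 + 1 / (10 * (1:9) + i)))
})
names(second_digit_freq) <- 0:9

# Expected first-two-digit frequencies: log10(1 + 1/k) for k = 10, ..., 99,
# where k = 10i + j
first_two_freq <- log10(1 + 1 / (10:99))
names(first_two_freq) <- 10:99

# Sanity checks: each distribution should sum to 1 up to floating-point error
print(all.equal(sum(first_digit_freq), 1))
print(all.equal(sum(second_digit_freq), 1))
print(all.equal(sum(first_two_freq), 1))

# Frequency of 2 as a second digit, to compare with the value
# quoted in Problem 5(b)
print(round(second_digit_freq[["2"]], 3))
```

The last line can be checked against the value of roughly 0.109 quoted in Problem 5(b). Vectors like these could then be supplied as the p argument of chisq.test, together with a table of observed digit counts, to run the goodness-of-fit tests the problem asks for.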
