Question

Posted on Oct 14, 2024

Lab 9

Today's lab explores the sampling distribution of the sample proportion $\hat{p}$ and constructs normal theory confidence intervals (CIs) for the population proportion p. This material is Sections 9.4 and 10.2 of the text.

[A - B] We should find in the case (n = 100, p = .30) that the sampling distribution is approximately normal and, by experimentation, that the normal theory confidence interval $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/100}$ delivers nearly the advertised 95% coverage probability.

[C - D] We should find in another case (n = 100, p = .04) that the sample proportion $\hat{p}$ does not follow a bell-shaped curve. By experimentation, we should find that the normal theory confidence interval $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/100}$ does not deliver the advertised 95% coverage probability in this case.

A.

You are given a Bernoulli population (a population of successes and failures) with p = .30. You are to determine the sampling distribution of the sample proportion $\hat{p}$ based on a random sample of size n = 100.

Use Data > Generate Patterned Data > Numeric to enter the integers 0 (first value) through 100 (last value) in steps of 1. Rename c1 as x.

Use Data > Formula: enter x/100 in the expression field (add x by double-clicking on it). Rename the new column (c2) phat.

Use Statistics > Probability Distributions > Probability Density Function and select "A column of values" from the dropdown menu for the form of input, choosing column x for the input values. We use x as the input column because the Binomial distribution menu deals with counts, not proportions. Select the "Binomial" distribution from the dropdown menu with 100 as the number of trials and 0.30 as the event probability. Check "Store probability density values in a column" under Output and click OK. Rename the new column (c3) P(phat).

The counts and proportions are related to each other through the formula $\hat{p} = x/100$.
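For readers working outside Minitab, the three worksheet columns built above can be sketched in Python. This is only an illustration under stated assumptions: the names `binomial_pmf`, `x`, `phat`, and `prob` are chosen here to mirror the worksheet columns and are not part of the lab, and only the standard library is used.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Return P(X = k) for a Binomial(n, p) count X."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 100, 0.30

# Column x: the possible counts 0, 1, ..., 100.
x = list(range(n + 1))
# Column phat: each count converted to a proportion.
phat = [k / n for k in x]
# Column P(phat): the binomial probability of each count.
prob = [binomial_pmf(k, n, p) for k in x]

# Show a few rows near the centre of the distribution.
for k in range(28, 33):
    print(f"x={k:3d}  phat={phat[k]:.2f}  P(phat)={prob[k]:.4f}")
```

The probabilities are computed on the counts and then reported against the proportions, which is the same reason the lab feeds column x, not phat, to the Binomial menu.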
Use Statistics > Probability Distributions > Cumulative Distribution Function with x as an input column and the "Binomial" distribution selected to calculate the cumulative probability distribution of phat (make sure to check "Store probability density values in a column" under Output). Rename the new column (c4) Cum Prob.

Use Graphs > Scatterplot with the option Single Y Variable "Simple" selected to plot P(phat) (y-axis) v. phat (x-axis). Click on the graph to select it, click on the "+" sign that appears beside the graph to display the Graph Elements menu, and check "Data Display" to add a connecting line. Examine the plot.

1. Is the pattern of probabilities approximately bell-shaped? ________________
2. Determine the mean of the sampling distribution of $\hat{p}$: $\mu_{\hat{p}}$ = ______________
3. Calculate the standard deviation of the sampling distribution of $\hat{p}$: $\sigma_{\hat{p}} = \sqrt{p(1-p)/100}$ = _________. Keep at least 3 decimals in your answer.

Use Statistics > Probability Distributions > Cumulative Distribution Function to determine the area under the normal density, with the mean and standard deviation calculated in questions 2 and 3, over the interval $.26 \le \hat{p} \le .34$. To do this, leave the "Normal" distribution selected, enter the values from questions 2 and 3 into the mean and standard deviation fields, and enter .34 as a value. Leave "Display a table of cumulative probabilities" checked.

4. Record the cumulative probability for .34 here: _________. Keep 4 decimals.
5. Repeat with .26 as an input constant and record the cumulative probability: ____________. Keep 4 decimals.
6. Subtract the cumulative probabilities: ____________________. Keep 4 decimals. This gives the normal approximation to $P(.26 \le \hat{p} \le .34)$.

Use the cumulative probabilities in column Cum Prob to calculate $P(.26 \le \hat{p} \le .34)$ exactly. (For this discrete random variable, look up the cumulative probability at .34 and at .25 in your Minitab worksheet and subtract.)

7. Exact $P(.26 \le \hat{p} \le .34)$ = ____________________. Keep 4 decimals.
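The two calculations in questions 2 through 7 can be cross-checked with a short Python sketch. It is an illustration, not the lab's method: the helper `binomial_cdf` and the variable names are assumptions made here, and `statistics.NormalDist` from the standard library stands in for Minitab's Normal menu.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.30

# Questions 2 and 3: mean and standard deviation of the sampling distribution.
mean = p
sd = sqrt(p * (1 - p) / n)

# Questions 4 to 6: normal approximation to P(.26 <= phat <= .34).
normal = NormalDist(mu=mean, sigma=sd)
approx = normal.cdf(0.34) - normal.cdf(0.26)


def binomial_cdf(k, n, p):
    """Return P(X <= k) for a Binomial(n, p) count X."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


# Question 7: exact probability, P(X <= 34) - P(X <= 25) on the count scale.
exact = binomial_cdf(34, n, p) - binomial_cdf(25, n, p)

print(f"sd = {sd:.4f}")
print(f"normal approximation = {approx:.4f}")
print(f"exact binomial       = {exact:.4f}")
```

The exact calculation subtracts the cumulative probability at 25, not 26, because the count 26 belongs to the interval and must not be removed.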
Summarize the results from questions 6 and 7:

Normal Approximation ______    Exact Binomial ______

Find the error of approximation as the absolute value of the difference between the exact value and the approximate value:

8. Error = |exact value - approximate value| = ________________________________. Keep 4 decimals.
9. Find the relative error as [(error)/(exact value)]*100% = _______________. Enter the numerical value into LON-CAPA without the % symbol.

B.

We are now about to conduct an experiment in which 1,000 random samples of size n = 100 are taken from a Bernoulli population with p = .30. For each sample we will compute the sample proportion $\hat{p}$ and the confidence limits of the normal theory 95% confidence interval estimate of p. We will then count how many of the intervals cover the population parameter p = .30.

We generate the counts x for the 1,000 random samples of size n = 100 directly as follows. Use Data > Generate Random Data to get 1,000 rows of data in 1 column generated from the "Binomial" distribution with 100 as the number of trials and 0.30 as the event probability. Rename the new column (c5) xgen1.

Use Data > Formula to determine the sample proportions $\hat{p} = x/100$; specifically, calculate the expression xgen1/100 and rename the new column (c6) pgen1.

Use Data > Formula to calculate the LCL (lower confidence limit) and the UCL (upper confidence limit) based on sample size n = 100 and the sample proportions $\hat{p}$ stored in column pgen1. For the LCLs, calculate the expression

pgen1 - 1.960*(pgen1*(1 - pgen1)/100)^.5

Remember that raising to the power 0.5 is equivalent to taking a square root. Click OK and rename the new column (c7) LCL1. In a similar manner, calculate the UCLs using the expression

pgen1 + 1.960*(pgen1*(1 - pgen1)/100)^.5

and rename the new column (c8) UCL1.
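The column construction just described can be mirrored in Python as a rough sketch. Everything here beyond the formulas is an assumption for illustration: the seed value is arbitrary, `binomial_draw` is a hypothetical helper that builds each count from 100 Bernoulli draws, and the lowercase list names simply echo the worksheet columns. The numbers will not match a Minitab worksheet, since the random draws differ.

```python
import random
from math import sqrt

random.seed(9)  # arbitrary seed, used only so the sketch is repeatable

n, p, reps = 100, 0.30, 1000


def binomial_draw(n, p):
    """Return one Binomial(n, p) count as the sum of n Bernoulli(p) draws."""
    return sum(random.random() < p for _ in range(n))


# Column xgen1: 1,000 simulated success counts.
xgen1 = [binomial_draw(n, p) for _ in range(reps)]
# Column pgen1: the sample proportions x/100.
pgen1 = [x / n for x in xgen1]
# Half-width of each interval: 1.960 * sqrt(phat * (1 - phat) / n).
half = [1.960 * sqrt(ph * (1 - ph) / n) for ph in pgen1]
# Columns LCL1 and UCL1.
lcl1 = [ph - h for ph, h in zip(pgen1, half)]
ucl1 = [ph + h for ph, h in zip(pgen1, half)]

# Print the first few intervals.
for i in range(3):
    print(f"{xgen1[i]:3d}  {pgen1[i]:.2f}  {lcl1[i]:.3f}  {ucl1[i]:.3f}")
```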
Copy the values xgen1, pgen1, LCL1 and UCL1 from the first 10 CIs of your Minitab worksheet into columns 2 - 5 of Table 1 below and indicate in column 6 whether the CI covers the population rate p = .30, i.e. whether 0.30 is between LCL1 and UCL1. Use two decimal places for pgen1 and three decimal places for LCL1 and UCL1.

Table 1. CI results for p = .30 (n = 100), normal theory CI

Sample   xgen1   pgen1   LCL1    UCL1    Cover .30? (Yes or No)
1        _____   _____   _____   _____   _____
2        _____   _____   _____   _____   _____
3        _____   _____   _____   _____   _____
4        _____   _____   _____   _____   _____
5        _____   _____   _____   _____   _____
6        _____   _____   _____   _____   _____
7        _____   _____   _____   _____   _____
8        _____   _____   _____   _____   _____
9        _____   _____   _____   _____   _____
10       _____   _____   _____   _____   _____

For the first row of Table 1, enter the values into LON-CAPA:

10. xgen1 = ________
11. pgen1 = _________
12. LCL1 = ________
13. UCL1 = ________
14. Out of the first 10 intervals recorded in Table 1, how many cover .30? ______________

Use the number lines below to plot your first five intervals, one on each line. Do this by marking the endpoints of each interval, using a left parenthesis "(" for the lower end and a right parenthesis ")" for the upper end.

[Five number lines, labelled sample 1 through sample 5, each with a p axis marked from 0.15 to 0.50.]

Now we will check all 1,000 samples to see in how many cases the CI covers p = .30. Use Data > Formula and the expression sum(LCL1 < .30 and .30 < UCL1) to determine the number of CIs that covered p. Rename the new column Coverage1. Express the result as a percentage out of 1,000. For example, 945 is 94.5%.

15. Coverage = _______________ For LON-CAPA submission, enter a number between 0 and 100 without the % symbol.

C.
Now suppose that you are working with a Bernoulli population with p = .04. You are to determine the sampling distribution of the sample proportion $\hat{p}$ based on a random sample of size n = 100.

Use Statistics > Probability Distributions > Probability Density Function with x as an input column, the "Binomial" distribution, 100 as the number of trials and 0.04 as the event probability, and check "Store probability density values in a column" to calculate the probability distribution of $\hat{p}$. Rename the column NewP(phat). Make sure you use .04, not .4, as the value for the event probability.

Use Graphs > Scatterplot and choose the option Single Y Variable "Simple" to plot NewP(phat) (y-axis) v. phat (x-axis). Select the graph and click on the "+" sign to display the Graph Elements menu. Click on "Data Display" to add a connecting line and examine the plot.

16. Is the pattern of probabilities bell-shaped or skewed to the right? ________________

D.

We are now about to conduct an experiment in which 1,000 random samples of size n = 100 are taken from a Bernoulli population with p = .04. For each sample we will compute the sample proportion $\hat{p}$ and the confidence limits of the normal theory 95% confidence interval estimate of p. We will then count how many of the intervals cover the population parameter p = .04.

We generate the counts x for the 1,000 random samples of size n = 100 directly as follows. Use Data > Generate Random Data with the "Binomial" distribution selected to get 1,000 x's (stored in 1 column, 1,000 rows) generated from the binomial distribution with 100 trials and an event probability of 0.04, and rename the new column xgen2.

Use Data > Formula to determine the sample proportions $\hat{p} = x/100$; specifically, calculate the expression xgen2/100 and rename the new column pgen2.

Use Data > Formula to calculate the LCL and the UCL based on sample size n = 100 and the sample proportions $\hat{p}$ stored in column pgen2.
For the LCLs, calculate the expression

pgen2 - 1.960*(pgen2*(1 - pgen2)/100)^.5

and rename the new column LCL2. In a similar manner, calculate the UCLs using the expression

pgen2 + 1.960*(pgen2*(1 - pgen2)/100)^.5

and rename the new column UCL2.

Copy the values xgen2, pgen2, LCL2 and UCL2 from the first 10 CIs of your Minitab worksheet into columns 2 - 5 of Table 2 below and indicate in column 6 whether the CI covers the population rate p = .04. (Use two decimal places for pgen2 and three decimal places for LCL2 and UCL2.)

Table 2. CI results for p = .04 (n = 100), normal theory CI

Sample   xgen2   pgen2   LCL2    UCL2    Cover .04? (Yes or No)
1        _____   _____   _____   _____   _____
2        _____   _____   _____   _____   _____
3        _____   _____   _____   _____   _____
4        _____   _____   _____   _____   _____
5        _____   _____   _____   _____   _____
6        _____   _____   _____   _____   _____
7        _____   _____   _____   _____   _____
8        _____   _____   _____   _____   _____
9        _____   _____   _____   _____   _____
10       _____   _____   _____   _____   _____

For the first row of Table 2, enter the values as responses to questions 17 and 18:

17. xgen2 = ______________
18. pgen2 = ______________
19. Out of the first 10 intervals recorded in Table 2, how many cover .04? ___

Now we will check all 1,000 samples to see in how many cases the CI covers p = .04. Use Data > Formula and the expression sum(LCL2 < .04 and .04 < UCL2) to determine the number of CIs that covered p. Rename the resulting column Coverage2. Express the result as a percentage.

20. Coverage = ____________ For LON-CAPA submission, enter a number between 0 and 100 without the % symbol.

E.

Summarize your findings below in Table 3.

Table 3. Summary of coverage results for 95% CIs for p calculated from 1,000 random samples of size n = 100

Sample size n   Parameter value p   Observed coverage % by normal theory CI
100             .30                 _____
100             .04                 _____

Does the CI deliver close to the 95% coverage rate for p = 0.30, while the coverage rate for p = 0.04 is quite a bit lower than the advertised 95%? ____________

Conclusion: Do not use the normal theory confidence interval for estimating p if you suspect p is very small, that is, if you are sampling for a rare attribute (unless n is quite large). The normal theory confidence interval is equally untrustworthy if p is near one.
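The coverage experiments of parts B and D can be repeated outside Minitab with a small simulation. The sketch below is a hedged illustration: `coverage_percent` is a hypothetical helper, the seeds are arbitrary, and each count is built from 100 Bernoulli draws with the standard library's `random` module. The percentages it prints depend on the random draws and will differ from those in any particular worksheet.

```python
import random
from math import sqrt


def coverage_percent(p, n=100, reps=1000, z=1.960, seed=0):
    """Percentage of simulated normal theory CIs that cover the true p."""
    rng = random.Random(seed)  # private generator so runs are repeatable
    covered = 0
    for _ in range(reps):
        # One Binomial(n, p) count, as the sum of n Bernoulli(p) draws.
        x = sum(rng.random() < p for _ in range(n))
        phat = x / n
        half = z * sqrt(phat * (1 - phat) / n)
        # Same strict test as the lab: LCL < p and p < UCL.
        if phat - half < p < phat + half:
            covered += 1
    return 100 * covered / reps


cov30 = coverage_percent(0.30, seed=1)
cov04 = coverage_percent(0.04, seed=2)
print(f"p = .30: observed coverage {cov30:.1f}%")
print(f"p = .04: observed coverage {cov04:.1f}%")
```

When a sample contains no successes, the interval collapses to the single point 0 and cannot cover p. With p = .04 this happens in a noticeable share of samples, which is one reason the observed coverage falls short.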
The practical rule for when the normal theory confidence interval for p can be used is: both $n\hat{p}$ and $n(1-\hat{p})$ should be at least 10 (some authors say that these quantities need only be at least 5; see p. 385 of the text). Other methods of constructing a confidence interval for the population proportion are available but are outside the scope of this introductory course.
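As a small illustration, the rule of thumb can be written as a one-line check. The function name `normal_ci_rule_ok` and its `minimum` parameter are assumptions made for this sketch; the proportion passed in is whatever value the rule is being applied to.

```python
def normal_ci_rule_ok(n, prop, minimum=10):
    """Return True when both n*prop and n*(1 - prop) reach the minimum."""
    return n * prop >= minimum and n * (1 - prop) >= minimum


# The two cases studied in this lab, both with n = 100.
print(normal_ci_rule_ok(100, 0.30))             # 30 and 70 both reach 10
print(normal_ci_rule_ok(100, 0.04))             # 4 falls short of 10
print(normal_ci_rule_ok(100, 0.04, minimum=5))  # 4 also falls short of 5
```

Under either threshold, the p = .04 case fails the rule, which is consistent with the lab's conclusion about rare attributes.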