Question
Introduction

In this lab, we will explore the sampling distribution of the sample proportion (p̂) and construct normal theory confidence intervals (CIs) for the population proportion p. This material corresponds to Sections 9.4 and 10.2 of the textbook.

Instructions

Note that lines beginning with '#' are comments and provide additional details about the code.

Part A: Sampling Distribution with p = 0.30

1. Set up your parameters and calculate the sampling distribution.

    n <- 100                                   # sample size
    p <- 0.3                                   # population proportion
    x <- seq.int(from=0, to=n, by=1)           # possible counts of successes
    phat <- x / n                              # corresponding sample proportions
    P_phat <- dbinom(x, size=n, prob=p)        # exact probability of each phat
    Cum_Prob <- pbinom(x, size=n, prob=p)      # cumulative probabilities

2. Plot the sampling distribution.

    plot(phat, P_phat, type="l", col="blue", xlab="phat", ylab="P_phat",
         main="Scatterplot of P_phat vs phat")
    points(phat, P_phat, pch=20, col="red")

[Figure: Scatterplot of P_phat vs phat (x-axis: phat, y-axis: P_phat)]

Questions 1-3

1. Is the pattern of probabilities approximately bell-shaped? _______
2. Determine the mean of the sampling distribution of p̂: _______
3. Calculate the standard deviation of the sampling distribution of p̂ (keep at least 3 decimals in your answer): _______

    mean_phat <- p
    sd_phat <- sqrt(p * (1 - p) / n)
    mean_phat
    sd_phat

Using pnorm and qnorm

pnorm: Calculates the cumulative distribution function (CDF) of a normal distribution. It gives the probability that a normal random variable is less than a given value.

qnorm: Calculates the quantile function (the inverse of the CDF) of a normal distribution. It gives the value below which a given proportion of the distribution falls.

Examples

    # Example of pnorm
    pnorm(1.96, mean=0, sd=1)
    ## [1] 0.9750021
    # Probability that a standard normal variable is less than 1.96

    # Example of qnorm
    qnorm(0.975, mean=0, sd=1)
    ## [1] 1.959964
    # 97.5th percentile of a standard normal distribution

Question 4

Record the cumulative probability for 0.34 here. Keep 4 decimals.
    pnorm(0.34, mean=p, sd=sd_phat)

Question 5

Repeat with 0.26 as the input constant and record the cumulative probability here. Keep 4 decimals.

    pnorm(0.26, mean=p, sd=sd_phat)

Question 6

Subtract the cumulative probabilities. Keep 4 decimals.

    approx <- pnorm(0.34, mean=p, sd=sd_phat) - pnorm(0.26, mean=p, sd=sd_phat)
    approx

Question 7

Calculate the exact probability using the binomial distribution: Exact P(0.26 < p̂ ≤ 0.34)

    # pbinom(34) - pbinom(26) is P(27 <= X <= 34), i.e. P(0.26 < phat <= 0.34)
    exact <- pbinom(34, size=n, prob=p) - pbinom(26, size=n, prob=p)
    exact

Question 8

Calculate the error of approximation: Error = exact value - approximate value = _______

    error <- exact - approx
    error

Question 9

Find the relative error. Enter the number without the % symbol: _______

    relative_error <- (error / exact) * 100
    relative_error

Part B: Confidence Intervals for p = 0.30

Simulate 1,000 samples and calculate CIs.

    simulation <- 1000
    xgen1 <- rbinom(simulation, size=n, prob=p)           # simulated counts
    pgen1 <- xgen1 / n                                    # simulated sample proportions
    LCL1 <- pgen1 - 1.96 * sqrt(pgen1 * (1 - pgen1) / n)  # lower confidence limits
    UCL1 <- pgen1 + 1.96 * sqrt(pgen1 * (1 - pgen1) / n)  # upper confidence limits
    cover_pt_3 <- (p > LCL1) & (p < UCL1)                 # TRUE if the interval covers p
    Table_1 <- data.frame(Sample=seq(1, 10),
                          xgen1=xgen1[1:10],
                          pgen1=pgen1[1:10],
                          LCL1=LCL1[1:10],
                          UCL1=UCL1[1:10],
                          Cover_pt_3=cover_pt_3[1:10],
                          row.names=NULL)
    Table_1

Questions 10-15

10. xgen1 = _______
11. pgen1 = _______
12. LCL1 = _______
13. UCL1 = _______
14. How many of the first 10 intervals cover 0.30? _______
15. Calculate the overall coverage percentage. Coverage = _______ (enter a number between 0 and 100 without the % symbol)

    coverage <- sum(cover_pt_3) / simulation * 100
    coverage

Part C: Sampling Distribution with p = 0.04

1. Set up your parameters and calculate the sampling distribution.

    p <- 0.04
    x <- seq.int(from=0, to=n, by=1)
    phat <- x / n
    P_phat <- dbinom(x, size=n, prob=p)
    Cum_Prob <- pbinom(x, size=n, prob=p)

2. Plot the sampling distribution.
    plot(phat, P_phat, type="l", col="blue", xlab="phat", ylab="P_phat",
         main="Scatterplot of P_phat vs phat")
    points(phat, P_phat, pch=20, col="red")

[Figure: Scatterplot of P_phat vs phat (x-axis: phat, y-axis: P_phat)]

Question 16

16. Is the pattern of probabilities bell-shaped or skewed to the right? _______

Part D: Confidence Intervals for p = 0.04

Simulate 1,000 samples and calculate CIs.

    xgen2 <- rbinom(simulation, size=n, prob=p)
    pgen2 <- xgen2 / n
    LCL2 <- pgen2 - 1.96 * sqrt(pgen2 * (1 - pgen2) / n)
    UCL2 <- pgen2 + 1.96 * sqrt(pgen2 * (1 - pgen2) / n)
    cover_pt_0_4 <- (p > LCL2) & (p < UCL2)
    Table_2 <- data.frame(Sample=seq(1, 10),
                          xgen2=xgen2[1:10],
                          pgen2=pgen2[1:10],
                          LCL2=LCL2[1:10],
                          UCL2=UCL2[1:10],
                          Cover_pt_0_4=cover_pt_0_4[1:10],
                          row.names=NULL)
    Table_2

Questions 17-20

17. xgen2 = _______
18. pgen2 = _______
19. How many of the first 10 intervals cover 0.04? _______
20. Calculate the overall coverage percentage.

    coverage_0_4 <- sum(cover_pt_0_4) / simulation * 100
    coverage_0_4

Summary

Summarize your findings in the table below.

    Table_3 <- data.frame(
      Sample_size = c(100, 100),
      Parameter_value = c(0.30, 0.04),
      Observed_coverage = c(coverage, coverage_0_4)
    )
    Table_3

Does the CI deliver close to 95% coverage rate for p = 0.30, while the coverage rate for p = 0.04 is quite a bit lower than the advertised 95%? _______

Conclusion

Do not use the normal theory confidence interval for estimating p if you suspect p is very small, that is, you are sampling for a rare attribute (unless n is quite large). The normal theory confidence interval is equally untrustworthy if p is near one. The practical rule of when a normal theory confidence interval for p can be used is: both np and n(1 - p) should be at least 10. Other methods of constructing confidence intervals for the population proportion are available but are outside the scope of this introductory course.
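As a rough illustration of the rule of thumb stated in the conclusion, the sketch below checks whether both np and n(1 - p) reach 10 for the two cases used in this lab, and computes the coverage of the 1.96-based interval exactly by summing binomial probabilities rather than by simulation. The helper names normal_ci_ok and exact_coverage are hypothetical and are not part of the lab; the interval formula and the strict inequalities mirror the lab's own LCL/UCL code.

```r
# Hypothetical helper (not part of the lab): checks the rule of thumb
# that both n*p and n*(1 - p) should be at least 10.
normal_ci_ok <- function(n, p, threshold = 10) {
  (n * p >= threshold) && (n * (1 - p) >= threshold)
}

# Hypothetical helper (not part of the lab): exact coverage, in percent,
# of the interval phat +/- z * sqrt(phat * (1 - phat) / n). It adds up the
# binomial probabilities of every count x whose interval contains p.
exact_coverage <- function(n, p, z = 1.96) {
  x <- 0:n
  phat <- x / n
  half_width <- z * sqrt(phat * (1 - phat) / n)
  covers <- (p > phat - half_width) & (p < phat + half_width)
  100 * sum(dbinom(x[covers], size = n, prob = p))
}

# Apply both helpers to the two parameter values used in the lab.
n <- 100
for (p in c(0.30, 0.04)) {
  cat(sprintf("p = %.2f: rule satisfied = %s, exact coverage = %.1f%%\n",
              p, normal_ci_ok(n, p), exact_coverage(n, p)))
}
```

The simulated percentages from Parts B and D should land near these exact values, differing only by simulation noise. Note that the loop overwrites n and p, so run it after the lab's own steps.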