First load the necessary packages:
```{r}
library(ggplot2)
library(dplyr)
library(forcats)
library(moderndive)
```
1. Ask two of your classmates what their estimate of $\hat{p}$ was. How do the $\hat{p}$ estimates from different samples compare?
+
2. **Why** did everyone get a different estimate?
+
***
## Estimating $\widehat{SE}$ from a Single Sample
Typically we only have the opportunity to collect **one sample** for our study. Consequently, we have to use the amount of variability in our **single sample** as an estimate of how much our results would vary if we had taken a different random sample of 50 people. The $\widehat{SE}_{\hat{p}}$ serves as an **ESTIMATE** of **sampling variability** when you only have a **single sample**. The formula for estimating the standard error of $\hat{p}$ is given in Equation \@ref(eq:se).
\begin{equation} \widehat{SE}_{\hat{p}} \approx \sqrt{\frac{\hat{p} \times (1-\hat{p})}{n}} (\#eq:se) \end{equation}
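A minimal sketch of Equation \@ref(eq:se) in base R, wrapped in a small helper so the arithmetic is easy to follow. The name `se_p_hat()` is hypothetical (it is not part of dplyr or moderndive), and the inputs (10 `Divorced` respondents out of $n = 50$) are made-up numbers, not results from the GSS data.

```r
# Hypothetical helper implementing the estimated standard error of p-hat:
# sqrt(p_hat * (1 - p_hat) / n)
se_p_hat <- function(p_hat, n) {
  sqrt(p_hat * (1 - p_hat) / n)
}

# Made-up single sample: 10 of n = 50 respondents are "Divorced"
p_hat <- 10 / 50         # 0.2
se_p_hat(p_hat, n = 50)  # sqrt(0.2 * 0.8 / 50) = sqrt(0.0032), about 0.0566
```

Note that the estimate is exactly 0 whenever $\hat{p}$ is 0 or 1, which can easily happen with very small samples.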
> Note that we use $n$ for the size of the sample, that $p$ "wears a hat" ($\hat{p}$) because we are ESTIMATING a proportion based on only a sample, and that $SE$ "wears a hat" as well ($\widehat{SE}$) because we are ESTIMATING the $SE$ based on only a sample.
The standard error of $\hat{p}$ can be estimated in R as follows:
```{r}
n50_1rep %>%
  summarize(divorce_count = sum(marital == "Divorced"),
            n = n()) %>%
  mutate(p_hat = divorce_count / n,
         se_hat = sqrt(p_hat * (1 - p_hat) / n))
```
***
# Demo: Generating a Sampling Distribution of $\hat{p}$
If you ran the code chunk that takes a random sample of 50 cases a thousand more times and wrote down every $\hat{p}$ you got, you would have what is called a simulated "sampling distribution".
> A sampling distribution shows every [or nearly every!] possible value a sample statistic can take across every [or nearly every!] possible sample **of a given sample size** from a population.
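With a population small enough to list, the definition can be followed literally: enumerate **every** possible sample of a given size and compute $\hat{p}$ for each one. The sketch below assumes a hypothetical population of 6 people, 2 of whom are `Divorced`, and uses base R's `combn()` to list all $\binom{6}{3} = 20$ samples of size $n = 3$ drawn without replacement.

```r
# Hypothetical population of 6 people; 2 are "Divorced" (true p = 1/3)
population <- c("Divorced", "Divorced", "Other", "Other", "Other", "Other")

# Every possible sample of size 3, one sample per column (choose(6, 3) = 20)
all_samples <- combn(population, 3)

# p-hat for each possible sample
p_hats <- apply(all_samples, 2, function(s) mean(s == "Divorced"))

# Exact sampling distribution: 4 samples give 0, 12 give 1/3, 4 give 2/3
table(round(p_hats, 3))

# Averaging over all possible samples recovers the population proportion, 1/3
mean(p_hats)
```

Real populations are far too large to enumerate this way, so in practice the sampling distribution is approximated by simulation.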
## Simulated Sampling Distribution of $\hat{p}$ for $n = 50$
Instead of running the sampling code chunk for $n = 50$ over and over, we can "collect" 1000 samples of $n = 50$ easily with R. The following code chunk takes 1000 **different** samples of $n = 50$ and stores them in the data frame `n50_1000rep`:
```{r}
set.seed(19)
n50_1000rep <- gss_14 %>%
  rep_sample_n(size = 50, reps = 1000)
```
Be sure to look at `n50_1000rep` in the data viewer to get a sense of what these 1000 samples look like.
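The layout of a stack of repeated samples can be mimicked in base R, which may help when reading the data viewer: all samples sit in one long data frame, and an extra column records which sample each row came from. The sketch assumes a hypothetical population of 10,000 people with 1,500 `Divorced` (15%) rather than the `gss_14` data; the column name `sample_id` is a hypothetical choice, and this is not moderndive's implementation of `rep_sample_n()`.

```r
set.seed(19)

# Hypothetical population: 1,500 "Divorced" and 8,500 "Other" (15% divorced)
population <- rep(c("Divorced", "Other"), times = c(1500, 8500))

reps <- 1000
size <- 50

# Stack 1000 samples of 50 people; sample_id records which sample a row is from
samples <- do.call(rbind, lapply(seq_len(reps), function(r) {
  data.frame(sample_id = r, marital = sample(population, size = size))
}))

dim(samples)   # 50000 rows (1000 samples x 50 people) and 2 columns

# Collapse to one p-hat per sample (a group-then-summarize step)
p_hat_by_sample <- tapply(samples$marital == "Divorced", samples$sample_id, mean)
head(p_hat_by_sample)
```

The last step is the same idea as `group_by()` followed by `summarize()` in dplyr.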
***
3. What is the name of the column that identifies which of the 1000 samples each row is from?
+
4. What is the sample size $n$ for each of the $1000$ samples we took (i.e., how many people are sampled in each replicate)?
+
5. Based on your histogram, what appeared to be a very common value of $\hat{p}$? What was a very uncommon value? To help answer the question, find the 1st and 99th percentiles, the mean, and the standard deviation of the values stored in `p_hat_n50`.
```{r}
# Your code here
```
+
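The four summaries in Question 5 map onto base R functions: `quantile()` with `probs = c(0.01, 0.99)`, `mean()`, and `sd()`. Since `p_hat_n50` is not defined in this section, the sketch applies them to a hypothetical stand-in vector simulated with `rbinom()` under an assumed true proportion of 0.15. `rbinom()` effectively samples with replacement, a reasonable shortcut when the population is much larger than the sample.

```r
set.seed(2024)

# Hypothetical stand-in for p_hat_n50: 1000 sample proportions, each from a
# sample of n = 50 with an assumed true proportion of 0.15
sim_p_hat <- rbinom(1000, size = 50, prob = 0.15) / 50

quantile(sim_p_hat, probs = c(0.01, 0.99))  # 1st and 99th percentiles
mean(sim_p_hat)                             # center of the sampling distribution
sd(sim_p_hat)                               # spread: the simulated standard error
```

Inside a dplyr pipeline, the same functions can be placed together in one `summarize()` call.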
6. How do these values compare to the estimates we got for $\hat{p}$ and $\widehat{SE}_{\hat{p}}$ for `Divorced` respondents based on your **single** sample of 50 people earlier in this Problem Set?
+
7. Use the `rep_sample_n` function to collect 1000 virtual samples of size $n = 15$. Store the 1000 virtual samples in an object named `n15_1000rep`. Use a seed of 910.
```{r}
# Type your code and comments inside the code chunk
```
8. Calculate the sample proportion $\hat{p}$ of people who reported they were `Divorced` for each replicate of your $n = 15$ sampling. Store the results in `ques8` and display the first six rows of `ques8`.
```{r}
# Type your code and comments inside the code chunk
```
9. Visualize the sampling distribution of $\hat{p}$ from your $n = 15$ sampling with a purple histogram.
```{r}
# Type your code and comments inside the code chunk
```
10. Calculate the mean of the $n = 15$ sampling distribution and the standard error of the $n = 15$ sampling distribution.
```{r}
# Type your code and comments inside the code chunk
```
***
11. How does the standard error of the $n = 15$ sampling distribution compare to the standard error of the $n = 50$ sampling distribution?
+
12. Explain any differences you observed in Question 11.
+
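Equation \@ref(eq:se) shows why sample size matters: $n$ sits under a square root in the denominator, so quadrupling $n$ halves the standard error. The sketch below evaluates the formula at the three sample sizes used in this Problem Set, holding the proportion fixed at an assumed 0.15 (a hypothetical value, not an estimate from `gss_14`).

```r
# Assumed proportion, held fixed so that only n changes
p <- 0.15
n <- c(15, 50, 600)

se <- sqrt(p * (1 - p) / n)
round(se, 4)   # 0.0922 0.0505 0.0146

# Ratio of the n = 15 and n = 600 standard errors equals sqrt(600 / 15) = sqrt(40)
se[1] / se[3]
```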
***
13. Use the `rep_sample_n` function to collect 1000 virtual samples of size $n = 600$. Store the 1000 virtual samples in an object named `n600_1000rep`. Use a seed of 84.
```{r}
# Type your code and comments inside the code chunk
```
14. Calculate the proportion $\hat{p}$ of people who reported they were `Divorced` for each replicate of your $n = 600$ sampling. Store the results in `ques14` and display the first six rows of `ques14`.
```{r}
# Type your code and comments inside the code chunk
```
15. Calculate the mean of the $n = 600$ sampling distribution, and the standard error of the $n = 600$ sampling distribution.
```{r}
# Type your code and comments inside the code chunk
```
16. Was there more **variability** from sample to sample when we took a sample size of 600 or a sample size of 50? **Explain what evidence you have for assessing this**.
+
***
17. Which sampling distribution looked most normally distributed (bell-shaped and symmetric): the one built on $n = 15$, $n = 50$, or $n = 600$? **Why?**
+
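One common rule of thumb for when the sampling distribution of $\hat{p}$ is approximately normal is the success-failure condition: expect at least 10 "successes" and at least 10 "failures", that is $np \ge 10$ and $n(1-p) \ge 10$. The sketch checks it at the three sample sizes, again assuming a hypothetical $p = 0.15$; the threshold of 10 is a convention (some texts use a different value, such as 5), not a hard cutoff.

```r
# Assumed proportion of "Divorced" respondents (hypothetical)
p <- 0.15
n <- c(15, 50, 600)

expected_successes <- n * p        # 2.25, 7.5, 90
expected_failures  <- n * (1 - p)  # 12.75, 42.5, 510

# TRUE only where both expected counts reach the conventional threshold of 10
data.frame(n, expected_successes, expected_failures,
           roughly_normal = expected_successes >= 10 & expected_failures >= 10)
```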
18. Imagine we collected only a single small sample of 15 respondents, as generated by the code below.
```{r}
set.seed(53)
n15_1rep <- gss_14 %>%
  rep_sample_n(size = 15, reps = 1)
# and
n50_1rep <- gss_14 %>%
  rep_sample_n(size = 50, reps = 1)
```
Following the example from the beginning of the Problem Set (roughly line 138), estimate the **sample proportion** $\hat{p}$ of people who identified as `Divorced` based on `n15_1rep`, AS WELL AS the **standard error of $\hat{p}$**.
19. Replace `x` with the standard error you obtained by taking the standard deviation of the $n = 15$ sampling distribution. Replace `a` with the standard error you obtained for a single sample of $n = 15$ using the mathematical formula.
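Questions 18 and 19 contrast two routes to a standard error: the standard deviation of many simulated $\hat{p}$ values, and the formula applied to one sample. The sketch below runs both for $n = 15$ on a hypothetical population (15% `Divorced`, an assumed rate, not the `gss_14` data). The two numbers will usually differ, because the formula plugs in a single, noisy $\hat{p}$; with $n = 15$, a sample containing no `Divorced` respondents even yields an estimated standard error of 0.

```r
set.seed(53)

# Hypothetical population: 1,500 "Divorced" and 8,500 "Other" (15% divorced)
population <- rep(c("Divorced", "Other"), times = c(1500, 8500))
n <- 15

# Route 1: standard deviation of 1000 simulated p-hats
sim_p_hat <- replicate(1000, mean(sample(population, size = n) == "Divorced"))
se_simulated <- sd(sim_p_hat)

# Route 2: the formula applied to a single sample of n = 15
one_sample <- sample(population, size = n)
p_hat <- mean(one_sample == "Divorced")
se_formula <- sqrt(p_hat * (1 - p_hat) / n)

c(simulated = se_simulated, formula = se_formula)
```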
20. Based on what you observed in Question 19, **IF** you collected a single sample of 600 respondents, do you think the standard error would be smaller or larger than the one you calculated for $n = 15$? **Explain your reasoning**.