Question
Directions
Make sure your package is up to date with the command install.packages("dslabs").
Overview
In June 2016, the United Kingdom (UK) held a referendum to determine whether the country would "Remain" in the European Union (EU) or "Leave" the EU. This referendum is commonly known as Brexit. Although the media and others interpreted poll results as forecasting "Remain" ($p > 0.5$), the actual proportion that voted "Remain" was only 48.1% ($p = 0.481$), and the UK thus voted to leave the EU. Pollsters in the UK were criticized for overestimating support for "Remain".
Important definitions
Data Import
Import the brexit_polls polling data from the dslabs package and set options for the analysis:
# suggested libraries and options
library(tidyverse)
options(digits = 3)
# load brexit_polls object
library(dslabs)
data(brexit_polls)
Final Brexit parameters
Define $p = 0.481$ as the actual proportion voting "Remain" on the Brexit referendum and $d = 2p - 1 = -0.038$ as the actual spread of the Brexit referendum with "Remain" defined as the positive outcome:
p <- 0.481    # official proportion voting "Remain"
d <- 2*p - 1  # official spread
Question 1: Expected value and standard error of a poll
The final proportion of voters choosing "Remain" was $p = 0.481$. Consider a poll with a sample of $N = 1500$ voters.
What is the expected total number of voters in the sample choosing "Remain"?
What is the standard error of the total number of voters in the sample choosing "Remain"?
What is the expected value of $\hat{X}$, the proportion of "Remain" voters?
What is the standard error of $\hat{X}$, the proportion of "Remain" voters?
What is the expected value of $d$, the spread between the proportion of "Remain" voters and "Leave" voters?
What is the standard error of $d$, the spread between the proportion of "Remain" voters and "Leave" voters?
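One way to approach Question 1 is to treat each sampled voter as a Bernoulli draw with success probability p. The sketch below assumes that model and uses only base R; the variable names (exp_total, se_total, and so on) are illustrative and not part of the problem set.

```r
# Parameters from the problem statement
p <- 0.481   # proportion voting "Remain"
N <- 1500    # poll sample size

# Total number of "Remain" voters in the sample (sum of N Bernoulli draws)
exp_total <- N * p
se_total  <- sqrt(N * p * (1 - p))

# Sample proportion X_hat
exp_x_hat <- p
se_x_hat  <- sqrt(p * (1 - p) / N)

# Spread d = 2p - 1; its standard error is twice that of X_hat
exp_d <- 2 * p - 1
se_d  <- 2 * se_x_hat

c(exp_total = exp_total, se_total = se_total,
  exp_x_hat = exp_x_hat, se_x_hat = se_x_hat,
  exp_d = exp_d, se_d = se_d)
```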
Question 2: Actual Brexit poll estimates
Load and inspect the brexit_polls dataset from dslabs, which contains actual polling data for the 6 months before the Brexit vote. Raw proportions of voters preferring "Remain", "Leave", and "Undecided" are available (remain, leave, undecided). The spread is also available (spread), which is the difference in the raw proportion of voters choosing "Remain" and the raw proportion choosing "Leave".
Calculate x_hat for each poll, the estimate of the proportion of voters choosing "Remain" on the referendum day ($p = 0.481$), given the observed spread and the relationship $\hat{d} = 2\hat{X} - 1$. Use mutate() to add a variable x_hat to the brexit_polls object by filling in the skeleton code below:
brexit_polls <- brexit_polls %>%
  mutate(x_hat = __________)
What is the average of the observed spreads (spread)?
What is the standard deviation of the observed spreads?
What is the average of x_hat, the estimates of the parameter $p$?
What is the standard deviation of x_hat?
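A minimal sketch for Question 2, assuming the tidyverse and dslabs packages are installed. It solves the relationship between the spread and the proportion for x_hat and then summarizes both columns; the summary column names (avg_spread, sd_spread, avg_x_hat, sd_x_hat) are illustrative.

```r
library(tidyverse)
library(dslabs)
data(brexit_polls)

# Invert d_hat = 2 * X_hat - 1 to recover X_hat from the observed spread
brexit_polls <- brexit_polls %>%
  mutate(x_hat = (spread + 1) / 2)

# Mean and standard deviation of the spreads and of the x_hat estimates
poll_summary <- brexit_polls %>%
  summarize(avg_spread = mean(spread), sd_spread = sd(spread),
            avg_x_hat = mean(x_hat), sd_x_hat = sd(x_hat))
poll_summary
```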
Question 3: Confidence interval of a Brexit poll
Consider the first poll in brexit_polls, a YouGov poll run on the same day as the Brexit referendum:
brexit_polls[1,]
Use qnorm() to compute the 95% confidence interval for $\hat{X}$.
What is the lower bound of the 95% confidence interval?
What is the upper bound of the 95% confidence interval?
Does the 95% confidence interval predict a winner (does not cover $p = 0.5$)? Does the 95% confidence interval cover the true value of $p$ observed during the referendum?
The interval predicts a winner and covers the true value of $p$.
The interval predicts a winner but does not cover the true value of $p$.
The interval does not predict a winner but does cover the true value of $p$.
The interval does not predict a winner and does not cover the true value of $p$.
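The interval in Question 3 can be sketched with the usual plug-in standard error. This assumes dslabs is installed and that the first row of brexit_polls is the YouGov poll described above; the names poll, se_hat, lower, upper, predicts_winner, and covers_p are illustrative.

```r
library(dslabs)
data(brexit_polls)

p <- 0.481                       # final proportion voting "Remain"
poll <- brexit_polls[1, ]        # first poll in the dataset

# Estimate of the "Remain" proportion and its plug-in standard error
x_hat  <- (poll$spread + 1) / 2
se_hat <- sqrt(x_hat * (1 - x_hat) / poll$samplesize)

# 95% confidence interval using the normal quantile from qnorm()
z <- qnorm(0.975)
lower <- x_hat - z * se_hat
upper <- x_hat + z * se_hat
c(lower = lower, upper = upper)

# Predicts a winner if 0.5 lies outside the interval; covers p if p lies inside
predicts_winner <- !(lower <= 0.5 & upper >= 0.5)
covers_p <- lower <= p & upper >= p
c(predicts_winner = predicts_winner, covers_p = covers_p)
```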
Brexit poll analysis - Part 2
This problem set is continued from the previous page. Make sure you have run the following code:
# suggested libraries
library(tidyverse)
# load brexit_polls object and add x_hat column
library(dslabs)
data(brexit_polls)
brexit_polls <- brexit_polls %>%
  mutate(x_hat = (spread + 1)/2)
# final proportion voting "Remain"
p <- 0.481
Question 4: Confidence intervals for polls in June
Create the data frame june_polls containing only Brexit polls ending in June 2016 (enddate of "2016-06-01" and later). We will calculate confidence intervals for all polls and determine how many cover the true value of $d$.
First, use mutate() to calculate a plug-in estimate se_x_hat for the standard error of the estimate $\hat{SE}[\hat{X}]$ for each poll given its sample size and value of $\hat{X}$ (x_hat). Second, use mutate() to calculate an estimate for the standard error of the spread for each poll given the value of se_x_hat. Then, use mutate() to calculate upper and lower bounds for 95% confidence intervals of the spread. Last, add a column hit that indicates whether the confidence interval for each poll covers the correct spread $d = -0.038$.
How many polls are in june_polls?
What proportion of polls have a confidence interval that covers the value 0?
What proportion of polls predict "Remain" (confidence interval entirely above 0)?
What proportion of polls have a confidence interval covering the true value of $d$?
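The four mutate() steps in Question 4 might look like the sketch below, assuming tidyverse and dslabs are installed. The column names se_spread, lower, and upper are illustrative choices; only se_x_hat and hit are named in the problem.

```r
library(tidyverse)
library(dslabs)
data(brexit_polls)

d <- -0.038  # actual spread, d = 2p - 1 with p = 0.481

june_polls <- brexit_polls %>%
  filter(enddate >= as.Date("2016-06-01")) %>%
  mutate(x_hat = (spread + 1) / 2,
         se_x_hat = sqrt(x_hat * (1 - x_hat) / samplesize),  # plug-in SE of X_hat
         se_spread = 2 * se_x_hat,                           # SE of the spread
         lower = spread - qnorm(0.975) * se_spread,
         upper = spread + qnorm(0.975) * se_spread,
         hit = lower <= d & upper >= d)                      # interval covers d?

nrow(june_polls)                                   # number of June polls
mean(june_polls$lower <= 0 & june_polls$upper >= 0) # proportion covering 0
mean(june_polls$lower > 0)                         # proportion entirely above 0
mean(june_polls$hit)                               # proportion covering d
```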
Question 5: Hit rate by pollster
Group and summarize the june_polls object by pollster to find the proportion of hits for each pollster and the number of polls per pollster. Use arrange() to sort by hit rate.
Which of the following are TRUE?
Select ALL correct answers.
Unbiased polls and pollsters will theoretically cover the correct value of the spread 50% of the time.
Only one pollster had a 100% success rate in generating confidence intervals that covered the correct value of the spread.
The pollster with the highest number of polls covered the correct value of the spread in their confidence interval over 60% of the time.
All pollsters produced confidence intervals covering the correct spread in at least 1 of their polls.
The results are consistent with a large general bias that affects all pollsters.
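A possible way to build the per-pollster table for Question 5, assuming tidyverse and dslabs are installed. The block rebuilds june_polls so it runs on its own; pollster_hits, hit_rate, and n are illustrative names.

```r
library(tidyverse)
library(dslabs)
data(brexit_polls)

d <- -0.038  # actual spread

# Rebuild june_polls with a hit indicator (same steps as Question 4)
june_polls <- brexit_polls %>%
  filter(enddate >= as.Date("2016-06-01")) %>%
  mutate(x_hat = (spread + 1) / 2,
         se_spread = 2 * sqrt(x_hat * (1 - x_hat) / samplesize),
         lower = spread - qnorm(0.975) * se_spread,
         upper = spread + qnorm(0.975) * se_spread,
         hit = lower <= d & upper >= d)

# Hit rate and number of polls per pollster, sorted by hit rate (ascending)
pollster_hits <- june_polls %>%
  group_by(pollster) %>%
  summarize(hit_rate = mean(hit), n = n()) %>%
  arrange(hit_rate)
pollster_hits
```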
Question 6: Boxplot of Brexit polls by poll type
Make a boxplot of the spread in june_polls by poll type.
Which of the following are TRUE?
Select ALL correct answers.
Online polls tend to show support for "Remain" (spread > 0).
Telephone polls tend to show support for "Remain" (spread > 0).
Telephone polls tend to show higher support for "Remain" than online polls (higher spread).
Online polls have a larger interquartile range (IQR) for the spread than telephone polls, indicating that they are more variable.
Poll type introduces a bias that affects poll results.
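The boxplot for Question 6 could be drawn roughly as follows, assuming tidyverse and dslabs are installed. The object name spread_boxplot is illustrative.

```r
library(tidyverse)
library(dslabs)
data(brexit_polls)

# Polls ending in June 2016
june_polls <- brexit_polls %>%
  filter(enddate >= as.Date("2016-06-01"))

# One box per poll type, with the observed spread on the y-axis
spread_boxplot <- june_polls %>%
  ggplot(aes(poll_type, spread)) +
  geom_boxplot()
spread_boxplot
```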
Question 7: Combined spread across poll type
Calculate the confidence intervals of the spread combined across all polls in june_polls, grouping by poll type. Recall that to determine the standard error of the spread, you will need to double the standard error of the estimate.
Use this code (which determines the total sample size per poll type, gives each spread estimate a weight based on the poll's sample size, and adds an estimate of p from the combined spread) to begin your analysis:
combined_by_type <- june_polls %>%
  group_by(poll_type) %>%
  summarize(N = sum(samplesize),
            spread = sum(spread*samplesize)/N,
            p_hat = (spread + 1)/2)
What is the lower bound of the 95% confidence interval for online voters?
What is the upper bound of the 95% confidence interval for online voters?
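One way to extend the starter code for Question 7 is to add the standard error and interval bounds with mutate(). This sketch assumes tidyverse and dslabs are installed; se_spread, lower, and upper are illustrative column names.

```r
library(tidyverse)
library(dslabs)
data(brexit_polls)

# Polls ending in June 2016
june_polls <- brexit_polls %>%
  filter(enddate >= as.Date("2016-06-01"))

combined_by_type <- june_polls %>%
  group_by(poll_type) %>%
  summarize(N = sum(samplesize),
            spread = sum(spread * samplesize) / N,  # sample-size-weighted spread
            p_hat = (spread + 1) / 2) %>%
  mutate(se_spread = 2 * sqrt(p_hat * (1 - p_hat) / N),  # double the SE of p_hat
         lower = spread - qnorm(0.975) * se_spread,
         upper = spread + qnorm(0.975) * se_spread)
combined_by_type
```

Filtering the result with filter(poll_type == "Online") isolates the bounds the question asks about.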
Question 8: Interpreting combined spread estimates across poll type
Interpret the confidence intervals for the combined spreads for each poll type calculated in the previous problem.
Which of the following are TRUE about the confidence intervals of the combined spreads for different poll types?
Select ALL correct answers.
Neither set of combined polls makes a prediction about the outcome of the Brexit referendum (a prediction is possible if a confidence interval does not cover 0).
The confidence interval for online polls is larger than the confidence interval for telephone polls.
The confidence interval for telephone polls covers more positive values than the confidence interval for online polls.
The confidence intervals for different poll types do not overlap.
Neither confidence interval covers the true value of $d = -0.038$.
Brexit poll analysis - Part 3
This problem set is continued from the previous page. Make sure you have run the following code:
# suggested libraries
library(tidyverse)
# load brexit_polls object and add x_hat column
library(dslabs)
data(brexit_polls)
brexit_polls <- brexit_polls %>%
  mutate(x_hat = (spread + 1)/2)
# final proportion voting "Remain"
p <- 0.481
Question 9: Chi-squared p-value
Define brexit_hit with the following code, which computes the confidence intervals for all Brexit polls in 2016 and then calculates whether each confidence interval covers the actual value of the spread $d = -0.038$:
brexit_hit <- brexit_polls %>%
  mutate(p_hat = (spread + 1)/2,
         se_spread = 2*sqrt(p_hat*(1-p_hat)/samplesize),
         spread_lower = spread - qnorm(.975)*se_spread,
         spread_upper = spread + qnorm(.975)*se_spread,
         hit = spread_lower < -0.038 & spread_upper > -0.038) %>%
  select(poll_type, hit)
Use brexit_hit to make a two-by-two table of poll type and hit status. Then use the chisq.test() function to perform a chi-squared test to determine whether the difference in hit rate is significant.
What is the p-value of the chi-squared test comparing the hit rate of online and telephone polls?
Determine which poll type has a higher probability of producing a confidence interval that covers the correct value of the spread. Also determine whether this difference is statistically significant at a p-value cutoff of 0.05. Which of the following is true?
Online polls are more likely to cover the correct value of the spread and this difference is statistically significant.
Online polls are more likely to cover the correct value of the spread, but this difference is not statistically significant.
Telephone polls are more likely to cover the correct value of the spread and this difference is statistically significant.
Telephone polls are more likely to cover the correct value of the spread, but this difference is not statistically significant.
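The table and test for Question 9 can be sketched as below, assuming tidyverse and dslabs are installed. The brexit_hit definition is repeated from the problem so the block runs on its own; two_by_two and chisq_result are illustrative names.

```r
library(tidyverse)
library(dslabs)
data(brexit_polls)

# brexit_hit as defined in the problem statement
brexit_hit <- brexit_polls %>%
  mutate(p_hat = (spread + 1)/2,
         se_spread = 2*sqrt(p_hat*(1-p_hat)/samplesize),
         spread_lower = spread - qnorm(.975)*se_spread,
         spread_upper = spread + qnorm(.975)*se_spread,
         hit = spread_lower < -0.038 & spread_upper > -0.038) %>%
  select(poll_type, hit)

# Rows: poll type; columns: whether the interval covered d
two_by_two <- table(brexit_hit$poll_type, brexit_hit$hit)
two_by_two

# Chi-squared test of independence between poll type and hit status
chisq_result <- chisq.test(two_by_two)
chisq_result$p.value

# Hit rate per poll type, to see which type covers d more often
tapply(brexit_hit$hit, brexit_hit$poll_type, mean)
```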
Question 10: Odds ratio of online and telephone poll hit rate
Use the two-by-two table constructed in the previous exercise to calculate the odds ratio between the hit rate of online and telephone polls to determine the magnitude of the difference in performance between the poll types.
Calculate the odds that an online poll generates a confidence interval that covers the actual value of the spread.
Calculate the odds that a telephone poll generates a confidence interval that covers the actual value of the spread.
Calculate the odds ratio to determine how many times larger the odds are for online polls to hit versus telephone polls.
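For Question 10, the odds for each poll type are the hit rate divided by the miss rate. The sketch below assumes tidyverse and dslabs are installed and that poll_type takes the values "Online" and "Telephone"; hit_odds, odds_online, odds_telephone, and odds_ratio are illustrative names.

```r
library(tidyverse)
library(dslabs)
data(brexit_polls)

# brexit_hit as defined in Question 9
brexit_hit <- brexit_polls %>%
  mutate(p_hat = (spread + 1)/2,
         se_spread = 2*sqrt(p_hat*(1-p_hat)/samplesize),
         spread_lower = spread - qnorm(.975)*se_spread,
         spread_upper = spread + qnorm(.975)*se_spread,
         hit = spread_lower < -0.038 & spread_upper > -0.038) %>%
  select(poll_type, hit)

# Odds of a hit = P(hit) / P(miss), computed per poll type
hit_odds <- brexit_hit %>%
  group_by(poll_type) %>%
  summarize(hit_rate = mean(hit)) %>%
  mutate(odds = hit_rate / (1 - hit_rate))

odds_online    <- hit_odds$odds[hit_odds$poll_type == "Online"]
odds_telephone <- hit_odds$odds[hit_odds$poll_type == "Telephone"]

# Odds ratio: how many times larger the online odds are than the telephone odds
odds_ratio <- odds_online / odds_telephone
c(online = odds_online, telephone = odds_telephone, ratio = odds_ratio)
```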
Question 11: Plotting spread over time
Use brexit_polls to make a plot of the spread (spread) over time (enddate) colored by poll type (poll_type). Use geom_smooth() with method = "loess" to plot smooth curves with a span of 0.4. Include the individual data points colored by poll type. Add a horizontal line indicating the final value of $d = -0.038$.
Which of the following plots is correct?
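A sketch of the plot described in Question 11, assuming tidyverse and dslabs are installed. The object name spread_plot is illustrative, and the layer order is a presentation choice.

```r
library(tidyverse)
library(dslabs)
data(brexit_polls)

spread_plot <- brexit_polls %>%
  ggplot(aes(enddate, spread, color = poll_type)) +
  geom_smooth(method = "loess", span = 0.4) +  # one smooth curve per poll type
  geom_point() +                               # individual polls
  geom_hline(yintercept = -0.038)              # final spread d
spread_plot
```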
Question 12: Plotting raw percentages over time
Use the following code to create the object brexit_long, which has a column vote containing the three possible votes on a Brexit poll ("remain", "leave", "undecided") and a column proportion containing the raw proportion choosing that vote option on the given poll:
brexit_long <- brexit_polls %>%
  gather(vote, proportion, "remain":"undecided") %>%
  mutate(vote = factor(vote))
Make a graph of proportion over time colored by vote. Add a smooth trendline with geom_smooth() and method = "loess" with a span of 0.3.
Which of the following are TRUE?
Select ALL correct answers.
The percentage of undecided voters declines over time but is still around 10% throughout June.
Over most of the date range, the confidence bands for "Leave" and "Remain" overlap.
Over most of the date range, the confidence bands for "Leave" and "Remain" are below 50%.
In the first half of June, "Leave" was polling higher than "Remain", although this difference was within the confidence intervals.
At the time of the election in late June, the percentage voting "Leave" is trending upwards.
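The graph for Question 12 might be built as follows, assuming tidyverse and dslabs are installed. The brexit_long code is repeated from the problem; vote_plot is an illustrative name, and the shaded bands are the default confidence bands drawn by geom_smooth().

```r
library(tidyverse)
library(dslabs)
data(brexit_polls)

# Long format: one row per poll and vote option
brexit_long <- brexit_polls %>%
  gather(vote, proportion, "remain":"undecided") %>%
  mutate(vote = factor(vote))

# Smoothed proportion over time, one curve per vote option
vote_plot <- brexit_long %>%
  ggplot(aes(enddate, proportion, color = vote)) +
  geom_smooth(method = "loess", span = 0.3)
vote_plot
```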