Question

Posted on Jun 11, 2024

Brexit poll analysis - Part 1

Directions

There are 12 multi-part problems in this comprehensive assessment that review concepts from the entire course. The problems are split over several parts. Make sure you read the instructions carefully and run all pre-exercise code.

For numeric entry problems, you have 10 attempts to input the correct answer. For true/false problems, you have 2 attempts.

If you have questions, visit the "Brexit poll analysis" discussion forum that follows the assessment.

IMPORTANT: Some of these exercises use dslabs datasets that were added in a July 2019 update. Make sure your package is up to date with the command install.packages("dslabs").

Overview

In June 2016, the United Kingdom (UK) held a referendum to determine whether the country would "Remain" in the European Union (EU) or "Leave" the EU. This referendum is commonly known as Brexit. Although the media and others interpreted poll results as forecasting "Remain" ($p > 0.5$), the actual proportion that voted "Remain" was only 48.1% ($p = 0.481$), and the UK thus voted to leave the EU. Pollsters in the UK were criticized for overestimating support for "Remain".

In this project, you will analyze real Brexit polling data to develop polling models to forecast Brexit results. You will write your own code in R and enter the answers on the edX platform.

Important definitions

Data Import

Import the brexit_polls polling data from the dslabs package and set options for the analysis:

# suggested libraries and options
library(tidyverse)
options(digits = 3)

# load brexit_polls object
library(dslabs)
data(brexit_polls)

Final Brexit parameters

Define $p = 0.481$ as the actual proportion voting "Remain" on the Brexit referendum and $d = 2p - 1 = -0.038$ as the actual spread of the Brexit referendum, with "Remain" defined as the positive outcome:

p <- 0.481    # official proportion voting "Remain"
d <- 2*p-1    # official spread
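The link between the spread and the proportion can be written out as a short derivation. It assumes the vote is a two-way choice with no undecided voters: if a proportion $p$ votes "Remain", then $1 - p$ votes "Leave", and the spread is the difference between the two.

```latex
d = p - (1 - p) = 2p - 1 = 2(0.481) - 1 = -0.038,
\qquad
p = \frac{d + 1}{2}
```

The second form is the one that converts an observed spread back into an estimated proportion.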

Question 1: Expected value and standard error of a poll

The final proportion of voters choosing "Remain" was $p = 0.481$. Consider a poll with a sample of $N = 1500$ voters.

What is the standard error of $d$, the spread between the proportion of "Remain" voters and "Leave" voters?
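One possible way to set up this calculation is sketched below. It assumes each respondent is an independent draw who answers "Remain" with probability p and that nobody is undecided, so the sample proportion has standard error sqrt(p(1 - p)/N). The variable names are illustrative and are not prescribed by the assessment.

```r
# Illustrative sketch: expected value and standard error for a single poll,
# assuming independent draws with P("Remain") = p and no undecided voters.
p <- 0.481   # proportion voting "Remain" (from the text)
N <- 1500    # sample size (from the text)

x_hat_expected <- p                   # E[X_hat]
x_hat_se <- sqrt(p * (1 - p) / N)     # SE[X_hat]

d_expected <- 2 * p - 1               # E[d_hat] = 2p - 1
d_se <- 2 * x_hat_se                  # SE[d_hat] = 2 * SE[X_hat]

print(c(d_expected = d_expected, d_se = d_se))
```

The factor of 2 appears because the spread is a linear function of the proportion (2 times the proportion, minus 1), and subtracting a constant does not change the standard error.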

Question 2: Actual Brexit poll estimates

Load and inspect the brexit_polls dataset from dslabs, which contains actual polling data for the 6 months before the Brexit vote. Raw proportions of voters preferring "Remain", "Leave", and "Undecided" are available (remain, leave, undecided). The spread is also available (spread), which is the difference between the raw proportion of voters choosing "Remain" and the raw proportion choosing "Leave".

Calculate x_hat for each poll, the estimate of the proportion of voters choosing "Remain" on the referendum day ($p = 0.481$), given the observed spread and the relationship $\hat{d} = 2\hat{X} - 1$. Use mutate() to add a variable x_hat to the brexit_polls object by filling in the skeleton code below:

 brexit_polls <- brexit_polls %>% mutate(x_hat = __________)

What is the average of x_hat, the estimates of the parameter $p$?

What is the standard deviation of x_hat?

Question 3: Confidence interval of a Brexit poll

Consider the first poll in brexit_polls, a YouGov poll run on the same day as the Brexit referendum:

 brexit_polls[1,]

Use qnorm() to compute the 95% confidence interval for $\hat{X}$.
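A minimal sketch of the interval calculation follows. The values of x_hat and N below are hypothetical placeholders rather than the figures from the first row of brexit_polls, and the standard error uses the plug-in formula with x_hat standing in for the unknown p.

```r
# Hypothetical poll values (placeholders, not read from brexit_polls)
x_hat <- 0.52
N <- 1000

se_hat <- sqrt(x_hat * (1 - x_hat) / N)   # plug-in standard error of X_hat
z <- qnorm(0.975)                         # about 1.96 for a two-sided 95% interval

ci <- c(lower = x_hat - z * se_hat, upper = x_hat + z * se_hat)
print(ci)
```

With the real data, the same two inputs could be taken from brexit_polls$x_hat[1] and brexit_polls$samplesize[1].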

This problem set continues from the previous part. Make sure you have run the following code:

# suggested libraries
library(tidyverse)

# load brexit_polls object and add x_hat column
library(dslabs)
data(brexit_polls)
brexit_polls <- brexit_polls %>%
  mutate(x_hat = (spread + 1)/2)

# final proportion voting "Remain"
p <- 0.481

Question 4: Confidence intervals for polls in June

Create the data frame june_polls containing only Brexit polls ending in June 2016 (enddate of "2016-06-01" and later). We will calculate confidence intervals for all polls and determine how many cover the true value of $d$.

First, use mutate() to calculate a plug-in estimate se_x_hat for the standard error of the estimate $\hat{\mathrm{SE}}[X]$ for each poll given its sample size and value of $\hat{X}$ (x_hat). Second, use mutate() to calculate an estimate for the standard error of the spread for each poll given the value of se_x_hat. Then, use mutate() to calculate upper and lower bounds for 95% confidence intervals of the spread. Last, add a column hit that indicates whether the confidence interval for each poll covers the correct spread $d = -0.038$.

How many polls are in june_polls?

What proportion of polls have a confidence interval that covers the value 0?

What proportion of polls predict "Remain" (confidence interval entirely above 0)?

What proportion of polls have a confidence interval covering the true value of $d$?
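The four mutate() steps described for Question 4 could be chained roughly as follows. This sketch runs on a small made-up data frame (toy_polls) instead of brexit_polls, so every number in it is illustrative only. The column names se_spread, lower, and upper are assumptions, since the assessment names only se_x_hat and hit.

```r
library(dplyr)

# Toy stand-in for brexit_polls; every value here is made up for illustration.
toy_polls <- data.frame(
  enddate    = as.Date(c("2016-05-20", "2016-06-05", "2016-06-15", "2016-06-22")),
  samplesize = c(1000, 2000, 1500, 800),
  spread     = c(0.02, -0.03, 0.10, -0.01)
) %>%
  mutate(x_hat = (spread + 1) / 2)

d <- -0.038   # actual spread, 2 * 0.481 - 1

toy_june <- toy_polls %>%
  filter(enddate >= as.Date("2016-06-01")) %>%
  mutate(
    se_x_hat  = sqrt(x_hat * (1 - x_hat) / samplesize),  # plug-in SE of X_hat
    se_spread = 2 * se_x_hat,                            # SE of the spread
    lower     = spread - qnorm(0.975) * se_spread,
    upper     = spread + qnorm(0.975) * se_spread,
    hit       = lower <= d & d <= upper                  # does the CI cover d?
  )

# Summaries matching the four sub-questions
toy_summary <- toy_june %>%
  summarize(
    n_polls        = n(),
    covers_zero    = mean(lower < 0 & upper > 0),
    predict_remain = mean(lower > 0),
    hit_rate       = mean(hit)
  )
print(toy_summary)
```

Taking the mean of a logical column gives the proportion of TRUE values, which is why mean(hit) can serve as the hit rate.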

Question 5: Hit rate by pollster

Group and summarize the june_polls object by pollster to find the proportion of hits for each pollster and the number of polls per pollster. Use arrange() to sort by hit rate.

Which of the following are TRUE?

Select ALL correct answers.

Unbiased polls and pollsters will theoretically cover the correct value of the spread 50% of the time.

Only one pollster had a 100% success rate in generating confidence intervals that covered the correct value of the spread.

The pollster with the highest number of polls covered the correct value of the spread in their confidence interval over 60% of the time.

All pollsters produced confidence intervals covering the correct spread in at least 1 of their polls.

The results are consistent with a large general bias that affects all pollsters.
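The grouping step in Question 5 might look like the sketch below. It uses invented pollster names and hit indicators instead of june_polls, and the output column names hit_rate and n_polls are assumptions made for illustration.

```r
library(dplyr)

# Made-up hit indicators for three hypothetical pollsters (illustration only).
toy_hits <- data.frame(
  pollster = c("Pollster A", "Pollster A", "Pollster B",
               "Pollster C", "Pollster C", "Pollster C"),
  hit      = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

hit_by_pollster <- toy_hits %>%
  group_by(pollster) %>%
  summarize(hit_rate = mean(hit),   # proportion of intervals covering d
            n_polls  = n()) %>%     # number of polls per pollster
  arrange(desc(hit_rate))           # highest hit rate first

print(hit_by_pollster)
```

Sorting in descending order is a choice made here for readability. A plain arrange(hit_rate) would sort from lowest to highest instead.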

Question 6: Boxplot of Brexit polls by poll type

Make a boxplot of the spread in june_polls by poll type.

Which of the following are TRUE?

Select ALL correct answers.

Online polls tend to show support for "Remain" (spread > 0).

Telephone polls tend to show support for "Remain" (spread > 0).

Telephone polls tend to show higher support for "Remain" than online polls (higher spread).

Online polls have a larger interquartile range (IQR) for the spread than telephone polls, indicating that they are more variable.

Poll type introduces a bias that affects poll results.
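One way to draw the boxplot for Question 6 is with ggplot2, sketched here on made-up spreads instead of june_polls. The dashed reference line at zero is an optional addition that makes it easier to see which side of "Remain" versus "Leave" each box falls on.

```r
library(ggplot2)

# Made-up spreads for illustration; not taken from june_polls.
toy_polls <- data.frame(
  poll_type = rep(c("Online", "Telephone"), each = 4),
  spread    = c(-0.04, -0.01, 0.00, 0.02, -0.01, 0.03, 0.05, 0.08)
)

plt <- ggplot(toy_polls, aes(x = poll_type, y = spread)) +
  geom_boxplot() +
  geom_hline(yintercept = 0, linetype = "dashed")   # reference line at spread = 0

print(plt)
```

With the real data, replacing toy_polls with june_polls should be enough, since that data frame has poll_type and spread columns.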

Question 7: Combined spread across poll type

Calculate the confidence intervals of the spread combined across all polls in june_polls, grouping by poll type. Recall that to determine the standard error of the spread, you will need to double the standard error of the estimate.

Use this code (which determines the total sample size per poll type, gives each spread estimate a weight based on the poll's sample size, and adds an estimate of p from the combined spread) to begin your analysis:

combined_by_type <- june_polls %>%
  group_by(poll_type) %>%
  summarize(N = sum(samplesize),
            spread = sum(spread*samplesize)/N,
            p_hat = (spread + 1)/2)

What is the lower bound of the 95% confidence interval for online voters?

What is the upper bound of the 95% confidence interval for online voters?
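The combined interval for Question 7 can be sketched by extending the starter code with one more mutate() step. The example below runs on a made-up data frame (toy_june), and the column names se_spread, lower, and upper are assumptions rather than names required by the assessment.

```r
library(dplyr)

# Made-up polls for illustration; not taken from june_polls.
toy_june <- data.frame(
  poll_type  = c("Online", "Online", "Telephone", "Telephone"),
  samplesize = c(1000, 3000, 500, 1500),
  spread     = c(-0.02, 0.02, 0.04, 0.08)
)

toy_combined <- toy_june %>%
  group_by(poll_type) %>%
  summarize(N = sum(samplesize),                      # total sample size
            spread = sum(spread * samplesize) / N,    # sample-size-weighted spread
            p_hat = (spread + 1) / 2) %>%             # implied "Remain" proportion
  mutate(
    se_spread = 2 * sqrt(p_hat * (1 - p_hat) / N),    # double the SE of p_hat
    lower     = spread - qnorm(0.975) * se_spread,
    upper     = spread + qnorm(0.975) * se_spread
  )

print(toy_combined)
```

Filtering the result with filter(poll_type == "Online") would isolate the row needed for the two bounds asked about above.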

Question 8: Interpreting combined spread estimates across poll type

Interpret the confidence intervals for the combined spreads for each poll type calculated in the previous problem.

Which of the following are TRUE about the confidence intervals of the combined spreads for different poll types?

Select ALL correct answers.

Neither set of combined polls makes a prediction about the outcome of the Brexit referendum (a prediction is possible if a confidence interval does not cover 0).

The confidence interval for online polls is larger than the confidence interval for telephone polls.

The confidence interval for telephone polls covers more positive values than the confidence interval for online polls.

The confidence intervals for different poll types do not overlap.

Neither confidence interval covers the true value of $d = -0.038$.