Directions

Make sure your package is up to date with the command install.packages("dslabs").

Overview

In June 2016, the United Kingdom (UK) held a referendum to determine whether the country would "Remain" in the European Union (EU) or "Leave" the EU. This referendum is commonly known as Brexit. Although the media and others interpreted poll results as forecasting "Remain" ($p > 0.5$), the actual proportion that voted "Remain" was only 48.1% ($p = 0.481$), and the UK thus voted to leave the EU. Pollsters in the UK were criticized for overestimating support for "Remain".

Important definitions

Data Import

Import the brexit_polls polling data from the dslabs package and set options for the analysis:

# suggested libraries and options

library(tidyverse)

options(digits = 3)

# load brexit_polls object

library(dslabs)

data(brexit_polls)

Final Brexit parameters

Define $p = 0.481$ as the actual proportion voting "Remain" on the Brexit referendum and $d = 2p - 1 = -0.038$ as the actual spread of the Brexit referendum with "Remain" defined as the positive outcome:

p <- 0.481    # official proportion voting "Remain"

d <- 2*p - 1  # official spread

Question 1: Expected value and standard error of a poll

The final proportion of voters choosing "Remain" was $p = 0.481$. Consider a poll with a sample of $N = 1500$ voters.

What is the expected total number of voters in the sample choosing "Remain"?

What is the standard error of the total number of voters in the sample choosing "Remain"?

What is the expected value of $\hat{X}$, the proportion of "Remain" voters?

What is the standard error of $\hat{X}$, the proportion of "Remain" voters?

What is the expected value of $d$, the spread between the proportion of "Remain" voters and "Leave" voters?

What is the standard error of $d$, the spread between the proportion of "Remain" voters and "Leave" voters?
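
One way to approach Question 1 is to treat each sampled voter as an independent Bernoulli draw with success probability $p$. Under that assumption the total has mean $Np$ and standard error $\sqrt{Np(1-p)}$, the proportion has mean $p$ and standard error $\sqrt{p(1-p)/N}$, and the spread $2\hat{X} - 1$ has mean $2p - 1$ and twice the standard error of the proportion. The sketch below uses only base R; the variable names are illustrative and not part of the original exercise.

```r
# Known quantities from the problem statement
p <- 0.481   # true proportion voting "Remain"
N <- 1500    # poll sample size

# Total number of "Remain" voters in the sample (sum of N Bernoulli draws)
expected_total <- N * p
se_total <- sqrt(N * p * (1 - p))

# Sample proportion X_hat = total / N
expected_x_hat <- p
se_x_hat <- sqrt(p * (1 - p) / N)

# Spread d = 2 * X_hat - 1, so its standard error is doubled
expected_d <- 2 * p - 1
se_d <- 2 * se_x_hat

# Collect all six quantities in one named vector
round(c(expected_total = expected_total, se_total = se_total,
        expected_x_hat = expected_x_hat, se_x_hat = se_x_hat,
        expected_d = expected_d, se_d = se_d), 4)
```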

Question 2: Actual Brexit poll estimates

Load and inspect the brexit_polls dataset from dslabs, which contains actual polling data for the 6 months before the Brexit vote. Raw proportions of voters preferring "Remain", "Leave", and "Undecided" are available (remain, leave, undecided). The spread is also available (spread), which is the difference in the raw proportion of voters choosing "Remain" and the raw proportion choosing "Leave".

Calculate x_hat for each poll, the estimate of the proportion of voters choosing "Remain" on the referendum day ($p = 0.481$), given the observed spread and the relationship $\hat{d} = 2\hat{X} - 1$. Use mutate() to add a variable x_hat to the brexit_polls object by filling in the skeleton code below:

brexit_polls <- brexit_polls %>%
  mutate(x_hat = __________)
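
Rearranging $\hat{d} = 2\hat{X} - 1$ gives $\hat{X} = (\hat{d} + 1)/2$, which is the expression the skeleton is asking for. As a minimal sketch, the conversion can be checked on a few made-up spreads in base R (the values below are hypothetical and not taken from brexit_polls):

```r
# Hypothetical observed spreads (not taken from brexit_polls)
spread <- c(0.04, -0.02, 0.00)

# Invert d_hat = 2 * x_hat - 1 to recover the estimated "Remain" proportion
x_hat <- (spread + 1) / 2
x_hat

# Round trip: converting back should reproduce the spreads
all.equal(2 * x_hat - 1, spread)
```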

What is the average of the observed spreads (spread)?

What is the standard deviation of the observed spreads?

What is the average of x_hat, the estimates of the parameter $p$?

What is the standard deviation of x_hat?

Question 3: Confidence interval of a Brexit poll

Consider the first poll in brexit_polls, a YouGov poll run on the same day as the Brexit referendum:

brexit_polls[1,]

Use qnorm() to compute the 95% confidence interval for $\hat{X}$.

What is the lower bound of the 95% confidence interval?

What is the upper bound of the 95% confidence interval?

Does the 95% confidence interval predict a winner (does not cover $p = 0.5$)? Does the 95% confidence interval cover the true value of $p$ observed during the referendum?

The interval predicts a winner and covers the true value of $p$.

The interval predicts a winner but does not cover the true value of $p$.

The interval does not predict a winner but does cover the true value of $p$.

The interval does not predict a winner and does not cover the true value of $p$.
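
A 95% interval for $\hat{X}$ can be sketched as $\hat{X} \pm z_{0.975}\,\hat{\mathrm{SE}}$, where $z_{0.975}$ comes from qnorm(0.975) and the standard error is the plug-in estimate $\sqrt{\hat{X}(1-\hat{X})/N}$. The values of x_hat and N below are hypothetical placeholders chosen for illustration; for the exercise they would be replaced by the values in brexit_polls[1,].

```r
p <- 0.481          # true "Remain" proportion from the referendum

# Illustrative values only; substitute the ones from brexit_polls[1,]
x_hat <- 0.52       # hypothetical estimate of the "Remain" proportion
N <- 2000           # hypothetical sample size

se_hat <- sqrt(x_hat * (1 - x_hat) / N)          # plug-in standard error
ci <- x_hat + c(-1, 1) * qnorm(0.975) * se_hat   # lower and upper bounds
ci

# A winner is predicted only if 0.5 lies outside the interval
predicts_winner <- !(ci[1] < 0.5 & ci[2] > 0.5)

# The interval covers the truth if p lies inside it
covers_p <- ci[1] <= p & p <= ci[2]
```

The two logical flags map directly onto the four answer choices above.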

Brexit poll analysis - Part 2

This problem set is continued from the previous page. Make sure you have run the following code:

# suggested libraries

library(tidyverse)

# load brexit_polls object and add x_hat column

library(dslabs)

data(brexit_polls)

brexit_polls <- brexit_polls %>%
  mutate(x_hat = (spread + 1)/2)

# final proportion voting "Remain"

p <- 0.481

Question 4: Confidence intervals for polls in June

Create the data frame june_polls containing only Brexit polls ending in June 2016 (enddate of "2016-06-01" and later). We will calculate confidence intervals for all polls and determine how many cover the true value of $d$.

First, use mutate() to calculate a plug-in estimate se_x_hat for the standard error of the estimate $\hat{\mathrm{SE}}[X]$ for each poll given its sample size and value of $\hat{X}$ (x_hat). Second, use mutate() to calculate an estimate for the standard error of the spread for each poll given the value of se_x_hat. Then, use mutate() to calculate upper and lower bounds for 95% confidence intervals of the spread. Last, add a column hit that indicates whether the confidence interval for each poll covers the correct spread $d = -0.038$.
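
The four mutate() steps described above could be chained roughly as follows. This is a sketch under stated assumptions: the `polls` data frame is a hypothetical stand-in with the same column names the exercise uses (enddate, samplesize, spread), and the column names se_spread, lower, and upper are illustrative choices rather than names required by the problem set.

```r
library(dplyr)

d <- -0.038  # true spread, 2 * 0.481 - 1

# Hypothetical stand-in for brexit_polls with the columns used below
polls <- data.frame(
  enddate    = as.Date(c("2016-05-20", "2016-06-05", "2016-06-22")),
  samplesize = c(1000, 2000, 4000),
  spread     = c(0.02, -0.03, 0.06)
)

june_polls <- polls %>%
  filter(enddate >= as.Date("2016-06-01")) %>%                   # June polls only
  mutate(x_hat     = (spread + 1) / 2,
         se_x_hat  = sqrt(x_hat * (1 - x_hat) / samplesize),     # plug-in SE of x_hat
         se_spread = 2 * se_x_hat,                               # SE of the spread
         lower     = spread - qnorm(0.975) * se_spread,
         upper     = spread + qnorm(0.975) * se_spread,
         hit       = lower <= d & d <= upper)                    # covers true spread?

# Summaries corresponding to the questions that follow
june_polls %>%
  summarize(n_polls         = n(),
            covers_zero     = mean(lower < 0 & upper > 0),
            predicts_remain = mean(lower > 0),
            hit_rate        = mean(hit))
```

With the real data, replacing `polls` by brexit_polls should be the only change needed.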

How many polls are in june_polls?

What proportion of polls have a confidence interval that covers the value 0?

What proportion of polls predict "Remain" (confidence interval entirely above 0)?

What proportion of polls have a confidence interval covering the true value of $d$?

Question 5: Hit rate by pollster

Group and summarize the june_polls object by pollster to find the proportion of hits for each pollster and the number of polls per pollster. Use arrange() to sort by hit rate.

Which of the following are TRUE?

Select ALL correct answers.

Unbiased polls and pollsters will theoretically cover the correct value of the spread 50% of the time.

Only one pollster had a 100% success rate in generating confidence intervals that covered the correct value of the spread.

The pollster with the highest number of polls covered the correct value of the spread in their confidence interval over 60% of the time.

All pollsters produced confidence intervals covering the correct spread in at least 1 of their polls.

The results are consistent with a large general bias that affects all pollsters.
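
The grouping step for Question 5 might look like the sketch below. The `june_polls` data frame here is a hypothetical stand-in containing only the two columns needed (pollster and hit); the pollster labels and hit values are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for june_polls: one row per poll with its hit status
june_polls <- data.frame(
  pollster = c("A", "A", "A", "B", "B", "C"),
  hit      = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

hit_by_pollster <- june_polls %>%
  group_by(pollster) %>%
  summarize(hit_rate = mean(hit),  # proportion of intervals covering d
            n = n()) %>%           # number of polls by this pollster
  arrange(hit_rate)                # lowest hit rate first

hit_by_pollster
```

Because mean() of a logical vector is the proportion of TRUE values, no explicit counting is required.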

Question 6: Boxplot of Brexit polls by poll type

Make a boxplot of the spread in june_polls by poll type.

Which of the following are TRUE?

Select ALL correct answers.

Online polls tend to show support for "Remain" (spread > 0).

Telephone polls tend to show support for "Remain" (spread > 0).

Telephone polls tend to show higher support for "Remain" than online polls (higher spread).

Online polls have a larger interquartile range (IQR) for the spread than telephone polls, indicating that they are more variable.

Poll type introduces a bias that affects poll results.

Question 7: Combined spread across poll type

Calculate the confidence intervals of the spread combined across all polls injune_polls, grouping by poll type. Recall that to determine the standard error of the spread, you will need to double the standard error of the estimate.

Use this code (which determines the total sample size per poll type, gives each spread estimate a weight based on the poll's sample size, and adds an estimate of p from the combined spread) to begin your analysis:

combined_by_type <- june_polls %>%
  group_by(poll_type) %>%
  summarize(N = sum(samplesize),
            spread = sum(spread*samplesize)/N,
            p_hat = (spread + 1)/2)
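
One possible continuation of the starter code adds the doubled standard error and the interval bounds with a further mutate(). The sketch below runs on a hypothetical stand-in for june_polls with made-up values, and the names se_spread, spread_lower, and spread_upper are illustrative (borrowed from the naming used later in this problem set).

```r
library(dplyr)

# Hypothetical stand-in for june_polls (values are illustrative only)
june_polls <- data.frame(
  poll_type  = c("Online", "Online", "Telephone", "Telephone"),
  samplesize = c(1000, 3000, 800, 1200),
  spread     = c(-0.02, 0.00, 0.03, 0.05)
)

combined_by_type <- june_polls %>%
  group_by(poll_type) %>%
  summarize(N = sum(samplesize),
            spread = sum(spread * samplesize) / N,   # sample-size-weighted spread
            p_hat = (spread + 1) / 2) %>%
  mutate(se_spread    = 2 * sqrt(p_hat * (1 - p_hat) / N),  # doubled SE of p_hat
         spread_lower = spread - qnorm(0.975) * se_spread,
         spread_upper = spread + qnorm(0.975) * se_spread)

combined_by_type
```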

What is the lower bound of the 95% confidence interval for online voters?

What is the upper bound of the 95% confidence interval for online voters?

Question 8: Interpreting combined spread estimates across poll type

Interpret the confidence intervals for the combined spreads for each poll type calculated in the previous problem.

Which of the following are TRUE about the confidence intervals of the combined spreads for different poll types?

Select ALL correct answers.

Neither set of combined polls makes a prediction about the outcome of the Brexit referendum (a prediction is possible if a confidence interval does not cover 0).

The confidence interval for online polls is larger than the confidence interval for telephone polls.

The confidence interval for telephone polls covers more positive values than the confidence interval for online polls.

The confidence intervals for different poll types do not overlap.

Neither confidence interval covers the true value of $d = -0.038$.

Brexit poll analysis - Part 3

This problem set is continued from the previous page. Make sure you have run the following code:

# suggested libraries

library(tidyverse)

# load brexit_polls object and add x_hat column

library(dslabs)

data(brexit_polls)

brexit_polls <- brexit_polls %>%
  mutate(x_hat = (spread + 1)/2)

# final proportion voting "Remain"

p <- 0.481

Question 9: Chi-squared p-value

Define brexit_hit, with the following code, which computes the confidence intervals for all Brexit polls in 2016 and then calculates whether the confidence interval covers the actual value of the spread $d = -0.038$:

brexit_hit <- brexit_polls %>%
  mutate(p_hat = (spread + 1)/2,
         se_spread = 2*sqrt(p_hat*(1-p_hat)/samplesize),
         spread_lower = spread - qnorm(.975)*se_spread,
         spread_upper = spread + qnorm(.975)*se_spread,
         hit = spread_lower < -0.038 & spread_upper > -0.038) %>%
  select(poll_type, hit)

Use brexit_hit to make a two-by-two table of poll type and hit status. Then use the chisq.test() function to perform a chi-squared test to determine whether the difference in hit rate is significant.
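
The table-then-test step can be sketched in base R as follows. The `brexit_hit` data frame below is a hypothetical stand-in with invented counts, so the resulting p-value is not the answer to the exercise. Note that chisq.test() applies Yates' continuity correction by default for two-by-two tables.

```r
# Hypothetical stand-in for brexit_hit (poll type and hit status per poll)
brexit_hit <- data.frame(
  poll_type = rep(c("Online", "Telephone"), times = c(40, 30)),
  hit       = c(rep(c(TRUE, FALSE), times = c(24, 16)),
                rep(c(TRUE, FALSE), times = c(9, 21)))
)

# Two-by-two table: rows are poll types, columns are hit status
two_by_two <- table(brexit_hit$poll_type, brexit_hit$hit)
two_by_two

# Pearson chi-squared test (continuity correction is on by default for 2x2)
chisq_result <- chisq.test(two_by_two)
chisq_result$p.value
```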

What is the p-value of the chi-squared test comparing the hit rate of online and telephone polls?

Determine which poll type has a higher probability of producing a confidence interval that covers the correct value of the spread. Also determine whether this difference is statistically significant at a p-value cutoff of 0.05. Which of the following is true?

Online polls are more likely to cover the correct value of the spread and this difference is statistically significant.

Online polls are more likely to cover the correct value of the spread, but this difference is not statistically significant.

Telephone polls are more likely to cover the correct value of the spread and this difference is statistically significant.

Telephone polls are more likely to cover the correct value of the spread, but this difference is not statistically significant.

Question 10: Odds ratio of online and telephone poll hit rate

Use the two-by-two table constructed in the previous exercise to calculate the odds ratio between the hit rate of online and telephone polls to determine the magnitude of the difference in performance between the poll types.

Calculate the odds that an online poll generates a confidence interval that covers the actual value of the spread.

Calculate the odds that a telephone poll generates a confidence interval that covers the actual value of the spread.

Calculate the odds ratio to determine how many times larger the odds are for online polls to hit versus telephone polls.
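
For reference, the odds of a hit are the number of hits divided by the number of misses, and the odds ratio is the online odds divided by the telephone odds. A minimal base R sketch on hypothetical counts (not the actual Brexit table) is:

```r
# Hypothetical two-by-two counts (rows: poll type, columns: hit status)
two_by_two <- matrix(c(16, 24,
                       21,  9),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(poll_type = c("Online", "Telephone"),
                                     hit = c("FALSE", "TRUE")))

# Odds of a hit = number of hits / number of misses
odds_online    <- two_by_two["Online", "TRUE"] / two_by_two["Online", "FALSE"]
odds_telephone <- two_by_two["Telephone", "TRUE"] / two_by_two["Telephone", "FALSE"]

# Odds ratio: how many times larger the online odds are
odds_ratio <- odds_online / odds_telephone
odds_ratio
```

The same indexing works on the object returned by table() in the previous exercise, since its columns are also labelled "FALSE" and "TRUE".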

Question 11: Plotting spread over time

Use brexit_polls to make a plot of the spread (spread) over time (enddate) colored by poll type (poll_type). Use geom_smooth() with method = "loess" to plot smooth curves with a span of 0.4. Include the individual data points colored by poll type. Add a horizontal line indicating the final value of $d = -0.038$.
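
The plot described above could be assembled along these lines. This is only a sketch: the `polls` data frame is a randomly generated, hypothetical stand-in for brexit_polls with the same three column names, so the resulting picture will not match any of the answer options.

```r
library(ggplot2)

set.seed(1)  # reproducible hypothetical data

# Hypothetical stand-in for brexit_polls (illustrative values only)
polls <- data.frame(
  enddate   = rep(seq(as.Date("2016-01-10"), as.Date("2016-06-20"),
                      length.out = 30), 2),
  poll_type = rep(c("Online", "Telephone"), each = 30),
  spread    = rnorm(60, mean = 0.02, sd = 0.05)
)

spread_plot <- ggplot(polls, aes(enddate, spread, color = poll_type)) +
  geom_smooth(method = "loess", span = 0.4) +   # one smooth curve per poll type
  geom_point() +                                # individual polls
  geom_hline(yintercept = -0.038)               # final spread d

spread_plot
```

Placing color = poll_type in the top-level aes() makes both the points and the smooth curves inherit the grouping.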

Which of the following plots is correct?

Question 12: Plotting raw percentages over time

Use the following code to create the object brexit_long, which has a column vote containing the three possible votes on a Brexit poll ("remain", "leave", "undecided") and a column proportion containing the raw proportion choosing that vote option on the given poll:

brexit_long <- brexit_polls %>%
  gather(vote, proportion, "remain":"undecided") %>%
  mutate(vote = factor(vote))

Make a graph of proportion over time colored by vote. Add a smooth trendline with geom_smooth() and method = "loess" with a span of 0.3.

Which of the following are TRUE?

Select ALL correct answers.

The percentage of undecided voters declines over time but is still around 10% throughout June.

Over most of the date range, the confidence bands for "Leave" and "Remain" overlap.

Over most of the date range, the confidence bands for "Leave" and "Remain" are below 50%.

In the first half of June, "Leave" was polling higher than "Remain", although this difference was within the confidence intervals.

At the time of the election in late June, the percentage voting "Leave" is trending upwards.
