Question


Posted on Jun 23, 2024


1. [R data analysis competition | Difficult]: Load the attached two data files. The first data file (Q1Data1.csv) contains Pew Research Center polls taken during the 2008 election campaign, and the second data file (Q1Data2.csv) is about the 2008 election result in the US. [60 points]

(a) Take the first data file (Q1Data1.csv). 1) Subset the data so that you have all states but Hawaii, Alaska, and Washington D.C. and have only four columns: "state," "marital," "heat2," and "heat4." 2) If no data is available in "heat2," replace the NA with the corresponding value in "heat4." If neither "heat2" nor "heat4" has data, erase the corresponding row. 3) Subset the data so that you only have "dem/lean dem" and "rep/lean rep" in the "heat2" column. 4) Change the label of all the values but "married" (married people) in the "marital" column to "other" (which indicates non-married people).

(b) For each state, calculate 1) the proportion of Democratic supporters, 2) the proportion of married people, 3) the ratio of married Democratic supporters to the total married people, 4) the ratio of non-married Democratic supporters to the total non-married people, 5) the difference of 3) and 4). Multiply all values by 100 to convert to percentages. Show the first few observations of these new variables.

(c) Take the second data file (Q1Data2.csv). Subset the data so that 1) you have all but three states, Hawaii, Alaska, and Washington D.C., and 2) only two columns, "state" and "vote_Obama_per" (Obama's actual vote share). Show the first few lines of the data set.

(d) Use a logistic regression predicting vote intention given state, using the indicator for being married as a predictor. Set up a proper link function. Try three different assumptions as to the state-level heterogeneity.
* Assumption 1: No state-level heterogeneity. All states have the same intercept and slope.
* Assumption 2: Complete state-level heterogeneity. All states have completely independent intercepts and slopes. No outlying coefficient is penalized.
* Assumption 3: State-level heterogeneity is unknown a priori. States have partially pooled intercepts and slopes. Outlying coefficients are penalized.

(e) Using the estimation result from the model with Assumption 3, plot your inference for the predicted vote share by state, along with the actual vote intention, plotting them vs. Obama's actual vote share. Annotate each dot with the corresponding state.

(f) The marriage gap is defined as the difference of Obama's vote share among married and non-married people ("other"). Figure out how to infer this marriage gap from your model. Using the estimation result from the model with Assumption 3, plot your inference for the marriage gap, along with the raw marriage gaps from the data, plotting them vs. Obama's vote share.

(g) Repeat (e)-(f) for the model with Assumption 1. Discuss your results.
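
As a rough illustration of the cleaning and per-state summaries asked for in parts (a) and (b), here is a minimal base-R sketch. It runs on a small made-up data frame rather than the real poll file. The column names heat2 and heat4, the lower-case state spellings, and the helper names clean_polls and summarise_state are all assumptions made for this example, so adjust them to whatever the actual file contains.

```r
# Toy stand-in for the first data file (hypothetical values, not real poll data).
polls <- data.frame(
  state   = c("ohio", "ohio", "ohio", "ohio", "texas", "texas", "texas", "hawaii"),
  marital = c("married", "divorced", "married", "never married",
              "married", "widowed", "married", "married"),
  heat2   = c("dem/lean dem", NA, "rep/lean rep", "dem/lean dem",
              "rep/lean rep", NA, NA, "dem/lean dem"),
  heat4   = c(NA, "dem/lean dem", NA, NA, NA, NA, "rep/lean rep", NA),
  stringsAsFactors = FALSE
)

# Assumed spellings of the excluded states; check them against the real file.
excluded_states <- c("hawaii", "alaska", "washington dc")

clean_polls <- function(d, excluded = excluded_states) {
  # Step 1: drop excluded states and keep the four columns of interest
  d <- d[!(d$state %in% excluded), c("state", "marital", "heat2", "heat4")]
  # Work with character vectors so ifelse() does not return factor codes
  d$heat2 <- as.character(d$heat2)
  d$heat4 <- as.character(d$heat4)
  # Step 2: fall back to heat4 where heat2 is missing, then drop rows with neither
  d$heat2 <- ifelse(is.na(d$heat2), d$heat4, d$heat2)
  d <- d[!is.na(d$heat2), ]
  # Step 3: keep only the two party-leaning answers
  d <- d[d$heat2 %in% c("dem/lean dem", "rep/lean rep"), ]
  # Step 4: collapse every marital status except "married" into "other"
  d$marital <- ifelse(!is.na(d$marital) & d$marital == "married", "married", "other")
  d
}

summarise_state <- function(d) {
  dem     <- d$heat2 == "dem/lean dem"
  married <- d$marital == "married"
  out <- data.frame(
    dem_pct         = 100 * mean(dem),
    married_pct     = 100 * mean(married),
    dem_married_pct = 100 * sum(dem & married) / sum(married),
    dem_other_pct   = 100 * sum(dem & !married) / sum(!married)
  )
  out$gap <- out$dem_married_pct - out$dem_other_pct
  out
}

cleaned <- clean_polls(polls)
parts <- split(cleaned, cleaned$state)
by_state <- do.call(rbind, lapply(names(parts), function(s) {
  data.frame(state = s, summarise_state(parts[[s]]), stringsAsFactors = FALSE)
}))
head(by_state)
```

A state with no non-married (or no married) respondents yields NaN for the corresponding ratio, as the toy "texas" rows do here. Whether to drop or keep such states is a choice the question leaves open.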
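
The three heterogeneity assumptions in part (d) and the marriage gap in part (f) can be sketched as follows. This is only one possible reading, fitted to simulated data with made-up state effects: complete pooling and no pooling use base-R glm with a logit link, and partial pooling uses glmer from the lme4 package, which the question does not name and is assumed here. The marriage gap is taken as the difference between the fitted probability for married and for non-married respondents.

```r
set.seed(1)
states <- c("ohio", "texas", "iowa", "utah", "maine")
n <- 200  # respondents per state in this simulation

# Simulated respondents; the state intercepts and slopes are hypothetical.
sim <- do.call(rbind, lapply(seq_along(states), function(i) {
  married <- rbinom(n, 1, 0.55)
  eta <- (0.4 - 0.2 * i) + (-0.5 + 0.1 * i) * married
  data.frame(state = states[i], married = married,
             dem = rbinom(n, 1, plogis(eta)), stringsAsFactors = FALSE)
}))

# Assumption 1: one intercept and one slope shared by all states
fit_pooled <- glm(dem ~ married, data = sim, family = binomial(link = "logit"))

# Assumption 2: a separate, unpenalized intercept and slope for every state
fit_nopool <- glm(dem ~ 0 + state + state:married, data = sim,
                  family = binomial(link = "logit"))

# Marriage gap (percentage points) under Assumption 1
b <- coef(fit_pooled)
gap_pooled <- unname(100 * (plogis(b[1] + b[2]) - plogis(b[1])))

# Marriage gap per state under Assumption 2
st <- sort(unique(sim$state))
p_married <- predict(fit_nopool, newdata = data.frame(state = st, married = 1),
                     type = "response")
p_other   <- predict(fit_nopool, newdata = data.frame(state = st, married = 0),
                     type = "response")
gap_nopool <- setNames(100 * (p_married - p_other), st)

# Assumption 3: partially pooled intercepts and slopes (needs the lme4 package)
if (requireNamespace("lme4", quietly = TRUE)) {
  fit_partial <- lme4::glmer(dem ~ married + (1 + married | state),
                             data = sim, family = binomial(link = "logit"))
  cf <- coef(fit_partial)$state  # per-state intercept and slope
  gap_partial <- setNames(
    100 * (plogis(cf[["(Intercept)"]] + cf[["married"]]) - plogis(cf[["(Intercept)"]])),
    rownames(cf)
  )
}
```

Under Assumption 2 the model reproduces each state's raw marriage gap exactly, since every state-by-marital cell gets its own parameter. The partially pooled estimates are shrunk toward the overall gap, most strongly for states with few respondents. With only a handful of simulated states, glmer may report a singular fit.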