
Question


Remember to change the `author:` field on this Rmd file to your own name.

Loading the necessary libraries for this assignment:

In[]:

library(plyr)
library(dplyr)
library(ggplot2)

You will continue using the `adult` data set that you first encountered on Homework 3. This data set is loaded below.

In[]:

adult.data <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
                       header = FALSE, fill = FALSE, strip.white = TRUE,
                       col.names = c("age", "type_employer", "fnlwgt", "education",
                                     "education_num", "marital", "occupation",
                                     "relationship", "race", "sex", "capital_gain",
                                     "capital_loss", "hr_per_week", "country", "income"))
adult.data <- mutate(adult.data, high.income = as.numeric(income == ">50K"))

Problem 1: Calculating and plotting error bars for a 1-sample t-test

#### (a) Using `ddply` and 1-sample t-testing, construct a table that shows the average `capital_gain` across `education`, along with the lower and upper endpoints of a 95% confidence interval. Your table should look something like:

      education     mean      lower    upper
    1      10th 404.5745  91.893307 717.2557
    2      11th 215.0979 144.306937 285.8888
    3      12th 284.0878 126.824531 441.3510
    ...

In[]:

#Edit here 
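One way to approach part (a) is sketched below. To keep it runnable without a download, it uses a small made-up data frame (`toy`, a hypothetical stand-in for `adult.data`); the same call shape should carry over to the real data, but treat it as an illustration rather than the intended solution.

```r
library(plyr)

# Toy stand-in for adult.data; the values are simulated, not real
set.seed(1)
toy <- data.frame(
  education    = rep(c("Bachelors", "HS-grad", "Masters"), each = 30),
  capital_gain = c(rnorm(30, 1000, 400), rnorm(30, 500, 300), rnorm(30, 2000, 600))
)

# For each education level, run a 1-sample t-test on capital_gain;
# t.test()$conf.int holds the 95% confidence interval endpoints
gain.summary <- ddply(toy, ~ education, summarize,
                      mean  = mean(capital_gain),
                      lower = t.test(capital_gain)$conf.int[1],
                      upper = t.test(capital_gain)$conf.int[2])
print(gain.summary)
```

Calling `t.test` twice per group is slightly wasteful; passing a function that stores the test result once and returns a one-row data frame would avoid that.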

#### (b) Reorder the levels of the factor in your summary table to correspond to ascending order of education. E.g., Preschool is the lowest, 1st-4th the next lowest, etc. You may find the `factor(..., levels = ...)` command helpful here. For the post-high school grades, you can use the ordering: Assoc-voc, Assoc-acdm, Some-college, Bachelors, Masters, Prof-school, Doctorate.   

In[]:

#Edit here 
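A minimal sketch of the reordering in part (b), assuming the summary table has an `education` column. The table `edu.tbl` and its `mean` values are placeholders, and the pre-college part of `edu.order` is an assumed ordering based on the level names in the `adult` data.

```r
# Assumed ascending order of educational attainment
edu.order <- c("Preschool", "1st-4th", "5th-6th", "7th-8th", "9th", "10th",
               "11th", "12th", "HS-grad", "Assoc-voc", "Assoc-acdm",
               "Some-college", "Bachelors", "Masters", "Prof-school", "Doctorate")

# Placeholder summary table in alphabetical order, as ddply would return it
edu.tbl <- data.frame(education = sort(edu.order), mean = seq_along(edu.order))

# Re-declare the factor with explicit levels, then sort rows by those levels
edu.tbl$education <- factor(edu.tbl$education, levels = edu.order)
edu.tbl <- edu.tbl[order(edu.tbl$education), ]
print(levels(edu.tbl$education))
```

Setting the levels is what matters for plotting, since ggplot2 orders a discrete axis by factor levels; sorting the rows only affects how the table prints.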

Problem 2: (Continuing from Problem 1)

(a) Using your table from Problem 1(b), construct a bar chart showing education on the x-axis and the average capital gains on the y-axis. Use `geom_errorbar` to overlay error bars as specified by the confidence interval endpoints you computed. You should tilt your x-axis text to limit overlap of x-axis labels. Set an appropriate y-axis label.

In[]:

#Edit here 
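The plot described in part (a) could be built roughly as follows. The table `plot.tbl` holds made-up numbers purely for illustration, and the 60-degree tilt and axis label text are arbitrary choices.

```r
library(ggplot2)

# Placeholder summary table (invented values) with an ordered education factor
plot.tbl <- data.frame(
  education = factor(c("HS-grad", "Bachelors", "Masters"),
                     levels = c("HS-grad", "Bachelors", "Masters")),
  mean  = c(500, 1000, 2000),
  lower = c(400, 850, 1700),
  upper = c(600, 1150, 2300)
)

# Bars show the group means; error bars span the confidence interval
p <- ggplot(plot.tbl, aes(x = education, y = mean)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2) +
  ylab("Average capital gain") +
  theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust = 1))
print(p)
```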

(b) What can you conclude about the association between capital gains and education levels? Does there appear to be a statistically significant difference in capital gains across education?

Replace this text with your answer.

Problem 3: Two-sample t-test error bars.

(a) Using `ddply` and 2-sample t-testing, construct a table that shows the difference in the proportion of men and women earning above 50K across different employer types. E.g., if 20% of men and 15% of women in a group earn above 50K, the difference in proportion is 0.2 - 0.15 = 0.05. Your table should use the 2-sample t-test to also calculate the lower and upper endpoints of a 95% confidence interval. (While a t-test isn't appropriate for binary data when the number of observations is small, we'll ignore this issue for now.) Your table should look something like:

      type_employer  prop.diff     lower     upper
    1             ? 0.07743971 0.0504165 0.1044629
    2   Federal-gov 0.31059432 0.2532462 0.3679424
    3     Local-gov 0.18361338 0.1461258 0.2211009
    ...

In[]:

#Edit here 
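A sketch of the grouped 2-sample t-test, again on simulated data (`toy2` is a hypothetical stand-in for `adult.data`, with the same column names). Because `t.test(x, y)` reports an interval for mean(x) minus mean(y), passing men first gives the men-minus-women difference the question asks for.

```r
library(plyr)

# Simulated stand-in for adult.data; outcomes are random, not real
set.seed(2)
toy2 <- data.frame(
  type_employer = rep(c("Private", "State-gov"), each = 200),
  sex           = rep(c("Male", "Female"), times = 200),
  high.income   = rbinom(400, 1, 0.3)
)

# For each employer type, compare the 0/1 high.income indicator between sexes
prop.summary <- ddply(toy2, ~ type_employer, function(df) {
  men   <- df$high.income[df$sex == "Male"]
  women <- df$high.income[df$sex == "Female"]
  tt    <- t.test(men, women)   # Welch 2-sample t-test, 95% CI by default
  data.frame(prop.diff = mean(men) - mean(women),
             lower     = tt$conf.int[1],
             upper     = tt$conf.int[2])
})
print(prop.summary)
```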

(b) Your table will have some fields that have the value NaN for the error bar limits. Explain why this is happening.

Replace this text with your answer.
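One situation that can produce `NaN` limits, offered as a plausible explanation rather than a confirmed diagnosis for this data set, is a group in which nobody (of either sex) is a high earner. Both samples then have zero variance, the standard error is 0, and the t statistic becomes 0/0. The vectors below are invented to reproduce that case.

```r
# Two samples with no variation at all: every indicator is 0
men.flat   <- c(0, 0, 0, 0)
women.flat <- c(0, 0, 0)

# Standard error is 0, so the statistic and interval are 0/0 = NaN
flat.ci <- suppressWarnings(t.test(men.flat, women.flat)$conf.int)
print(flat.ci)
```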

(c) Subset your summary table to include just those rows for which you have valid calculated values of the difference in high earning proportion and the upper and lower confidence interval endpoints. You will find the `is.nan` function useful here.

In[]:

#Edit here 
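The filtering step in part (c) might look like the following, shown on a placeholder table (`nan.tbl`, with invented values) that has the same columns as the Problem 3(a) summary.

```r
# Placeholder summary table; the first row mimics a group with NaN limits
nan.tbl <- data.frame(
  type_employer = c("Never-worked", "Private", "State-gov"),
  prop.diff     = c(0, 0.20, 0.15),
  lower         = c(NaN, 0.18, 0.10),
  upper         = c(NaN, 0.22, 0.20)
)

# Keep only rows where the difference and both endpoints are valid numbers
valid.tbl <- subset(nan.tbl,
                    !is.nan(prop.diff) & !is.nan(lower) & !is.nan(upper))
print(valid.tbl)
```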

Problem 4 (Continuing from Problem 3)

(a) Using your table from 3(c), construct a bar chart showing employer type on the x-axis and the difference in high earning rates between men and women on the y-axis. Use `geom_errorbar` to overlay error bars as specified by the confidence interval endpoints you computed. You should tilt your x-axis text to limit overlap of x-axis labels. Set an appropriate y-axis label.

In[]:

#Edit here 

(b) Reorder your x-axis variable in ascending order of high earning rate gap. You may find it useful to recall the `reorder` command from Lecture 7. Display the plot with the re-ordered x-axis variable.

In[]:

#Edit here 
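A rough sketch of the reordered plot in part (b). `reorder(x, by)` returns a factor whose levels are sorted by `by`, so using it inside `aes()` sorts the bars by gap size. The table `gap.tbl` contains invented numbers, and the axis labels are only suggestions.

```r
library(ggplot2)

# Placeholder table of gaps and confidence limits (invented values)
gap.tbl <- data.frame(
  type_employer = c("Private", "State-gov", "Local-gov"),
  prop.diff     = c(0.20, 0.15, 0.18),
  lower         = c(0.18, 0.10, 0.14),
  upper         = c(0.22, 0.20, 0.22)
)

# Levels after reordering: ascending in prop.diff
gap.levels <- levels(reorder(gap.tbl$type_employer, gap.tbl$prop.diff))

gap.plot <- ggplot(gap.tbl, aes(x = reorder(type_employer, prop.diff), y = prop.diff)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2) +
  xlab("Employer type") +
  ylab("Difference in high-earner rate (men minus women)") +
  theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust = 1))
print(gap.plot)
```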

(c) Are there any employer types where women have higher rates of being high earners compared to men? Are there any employer types where the high earning rates appear to not be statistically significantly different between men and women?

Replace this text with your answer.

(d) Which employer types appear to have the greatest disparity in high earning rates between men and women?

Replace this text with your answer.

Problem 5: Coloring by statistical significance

(a) Repeat part 1(a), this time adding an additional statistical significance indicator column that is 0 if the confidence interval overlaps 0 and is 1 otherwise. Your table should look something like:

      education     mean      lower    upper is.signif
    1      10th 404.5745  91.893307 717.2557         1
    2      11th 215.0979 144.306937 285.8888         1
    3      12th 284.0878 126.824531 441.3510         1
    4   1st-4th 125.8750   5.656611 246.0934         1
    5   5th-6th 176.0210  74.643760 277.3983         1
    6   7th-8th 233.9396 154.388060 313.4912         1
    7       9th 342.0895 -44.104225 728.2832         0
    ...

In[]:

#Edit here 
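The indicator column can be derived from the interval endpoints: the interval excludes 0 exactly when both endpoints share a sign. The sketch below applies that rule to two rows copied from the example table above; `sig.tbl` is a hypothetical name.

```r
# Two rows taken from the example output shown in the question
sig.tbl <- data.frame(
  education = c("10th", "9th"),
  mean      = c(404.5745, 342.0895),
  lower     = c(91.893307, -44.104225),
  upper     = c(717.2557, 728.2832)
)

# 1 if the interval lies entirely above or entirely below 0, else 0
sig.tbl$is.signif <- as.numeric(sig.tbl$lower > 0 | sig.tbl$upper < 0)
print(sig.tbl)
```

Inside a `ddply(..., summarize, ...)` call the same expression could be added as one more column, provided `lower` and `upper` are defined before it.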

(b) Repeat 1(b) to reorder education to be in ascending order of educational attainment. Then repeat 2(a) to produce a bar chart with error bars, specifying this time that the fill of the bars should be determined by whether the average gains are statistically significantly different from 0.

In[]:

#Edit here 
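To color bars by significance, one option is to map `fill` to the indicator converted to a factor, so ggplot2 treats it as two discrete groups rather than a continuous scale. The table `fill.tbl` reuses two rows from the example output in part (a); the legend title and labels are arbitrary choices.

```r
library(ggplot2)

# Two rows from the example table, with education already ordered
fill.tbl <- data.frame(
  education = factor(c("9th", "10th"), levels = c("9th", "10th")),
  mean      = c(342.0895, 404.5745),
  lower     = c(-44.104225, 91.893307),
  upper     = c(728.2832, 717.2557),
  is.signif = c(0, 1)
)

# fill = factor(is.signif) gives one color per significance group
fill.plot <- ggplot(fill.tbl, aes(x = education, y = mean, fill = factor(is.signif))) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2) +
  scale_fill_discrete(name = "Significantly different from 0",
                      labels = c("No", "Yes")) +
  ylab("Average capital gain") +
  theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust = 1))
print(fill.plot)
```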

Please help with this assignment.
