Question

Homework outline

This homework is designed to give you practice with calculating error bars (confidence intervals) with ddply and using ggplot2 graphics to produce insightful plots of the results.

library(plyr)
library(dplyr)
library(ggplot2)

You will continue using the adult data set that you first encountered on Homework 3. This data set is loaded below.

adult.data <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
                       header=FALSE, fill=FALSE, strip.white=T,
                       col.names=c("age", "type_employer", "fnlwgt", "education",
                                   "education_num", "marital", "occupation", "relationship",
                                   "race", "sex", "capital_gain", "capital_loss",
                                   "hr_per_week", "country", "income"))
adult.data <- mutate(adult.data, high.income = as.numeric(income == ">50K"))

Problem 3: Two-sample t-test error bars.

(a) [3 points] Using ddply and 2-sample t-testing, construct a table that shows the difference in the proportion of men and women earning above 50K across different employer types. E.g., if 20% of men and 15% of women in a group earn above 50K, the difference in proportion is 0.2 - 0.15 = 0.05. Your table should use the 2-sample t-test to also calculate the lower and upper endpoints of a 95% confidence interval. (While a t-test isn't appropriate for binary data when the number of observations is small, we'll ignore this issue for now.) Your table should look something like:

  type_employer  prop.diff     lower     upper
1             ? 0.07743971 0.0504165 0.1044629
2   Federal-gov 0.31059432 0.2532462 0.3679424
3     Local-gov 0.18361338 0.1461258 0.2211009
...

# Edit me
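One possible shape for part (a) is sketched below. It runs on a small simulated data frame (`toy`, a hypothetical stand-in for `adult.data` with the same three column names), and the output name `gap.table` is also just an illustrative choice. The numeric vectors are passed to `t.test` explicitly so that the sign of the difference is clearly men minus women.

```r
library(plyr)

set.seed(1)
# Hypothetical toy data standing in for adult.data (values are simulated)
toy <- data.frame(
  type_employer = rep(c("Private", "State-gov"), each = 200),
  sex = rep(c("Male", "Female"), times = 200),
  stringsAsFactors = FALSE
)
toy$high.income <- rbinom(nrow(toy), size = 1,
                          prob = ifelse(toy$sex == "Male", 0.30, 0.15))

# One row per employer type: difference in proportions plus a 95% CI
gap.table <- ddply(toy, .(type_employer), function(df) {
  male <- df$high.income[df$sex == "Male"]
  female <- df$high.income[df$sex == "Female"]
  tt <- t.test(male, female)  # Welch two-sample t-test, 95% CI by default
  data.frame(prop.diff = mean(male) - mean(female),
             lower = tt$conf.int[1],
             upper = tt$conf.int[2])
})
gap.table
```

On the real data, replacing `toy` with `adult.data` should give a table with the same four columns, assuming the `sex` column holds the values "Male" and "Female".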

(b) Your table will have some fields that have the value NaN for the error bar limits. Explain why this is happening.

Your answer goes here!
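As a hint for part (b), it may help to check in isolation how `t.test` behaves when neither sample has any variation. The vectors below are made up for illustration (they are not taken from the adult data): every observation is 0, as would happen in a group where nobody earns above 50K.

```r
# Made-up samples: every observation is 0, so both sample variances are 0
men <- c(0, 0, 0, 0)
women <- c(0, 0, 0)

tt <- t.test(men, women)

# The standard error is 0, so the t statistic and degrees of freedom
# are both 0/0, and the interval endpoints come out undefined
tt$conf.int
is.nan(tt$conf.int)
```

Whether this is what happens for a given employer type can be checked by tabulating `high.income` by `sex` within that type.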

(c) Subset your summary table to include just those rows for which you have valid calculated values of the difference in high earning proportion and the upper and lower confidence intervals. You will find the is.nan function useful here.

# Edit me 
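A minimal sketch of the subsetting step in part (c), using a hand-made table with placeholder group names and made-up values (`summary.table` and `valid.table` are hypothetical names):

```r
# Hand-made table with made-up values; group "B" has undefined CI endpoints
summary.table <- data.frame(
  type_employer = c("A", "B", "C"),
  prop.diff = c(0.31, 0, 0.18),
  lower = c(0.25, NaN, 0.15),
  upper = c(0.37, NaN, 0.22)
)

# Keep only rows where all three computed columns are valid numbers
valid.table <- subset(summary.table,
                      !is.nan(prop.diff) & !is.nan(lower) & !is.nan(upper))
valid.table
```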

Problem 4: Problem 3 (continued)

(a) Using your table from 3(c) construct a bar chart showing employer type on the x-axis, and the difference in high earning rates between men and women on the y axis. Use geom_errorbar to overlay error bars as specified by the confidence interval endpoints you computed. You should tilt your x-axis text to limit overlap of x-axis labels. Set an appropriate y-axis label.

# Edit me 
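The plot described in part (a) could be assembled roughly as follows. The table here contains only the three example rows quoted in Problem 3(a); the name `gap.table`, the axis label wording, the bar width, and the 60 degree tilt are illustrative choices rather than requirements.

```r
library(ggplot2)

# The three example rows shown in Problem 3(a)
gap.table <- data.frame(
  type_employer = c("?", "Federal-gov", "Local-gov"),
  prop.diff = c(0.07743971, 0.31059432, 0.18361338),
  lower = c(0.0504165, 0.2532462, 0.1461258),
  upper = c(0.1044629, 0.3679424, 0.2211009)
)

p <- ggplot(gap.table, aes(x = type_employer, y = prop.diff)) +
  geom_bar(stat = "identity") +                                # bar height = prop.diff
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2) +  # 95% CI endpoints
  xlab("Employer type") +
  ylab("Difference in proportion earning >50K (men - women)") +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))     # tilt x-axis labels
```

Calling `print(p)` draws the chart.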

(b) Reorder your x-axis variable in ascending order of high earning rate gap. You may find it useful to recall the reorder command from Lecture 7. Display the plot with the re-ordered x-axis variable.

# Edit me 
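For part (b), `reorder` changes the order of a factor's levels according to a second variable, and ggplot2 places discrete x-axis values in level order. A small sketch using the three example rows from Problem 3(a), with `gap.table` again a hypothetical name:

```r
# The three example rows shown in Problem 3(a)
gap.table <- data.frame(
  type_employer = factor(c("?", "Federal-gov", "Local-gov")),
  prop.diff = c(0.07743971, 0.31059432, 0.18361338)
)

# Sort the factor levels by prop.diff, smallest gap first
gap.table$type_employer <- reorder(gap.table$type_employer, gap.table$prop.diff)
levels(gap.table$type_employer)
```

The same call can also be written inline in the aesthetic mapping, as `aes(x = reorder(type_employer, prop.diff), ...)`.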

(c) [2 points] Are there any employer types where women have higher rates of being high earners compared to men? Are there any employer types where the high earning rates appear to not be statistically significantly different between men and women?

Your answer goes here!

(d) Which employer types appear to have the greatest disparity in high earning rates between men and women?

Your answer goes here!

Please help with these questions
