Question
# Remember to use / instead of \ in your file path/name. Make sure character strings are preserved and NOT considered as factors.
census <- read.csv("C:/Users/headl/Downloads/BAN 830 - Data Visualization with R - CPS85.csv", stringsAsFactors = FALSE)
Make an initial inspection of your data by using the functions str(), names(), and head().
str(census)
names(census)
head(census)
Based on the previous step, what class type is used to describe the "educ" (years of education) variable?
integer
Install and load the "ggplot2" package to start exploring the "census" data frame through data visualization.
install.packages("ggplot2")
library("ggplot2")
Using the ggplot function, create a basic bar (column) chart that shows the number of people working in each employment sector.
ggplot(data = census, aes(x = sector)) + geom_bar()
Modify the bar (column) chart into a stacked column chart showing the count of males and females working in each employment sector. This can be accomplished by adding a 'fill = VARIABLE' argument (without the quotation marks) to the aes call, where VARIABLE should be replaced by the proper variable name in the data set.
ggplot(data = census, aes(x = sector, fill = sex)) + geom_bar()
Alter your stacked column chart so that the counts of males and females are next to each other for each employment category instead of in a stacked fashion. This can be accomplished by adding a 'position = "dodge"' argument to the geometry layer.
ggplot(data = census, aes(x = sector, fill = sex)) + geom_bar(position = "dodge")
Based on your visualization in the previous step, which category shows the most extreme disparity between genders? Which gender is more prominent in that sector?
Using the ggplot function, create a scatter plot that shows how wage differs as age increases. Differentiate the male points and female points by color. HINT: This can be done by adding a color argument to the aes call, either in the main ggplot call or in the geometry layer.
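A minimal sketch of this scatter plot, assuming the data frame has columns named age, wage, and sex (only sex appears in the code above; age and wage are assumed names). The small demo data frame is made up so the snippet runs on its own; substitute census once the CSV is loaded.

```r
library(ggplot2)

# Made-up stand-in data; the column names age and wage are assumptions.
demo <- data.frame(
  age  = c(22, 28, 35, 41, 47, 53, 59, 64),
  wage = c(6.0, 8.5, 11.0, 9.5, 14.0, 10.0, 13.5, 9.0),
  sex  = c("M", "F", "M", "F", "M", "F", "M", "F")
)

# Mapping sex to color inside aes() gives each sex its own point color.
p <- ggplot(data = demo, aes(x = age, y = wage, color = sex)) +
  geom_point()
p
```

Placing color = sex in the main aes call, rather than inside geom_point(), means later layers added to the plot inherit the same grouping.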
Based on your visualization from the previous step, what do you notice about the total number of workers as age increases? What insight can you provide about wage with regard to sex for workers 50 years of age and older?
Add linear wage trend lines (without bands) for each sex to your scatter plot. Reminder: omit the bands around the trend lines.
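One way to sketch this step is geom_smooth() with method = "lm" for a straight-line fit and se = FALSE to drop the confidence bands. The column names age and wage and the demo values below are assumptions made for illustration; replace demo with census when working on the real data.

```r
library(ggplot2)

# Made-up stand-in data; the column names age and wage are assumptions.
demo <- data.frame(
  age  = c(22, 30, 38, 46, 54, 62, 24, 33, 41, 49, 57, 63),
  wage = c(6.5, 9.0, 11.5, 13.0, 15.5, 16.0, 6.0, 7.5, 8.5, 9.0, 9.5, 9.0),
  sex  = rep(c("M", "F"), each = 6)
)

# color = sex in the main aes() is inherited by geom_smooth(),
# so one linear trend line is fitted per sex.
p <- ggplot(data = demo, aes(x = age, y = wage, color = sex)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
p
```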
Does the linear model support your answer to the question in step 11?
Using your scatter plot from the previous step, break your plot into two panels, one each for married and single.
Place the plots in a vertical format.
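A possible sketch of the two stacked panels uses facet_wrap() with ncol = 1 so the panels sit one above the other. It assumes a column named married holding the two statuses; that name, the other column names, and the demo values are illustrative assumptions.

```r
library(ggplot2)

# Made-up stand-in data; column names and category labels are assumptions.
demo <- data.frame(
  age     = c(23, 31, 39, 47, 55, 63, 22, 29, 36, 44, 52, 60),
  wage    = c(7.0, 8.0, 11.0, 9.5, 15.0, 10.5, 6.0, 7.0, 8.5, 8.0, 10.0, 8.5),
  sex     = rep(c("M", "F"), times = 6),
  married = rep(c("Married", "Single"), each = 6)
)

# facet_wrap(~ married, ncol = 1) draws one panel per marital status,
# arranged in a single column (vertical layout).
p <- ggplot(data = demo, aes(x = age, y = wage, color = sex)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ married, ncol = 1)
p
```

facet_grid(married ~ .) would give a similar vertical arrangement; facet_wrap() is used here only because its ncol argument states the layout directly.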
Based on your visualization from the previous step, what do you notice about the rate of wage increases for males and females based on marriage status?
Create a histogram of the wage variable. Use the color "#CD3107" as the "fill" for the bars on your histogram.
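A minimal sketch of the histogram, assuming the wage column is named wage. The demo values are invented so the snippet runs without the CSV; use census in their place for the assignment.

```r
library(ggplot2)

# Made-up stand-in wages; the column name wage is an assumption.
demo <- data.frame(
  wage = c(3.5, 4.0, 4.5, 5.0, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5,
           8.0, 9.0, 10.0, 11.5, 13.0, 15.0, 18.0, 22.0, 26.0, 44.5)
)

# fill is set outside aes() because it is a fixed color, not a mapped variable.
# bins = 30 is the default, stated explicitly to avoid the stat_bin() message.
p <- ggplot(data = demo, aes(x = wage)) +
  geom_histogram(fill = "#CD3107", bins = 30)
p
```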
Someone said wages follow a normal distribution. Do you agree with this? Why or why not?
Change the color of the bars in the histogram you created in step 16 to a "Penn State" blue color. Feel free to use the internet for help. (Just get close.)
Create two new histograms (two separate visualizations) by modifying the number of bins used in your previous histogram. (You choose the number of bins for each.)
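A sketch of both changes, under two assumptions: the hex value "#1E407C" is only an approximation of a Penn State blue (any similar navy works, since the task says to get close), and the bin counts 8 and 40 are arbitrary picks. The demo wages are made up; swap in census for the real data.

```r
library(ggplot2)

# Made-up stand-in wages; the column name wage is an assumption.
demo <- data.frame(
  wage = c(3.5, 4.0, 4.5, 5.0, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5,
           8.0, 9.0, 10.0, 11.5, 13.0, 15.0, 18.0, 22.0, 26.0, 44.5)
)

# Assumed approximation of a Penn State blue.
psu_blue <- "#1E407C"

# Fewer, wider bins give a coarse view of the distribution.
p_few <- ggplot(data = demo, aes(x = wage)) +
  geom_histogram(fill = psu_blue, bins = 8)

# More, narrower bins show finer detail but also more noise.
p_many <- ggplot(data = demo, aes(x = wage)) +
  geom_histogram(fill = psu_blue, bins = 40)

p_few
p_many
```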
Which of the two histograms created in the previous step shows the best representation of the data?
Explore the relationship between education level and wage by creating a scatter plot with education on the x-axis and wage on the y-axis.
Recreate your previous scatter plot but "jitter" the points. Use color to differentiate Hispanic from non-Hispanic workers.
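The jittered version can be sketched with geom_jitter() in place of geom_point(). The column name hispanic and its labels are assumptions, as are the wage column name and the demo values; educ is the only name confirmed earlier in the question.

```r
library(ggplot2)

# Made-up stand-in data; hispanic, its labels, and wage are assumed names.
demo <- data.frame(
  educ     = c(8, 10, 12, 12, 12, 12, 14, 16, 16, 18),
  wage     = c(4.5, 5.0, 6.0, 7.5, 9.0, 8.0, 10.5, 12.0, 15.0, 19.5),
  hispanic = c("Hisp", "NH", "NH", "Hisp", "NH", "NH", "Hisp", "NH", "NH", "NH")
)

# Years of education take whole-number values, so many points overlap.
# A small horizontal jitter separates them; height = 0 leaves wages unchanged.
p <- ggplot(data = demo, aes(x = educ, y = wage, color = hispanic)) +
  geom_jitter(width = 0.2, height = 0)
p
```

Jittering only along the x-axis is a design choice here: it spreads out the stacked education values without distorting the wage readings.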
What do you observe in the visualization you created in the previous step?
Compare the wage distributions for union and non-union workers by creating a scatter plot with the categorical variable on the x-axis and using different colors for the union and non-union points.
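A minimal sketch of this plot, assuming the union-status column is named union and the wage column is named wage. The labels "Union" and "Not" and the demo values are invented for illustration; use census for the actual exercise.

```r
library(ggplot2)

# Made-up stand-in data; the column names and labels are assumptions.
demo <- data.frame(
  union = c("Not", "Not", "Not", "Not", "Not", "Not",
            "Union", "Union", "Union", "Union"),
  wage  = c(4.0, 5.5, 7.0, 8.5, 12.0, 20.0, 7.5, 9.5, 11.0, 13.5)
)

# The same variable drives both the x position and the point color.
p <- ggplot(data = demo, aes(x = union, y = wage, color = union)) +
  geom_point()
p
```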
Add box plots to your previous visualization.
Color the fill of your previous box plots by union status. Ensure the data points are still visible by making your box plots somewhat transparent.
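The last two steps can be sketched together: add a geom_boxplot() layer, map its fill to union status, and lower alpha so the points underneath show through. The column names union and wage, the labels, the demo values, and the alpha of 0.4 are all assumptions chosen for illustration.

```r
library(ggplot2)

# Made-up stand-in data; the column names and labels are assumptions.
demo <- data.frame(
  union = c("Not", "Not", "Not", "Not", "Not", "Not",
            "Union", "Union", "Union", "Union"),
  wage  = c(4.0, 5.5, 7.0, 8.5, 12.0, 20.0, 7.5, 9.5, 11.0, 13.5)
)

# Points are drawn first; the box plots are layered on top with
# alpha = 0.4 so they stay partly transparent.
p <- ggplot(data = demo, aes(x = union, y = wage)) +
  geom_point(aes(color = union)) +
  geom_boxplot(aes(fill = union), alpha = 0.4)
p
```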
Based on the visualization you created in the previous step, what do you notice about the wage distributions for union and non-union workers?