Question

```{r, include = FALSE}
library(MASS)
library(plyr)
library(dplyr)
library(ggplot2)
library(knitr)
```

### Problem 1: A few simple plots

For this problem we'll use the `diamonds` dataset from the `ggplot2` package.

##### (a) Base R graphics

Use the `hist` function to create a histogram of `carat` with bars colored `steelblue`.

```{r}
# Edit me
```

##### (b) qplot histogram

Use the `qplot` function from the `ggplot2` package to create a histogram of `depth`. Note that `geom = "histogram"` is a valid geometry in `qplot`.

```{r}
# Edit me
```

##### (c) qplot violin plots

Use the `qplot` function from the `ggplot2` library to create violin plots showing how `price` varies across diamond `cut`. Specify `fill = cut` so that each violin plot is coloured differently.

```{r}
# Edit me
```

**Hint**: For this exercise, it will be useful to know that `violin` is a geometry (`geom`) built into `ggplot2`, and that `qplot` can be called with the arguments:

```{r, eval = FALSE}
qplot(x, y, data, geom, fill)
```

### Problem 2: ggplot practice

For this exercise we'll go back to the Cars93 data set in the MASS library.

##### (a) size mapping, geom_point()

Define a `ggplot` object using the Cars93 data set that you can use to view `Price` on the y-axis, `MPG.highway` on the x-axis, and set the `size` mapping to be based on `Horsepower`. Use `geom_point()` to create a scatterplot from your `ggplot` object.

```{r}
# Edit me
```

##### (b) colour mapping

Repeat part (a), this time also setting the `colour` mapping to be based on `Origin`.

```{r}
# Edit me
```

##### (c) changing color palette

Repeat part (b), this time using the `scale_colour_manual()` layer to specify that you want to use `cbPalette` as your color palette.

```{r}
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73",
               "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# Edit me
```

### Problem 3: More ggplot2 practice

##### (a) stat_smooth()

Repeat part 2(b), this time using `stat_smooth()` to add a layer showing the smoothed curve representing how `Price` varies with `MPG.highway`.

```{r}
# Edit me
```

##### (b) facet_grid()

Use your ggplot object from 2(b) along with the `geom_point()` and `facet_grid` layers to create scatterplots of `Price` against `MPG.highway`, broken down by (conditioned on) `Origin`.

```{r}
# Edit me
```

(Your code should produce a figure with two scatterplots, analogous to the `facet_wrap` example from class. Note that the example from class had a factor with 7 levels, so 7 scatterplots were produced. `Origin` has two levels.)

##### (c) Overlaying regression lines

Modify your solution to part (b) to also display regression lines for each scatterplot.

```{r}
# Edit me
```

### Problem 4: ddply() practice

This problem uses the Adult dataset, which we load below. The main variable of interest here is `high.income`, which indicates whether the individual's income was over $50K. Anyone for whom `high.income == 1` is considered a "high earner".

```{r}
adult.data <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
                       header = FALSE, fill = FALSE, strip.white = T,
                       col.names = c("age", "type_employer", "fnlwgt", "education",
                                     "education_num", "marital", "occupation",
                                     "relationship", "race", "sex", "capital_gain",
                                     "capital_loss", "hr_per_week", "country", "income"))
adult.data <- mutate(adult.data, high.income = as.numeric(income == ">50K"))
```

##### (a) Income by education level

Use the `ddply()` function to produce a summary table showing how many individuals there are in each `education_num` bin, and how the proportion of high earners varies across `education_num` levels. Your table should have column names: `education_num`, `count` and `high.earn.rate`.

```{r}
# Edit me
```

##### (b) Constructing a bar chart

Use the `ggplot` and `geom_bar` commands along with your data summary from part **(a)** to create a bar chart showing the high earning rate on the y axis and `education_num` on the x axis. Specify that the color of the bars should be determined by the number of individuals in each bin.

```{r}
# Edit me
```

##### (c) summary table with multiple splitting variables

Use the `ddply()` function to produce a summary table showing how the proportion of high earners varies across all combinations of the following variables: `sex`, `race`, and `marital` (marital status). In addition to showing the proportion of high earners, your table should also show the number of individuals in each bin. Your table should have column names: `sex`, `race`, `marital`, `count` and `high.earn.rate`.

```{r}
# Edit me
```

##### (d) Nicer table output using `kable()`

Use the `kable()` function from the `knitr` library to display the table from part **(c)** in nice formatting. You should use the `digits` argument to ensure that the values in your table are being rounded to a reasonable number of decimal places.

```{r}
# Edit me
```

### Problem 5: Getting the right plot

##### (a) A more complex bar chart

Using the table you created in 4(c), use ggplot graphics to construct a plot that looks like [the one at this link](http://www.andrew.cmu.edu/user/achoulde/94842/homework/target_fig.png).

**Hint**: You may find it useful to use the following layers: `facet_grid`, `coord_flip` (for horizontal bar charts), `theme` (rotating x axis text) and `guides` (removing fill legend).

```{r, fig.height = 4, fig.width = 8}
# Edit me
```

##### (b) Hiding code with `echo`

Repeat part **(a)**, but this time set the `echo` argument of the code chunk in such a way that the code is not printed, but the plot is still displayed.

```{r, fig.height = 4, fig.width = 8}
# Edit me
```
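
The plotting prompts in Problems 1 to 3 could be approached along the following lines. This is a minimal sketch, not the course's reference solution: the object names (`p_size`, `p_origin`, and so on) are hypothetical, chosen only for illustration, and the titles and axis labels are assumptions. It relies only on the `diamonds` and `Cars93` data sets named in the assignment.

```r
library(MASS)     # provides Cars93
library(ggplot2)  # provides diamonds, qplot, ggplot

# 1(a): base R histogram of carat with steelblue bars
hist(diamonds$carat, col = "steelblue",
     main = "Histogram of carat", xlab = "carat")

# 1(b), 1(c): qplot is deprecated in recent ggplot2 releases but still
# available, so these calls may emit a deprecation warning
p_depth  <- qplot(depth, data = diamonds, geom = "histogram")
p_violin <- qplot(cut, price, data = diamonds, geom = "violin", fill = cut)

# 2(a): size mapped to Horsepower
p_size <- ggplot(Cars93, aes(x = MPG.highway, y = Price, size = Horsepower))
p_2a   <- p_size + geom_point()

# 2(b): additionally map colour to Origin
p_origin <- ggplot(Cars93, aes(x = MPG.highway, y = Price,
                               size = Horsepower, colour = Origin))
p_2b <- p_origin + geom_point()

# 2(c): swap in the palette given in the assignment
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73",
               "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
p_2c <- p_2b + scale_colour_manual(values = cbPalette)

# 3(a): add a smoothed curve layer
p_3a <- p_2b + stat_smooth()

# 3(b): one panel per level of Origin
p_3b <- p_origin + geom_point() + facet_grid(. ~ Origin)

# 3(c): overlay a linear regression line in each panel
p_3c <- p_3b + stat_smooth(method = "lm")
```

Each `p_*` object is drawn by printing it (for example `print(p_3c)`, or just its name on a line by itself in a chunk). Because `size` is mapped in the base `ggplot()` call, the smoothing layers may warn that the `size` aesthetic was dropped; moving `size` into `geom_point(aes(size = Horsepower))` is one way to avoid that.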

Please help solve the `# Edit me` parts. This is R-based, in a Jupyter notebook.
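
For the `ddply()` and `kable()` parts of Problem 4, the pattern can be sketched on a small stand-in data frame. The `toy` data below is made up purely for illustration and is not the Adult data; the same calls should carry over to `adult.data` by swapping the data frame and the splitting variables (for 4(c), `c("sex", "race", "marital")`). The names `toy`, `by_edu`, `by_edu_sex`, and `p_bar` are hypothetical.

```r
library(plyr)     # ddply, summarise
library(ggplot2)  # bar chart
library(knitr)    # kable

# Hypothetical toy data standing in for adult.data (values are made up)
toy <- data.frame(
  education_num = c(9, 9, 9, 13, 13, 16),
  sex           = c("Male", "Female", "Male", "Female", "Male", "Female"),
  high.income   = c(0, 0, 1, 1, 1, 1)
)

# 4(a) pattern: one row per education_num bin, with a count and
# the proportion of high earners (mean of a 0/1 indicator)
by_edu <- ddply(toy, "education_num", summarise,
                count = length(high.income),
                high.earn.rate = mean(high.income))

# 4(b) pattern: bar height is the rate, bar fill reflects the bin count
p_bar <- ggplot(by_edu, aes(x = education_num, y = high.earn.rate,
                            fill = count)) +
  geom_bar(stat = "identity")

# 4(c) pattern: split on several variables at once
by_edu_sex <- ddply(toy, c("education_num", "sex"), summarise,
                    count = length(high.income),
                    high.earn.rate = mean(high.income))

# 4(d) pattern: formatted table with rounded values
kable(by_edu_sex, digits = 3)
```

For Problem 5(b), the knitr chunk option `echo = FALSE` hides the code while still showing the plot, for example a chunk header of `{r, echo = FALSE, fig.height = 4, fig.width = 8}`.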
