
In R, code the following questions.

## The Houston Flights Data

The dataset that we will be using for this assignment is on flights from Houston, Texas in 2011. The dataset contains all flights departing from Houston airports IAH (George Bush Intercontinental) and HOU (Houston Hobby) in the year 2011. The data comes from the [Research and Innovative Technology Administration at the Bureau of Transportation Statistics](http://www.transtats.bts.gov/DatabaseInfo.asp?DB_ID=120&Link=0). The dataset is contained in the `hflights` package and contains 227,496 flights and 21 variables. The variables are:

Variable | Description
---------|------------
year | The year of the flight
month | the month of the flight
day_of_month | the day of the month of the flight
day_of_week | day of week of departure (useful for removing weekend effects)
dep_time | time of departure in local time (hhmm)
arr_time | time of arrival in local time (hhmm)
unique_carrier | unique abbreviation for a carrier (ex. wn = Southwest Airlines weirdly)
flight_num | flight number
tail_num | airplane tail number
actual_elapsed_time | elapsed time of flight, in minutes
air_time | flight time, in minutes
arr_delay | arrival delay, in minutes
dep_delay | departure delay, in minutes
origin | origin airport code
dest | destination airport code
distance | distance of flight, in miles
taxi_in | taxi in times, in minutes
taxi_out | taxi out times, in minutes
cancelled | cancelled indicator: 1 = Yes, 0 = No
cancel_code | reason for cancellation: A = carrier, B = weather, C = national air system, D = security
diverted | diverted indicator: 1 = Yes, 0 = No

The first thing that you need to do is to install the `hflights` data and load the package into your session.

```{r install and load, message = FALSE, warning=FALSE}
# Installing the hflights data. Run the install.packages() lines in your console,
# NOT inside the RMarkdown document.

# install.packages("hflights")
# install.packages("tidyverse")
# install.packages("magrittr")

library("hflights")
library("tidyverse"); theme_set(theme_bw())
library("magrittr")
```

The `hflights` package holds one dataset, `hflights`. In the code below, I modify the original dataset names to make them better. Do not modify this code!!! We don't know how to do this yet, but we will soon and this type of data "cleaning" is very valuable!

```{r fix_name}
hflights %<>%
  as_tibble %>%
  purrr::set_names(~ str_replace_all(.x, "([a-z])([A-Z])", "\\1_\\2")) %>%
  purrr::set_names(~ str_to_lower(.x)) %>%
  rename("day_of_month" = dayof_month) %>%
  rename("cancel_code" = cancellation_code)
```

The code above modifies `hflights` in place without changing the name of the dataset. This is done with the pipe-assign operator `%<>%`, which acts as a pipe but, at the end, assigns the result back to the object at the top of the "pipeline". All that to say, use `hflights` for all subsequent questions.
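As a small illustration of the pipe-assign behavior described above, the sketch below compares `%>%` with `%<>%` on a toy vector. The vector `x` and its values are made up for the example and are not part of the assignment.

```r
library(magrittr)

# Toy vector standing in for a real dataset (hypothetical values)
x <- c(9, 16, 25)

# Ordinary pipe: computes the square roots but leaves x unchanged
roots <- x %>% sqrt()

# Pipe-assign: pipes x through sqrt() and writes the result back into x
x %<>% sqrt()

# x now holds the same values as roots
x
```

The same idea applies to `hflights`: after the renaming chunk runs, the object is still called `hflights`, but it holds the cleaned column names.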

## Problems

__Problem 4__: Wow, that's a lot of flights for some carriers! But maybe they just fly out of the airports more often! Check by finding the proportion of departure-delayed flights for each carrier over the year. Comment on the differences/similarities in the summaries from problem 3 and problem 4.

```{r}

```
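One possible approach to Problem 4 is sketched below on a toy tibble rather than the real data. It assumes that "delayed" means `dep_delay > 0` and that flights with a missing `dep_delay` (for example, cancelled flights) are left out of the proportion; both are assumptions, not part of the assignment text. The name `toy_flights` and its values are hypothetical.

```r
library(dplyr)

# Toy stand-in for hflights with only the columns used here (made-up values)
toy_flights <- tibble(
  unique_carrier = c("AA", "AA", "AA", "AA", "WN", "WN", "WN"),
  dep_delay      = c(5, -2, NA, 40, 0, 12, -3)
)

delay_props <- toy_flights %>%
  group_by(unique_carrier) %>%
  summarise(
    n_flights    = n(),                               # all flights for the carrier
    n_delayed    = sum(dep_delay > 0, na.rm = TRUE),  # assumed definition of "delayed"
    prop_delayed = mean(dep_delay > 0, na.rm = TRUE)  # share of flights with a known delay value
  ) %>%
  arrange(desc(prop_delayed))

delay_props
```

Replacing `toy_flights` with `hflights` should give one row per carrier. Comparing `n_delayed` with `prop_delayed` shows whether a carrier with many delayed flights is simply flying more often.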

__Comment__:

__Problem 5__: Now make a similar boxplot as in problem 2 for each of the carriers and still filter for greater than 30 minutes. You should follow the same general hints for this question. Comment on your findings:

```{r}

```
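A minimal sketch for Problem 5 follows. Problem 2 is not shown here, so the layout (carrier on the x-axis, departure delay on the y-axis) is an assumption, as are the toy tibble `toy_flights` and its values.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for hflights (made-up values)
toy_flights <- tibble(
  unique_carrier = rep(c("AA", "WN", "XE"), each = 4),
  dep_delay      = c(35, 50, 10, 120, 31, 45, NA, 20, 90, 33, 60, 5)
)

# Keep only departure delays longer than 30 minutes; filter() also drops NA rows
long_delays <- toy_flights %>%
  filter(dep_delay > 30)

# One box per carrier
carrier_boxplot <- ggplot(long_delays, aes(x = unique_carrier, y = dep_delay)) +
  geom_boxplot() +
  labs(x = "Carrier", y = "Departure delay (minutes)")

# Print carrier_boxplot to draw the figure
```

With the real `hflights` data there are many more carriers, so `coord_flip()` may make the carrier labels easier to read.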

__Comments__:

__Problem 6__: How much time do the airports gain or lose between departure and arrival? To answer this question, find the mean difference between the arrival delay and the departure delay. Hint: use `mutate()` to create a new variable that holds the difference column. Comment on whether the airports are gaining or losing time.

```{r}

```
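The hint in Problem 6 can be sketched as follows, again on a toy tibble with made-up values. The sign convention (`arr_delay - dep_delay`, so a negative mean indicates time gained after departure) and the column name `delay_diff` are assumptions chosen for this example.

```r
library(dplyr)

# Toy stand-in for hflights (made-up values)
toy_flights <- tibble(
  dep_delay = c(10, 0, 25, NA, -5),
  arr_delay = c(4, 6, 25, NA, -15)
)

time_change <- toy_flights %>%
  # Difference between arrival and departure delay for each flight
  mutate(delay_diff = arr_delay - dep_delay) %>%
  # Average over flights where both delays are known
  summarise(mean_diff = mean(delay_diff, na.rm = TRUE))

time_change
```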

__Comments__:

__Problem 7__: Maybe that number above is misleading! Airlines will try to stay on schedule and thus will make the departure on time rather than early. I want you to find the same statistic as above, but this time only for flights that were delayed on arrival. Comment on the degree to which the average changed / stayed the same.

```{r}

```
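For Problem 7, one way to restrict the same statistic to late arrivals is sketched below. It assumes "delayed on arrival" means `arr_delay > 0`; the toy tibble and the name `delay_diff` are hypothetical.

```r
library(dplyr)

# Toy stand-in for hflights (made-up values)
toy_flights <- tibble(
  dep_delay = c(10, 0, 25, NA, -5, 30),
  arr_delay = c(4, 6, 25, NA, -15, 42)
)

late_time_change <- toy_flights %>%
  # Keep only flights that arrived late (assumed definition); NA rows are dropped
  filter(arr_delay > 0) %>%
  mutate(delay_diff = arr_delay - dep_delay) %>%
  summarise(mean_diff = mean(delay_diff, na.rm = TRUE))

late_time_change
```

Running the same pipeline without the `filter()` step gives the Problem 6 statistic, so the two means can be compared directly.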

__Comments__:

__Problem 8__: Now that I know what day to travel and what airline to travel, I need to find out which airport to fly out of!! There are two airports in this data: IAH and HOU. They are in the `origin` variable. Along the same lines as the other problems, find the average departure delay time for each airport. Also, this time make histograms for departure delays greater than 30 minutes and use `facet_wrap()` to plot them beside each other. Also, try adjusting the number of bins to see if the relationship becomes any clearer. Lastly, I want you to copy and paste the exact same code that you used to make the histograms in the last chunk and add a y aesthetic of `..density..`. Comment on the differences between the plots and what you think the `..density..` did to the plots.

```{r}

```

```{r}

```

```{r}

```
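The three pieces of Problem 8 could be sketched as below, using a toy tibble with made-up values. The bin count of 10 is an arbitrary choice for illustration. The code uses `after_stat(density)`, which is the newer ggplot2 spelling of the `..density..` aesthetic named in the problem; on older ggplot2 versions, `y = ..density..` plays the same role.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for hflights (made-up values)
toy_flights <- tibble(
  origin    = rep(c("IAH", "HOU"), each = 5),
  dep_delay = c(35, 60, 10, NA, 95, 40, 45, 5, 120, 32)
)

# Average departure delay for each airport
avg_delay <- toy_flights %>%
  group_by(origin) %>%
  summarise(mean_dep_delay = mean(dep_delay, na.rm = TRUE))

# Delays longer than 30 minutes; filter() also drops NA rows
long_delays <- toy_flights %>%
  filter(dep_delay > 30)

# Count histograms, one panel per airport
p_count <- ggplot(long_delays, aes(x = dep_delay)) +
  geom_histogram(bins = 10) +
  facet_wrap(~ origin)

# Same histograms with density on the y-axis instead of raw counts
p_density <- ggplot(long_delays, aes(x = dep_delay, y = after_stat(density))) +
  geom_histogram(bins = 10) +
  facet_wrap(~ origin)

# Print p_count and p_density to draw the figures
```

With density on the y-axis, the bars in each panel are rescaled so that their total area is 1, which makes the shapes of the two distributions comparable even when one airport has far more flights.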

__Comments:__

__Problem 9__: This is where you get to be creative (Hopefully that's exciting and not scary)! I want you to come up with a question from the data that I did not already ask and solve it! It can be any question that you want as long as you have to use some `dplyr` function to solve it! Feel free to consult me if you can't come up with a question!

```{r}

```
