
Question

NYC Flights
We will use some flight data in this problem. To access it, remember to load the tidyverse,
mdsr, and nycflights13 packages. If you've never used the mdsr package, run
install.packages("mdsr") once in your console (not in your .Rmd), and THEN run
library(mdsr) (put this in the setup chunk). Do the same for the nycflights13 package.
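A setup chunk along these lines would cover the loading step. This is only a sketch: it assumes the three packages have already been installed from the console, and the commented-out install line is shown for reference only.

```r
# Run once in the console, NOT in the .Rmd:
# install.packages(c("tidyverse", "mdsr", "nycflights13"))

# Setup chunk: attach the packages used in this problem
library(tidyverse)     # dplyr verbs, pipes, tibbles
library(mdsr)          # companion package mentioned in the prompt
library(nycflights13)  # flights and airlines datasets
```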
1. Load the flights dataset. Calculate, for each month, the average distance flown. Arrange from longest
to shortest. Call this new dataset avg_d. Hint: to load in data from a package, use the function data()
and set the argument to the name of the dataset.
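One possible way to approach this step is sketched below. The summary column name avg_distance is an assumption, since the prompt only names the resulting dataset.

```r
library(dplyr)
library(nycflights13)

# Load the flights data frame from the nycflights13 package
data(flights)

avg_d <- flights %>%
  group_by(month) %>%                                       # one group per calendar month
  summarize(avg_distance = mean(distance, na.rm = TRUE)) %>% # avg_distance is an assumed name
  arrange(desc(avg_distance))                               # longest average distance first

avg_d
```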
2. Load the airlines dataset (from the nycflights13 package). Join this dataset with the flights
dataset so that no rows are dropped. Keep the flight, carrier, and name columns. Call this new
dataset flightsJoined. Use the head() function to preview the first 5 rows of this dataset.
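A minimal sketch of the join follows. It assumes left_join() is an acceptable reading of "no rows are dropped", since it keeps every row of flights; full_join() would also keep any airline with no matching flights. Note that head() returns 6 rows by default, so the row count is passed explicitly.

```r
library(dplyr)
library(nycflights13)

data(flights)
data(airlines)

flightsJoined <- flights %>%
  left_join(airlines, by = "carrier") %>%  # every flight is kept; name comes from airlines
  select(flight, carrier, name)            # keep only the three requested columns

# Preview the first 5 rows (the default for head() is 6)
head(flightsJoined, 5)
```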
3. Now, create a new dataset flightsJoined2 that:
   - creates a new variable, distance_km, which is distance in kilometers (note that 1 mile is about 1.6
     kilometers)
   - keeps only the variables: name, flight, arr_delay, and distance_km
   - keeps only observations where distance is less than 500 kilometers
   - orders the observations from highest to lowest arr_delay
Display the first 10 observations of this dataset. Comment on what you see.
Hint: start again by joining the flights and airlines datasets as in (2).
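The four requirements above can be sketched as a single pipeline. This assumes the approximate conversion factor of 1.6 given in the prompt, and applies the 500 km filter to the new distance_km column.

```r
library(dplyr)
library(nycflights13)

data(flights)
data(airlines)

flightsJoined2 <- flights %>%
  left_join(airlines, by = "carrier") %>%          # start from the join, as the hint suggests
  mutate(distance_km = distance * 1.6) %>%         # miles to kilometers (approximate factor)
  select(name, flight, arr_delay, distance_km) %>% # keep only the requested variables
  filter(distance_km < 500) %>%                    # flights shorter than 500 km
  arrange(desc(arr_delay))                         # largest arrival delay first; NAs sort last

# Display the first 10 observations
head(flightsJoined2, 10)
```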
4. Lastly, compute the number of flights (call this N), the average arrival delay (call this avg_arr_delay),
and the average distance in kilometers (call this avg_dist_km) among these flights with distances less
than 500 km (i.e. working off of flightsJoined2), grouping by the carrier (aka airline) name. Sort the
results in descending order based on avg_arr_delay. Show your results.
Getting NAs for avg_arr_delay? That happens when some observations are missing that data. Before
grouping and summarizing, add a line to exclude observations with missing arrival delay information using
filter(is.na(arr_delay) == FALSE).
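A sketch of the grouped summary is shown below. It rebuilds flightsJoined2 so the block runs on its own, and the result name carrier_summary is a hypothetical choice, as the prompt does not name this table. Because the missing-delay filter runs first, N counts only flights with a recorded arrival delay.

```r
library(dplyr)
library(nycflights13)

data(flights)
data(airlines)

# Rebuild flightsJoined2 as described in the previous step
flightsJoined2 <- flights %>%
  left_join(airlines, by = "carrier") %>%
  mutate(distance_km = distance * 1.6) %>%
  select(name, flight, arr_delay, distance_km) %>%
  filter(distance_km < 500) %>%
  arrange(desc(arr_delay))

carrier_summary <- flightsJoined2 %>%   # carrier_summary is an assumed name
  filter(is.na(arr_delay) == FALSE) %>% # drop flights with no arrival delay recorded
  group_by(name) %>%                    # one row per airline
  summarize(
    N = n(),                            # number of flights with a recorded delay
    avg_arr_delay = mean(arr_delay),    # average arrival delay
    avg_dist_km = mean(distance_km)     # average distance in kilometers
  ) %>%
  arrange(desc(avg_arr_delay))          # most delayed airline first

carrier_summary
```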