
Question


In this project you will be working with R, SQL, and Python in the same document. We will use the data sets airlines and flights from the package nycflights13.

Knit the empty file (to make sure everything is working)

Consistently knit the file each time you answer a question

In R, install the packages nycflights13, sqldf, and tidyverse, then load all data sets and packages. Take your time to understand the data sets.

```{r message=FALSE, warning=FALSE}

# Install code here - comment the code when you finish installation
# Load all packages here
# Load the data here

```

Question 1: List the name of airlines where the destination is ATL airport with their average arrival delays and sort them from the smallest to largest average arrival delays. Use data frames flights and airlines.

We shall solve this question using R, SQL, and Python.

R solution

You can use the R command `inner_join()` to join two tables.

```{r, warning=FALSE, message=FALSE}

# Code here

```

SQL solution

Write your SQL query in the function sqldf(). For example, sqldf("select * from relig_income") lists the data frame relig_income.

```{r, warning=FALSE, message=FALSE }

# code here

```
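To check the shape of the query outside R, it can be run against an in-memory SQLite database, which is the engine sqldf uses by default. The following is only a sketch: the rows are made-up toy values (not the real nycflights13 data), and the three-column tables are a simplified assumption about the columns the question needs.

```python
import sqlite3

# In-memory database with toy stand-ins for flights and airlines
# (values are invented for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flights (carrier TEXT, dest TEXT, arr_delay REAL);
CREATE TABLE airlines (carrier TEXT, name TEXT);
INSERT INTO flights VALUES
  ('DL', 'ATL', 10), ('DL', 'ATL', 0),
  ('FL', 'ATL', 30), ('FL', 'ATL', 10),
  ('UA', 'ORD', 99);
INSERT INTO airlines VALUES
  ('DL', 'Delta Air Lines Inc.'),
  ('FL', 'AirTran Airways Corporation'),
  ('UA', 'United Air Lines Inc.');
""")

# Join the two tables, keep ATL-bound flights, average per airline,
# and sort from the smallest to the largest average arrival delay.
query = """
SELECT a.name, AVG(f.arr_delay) AS avg_arr_delay
FROM flights AS f
INNER JOIN airlines AS a ON f.carrier = a.carrier
WHERE f.dest = 'ATL'
GROUP BY a.name
ORDER BY avg_arr_delay
"""
rows = conn.execute(query).fetchall()
for name, avg_arr_delay in rows:
    print(name, avg_arr_delay)
```

The same query string could then be passed to sqldf() in the R chunk, assuming the data frames flights and airlines are loaded.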

Python solution

Use [df.join()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html) from pandas.

```{python}
# load python libraries
# load data
# code here
```
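One possible shape of the df.join() approach is sketched below on toy data. The rows are invented for illustration and only the columns carrier, dest, arr_delay, and name are assumed; the real tables would come from nycflights13. Note that DataFrame.join matches the `on` column against the other frame's index, which is why airlines is indexed by carrier first.

```python
import pandas as pd

# Toy stand-ins for the nycflights13 tables (values are made up).
flights = pd.DataFrame({
    "carrier": ["DL", "DL", "FL", "FL", "UA", "UA"],
    "dest": ["ATL", "ATL", "ATL", "ATL", "ATL", "ORD"],
    "arr_delay": [10.0, 0.0, 30.0, 10.0, 2.0, 99.0],
})
airlines = pd.DataFrame({
    "carrier": ["DL", "FL", "UA"],
    "name": [
        "Delta Air Lines Inc.",
        "AirTran Airways Corporation",
        "United Air Lines Inc.",
    ],
})

# Keep ATL-bound flights, then attach airline names.
atl = flights[flights["dest"] == "ATL"]
joined = atl.join(airlines.set_index("carrier"), on="carrier", how="inner")

# Mean arrival delay per airline, smallest first (mean() skips NaN by default).
avg_delay = (
    joined.groupby("name")["arr_delay"]
    .mean()
    .sort_values()
    .reset_index(name="avg_arr_delay")
)
print(avg_delay)
```

pd.merge(atl, airlines, on="carrier") would be an alternative that avoids setting the index; join() is used here only because the assignment suggests it.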

Question 2: Plot the boxplot of the departure delays vs the name of airlines where the destination is ATL airport. Solve this question using R and Python.

R solution

Use ggplot

```{r, warning=FALSE, message=FALSE }

# code here

```

Python solution

You can use [pandas.boxplot()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.boxplot.html) or [seaborn.boxplot()](https://seaborn.pydata.org/generated/seaborn.boxplot.html). You can either import the data from the R package into Python or use the Python data package [nycflights13](https://pypi.org/project/nycflights13/).

```{python}

# import python libraries
# data
# Boxplot

```
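A minimal sketch of the pandas boxplot route follows, assuming the ATL-bound flights have already been joined with the airline names. The data frame atl and its values are hypothetical toy data, and the shortened airline names are placeholders.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data: ATL-bound flights already joined with airline names (made-up values).
atl = pd.DataFrame({
    "name": ["Delta", "Delta", "Delta", "AirTran", "AirTran", "AirTran"],
    "dep_delay": [-2.0, 5.0, 40.0, 0.0, 12.0, 75.0],
})

# DataFrame.boxplot draws one box per group given by `by`.
fig, ax = plt.subplots()
atl.boxplot(column="dep_delay", by="name", ax=ax)
ax.set_xlabel("Airline")
ax.set_ylabel("Departure delay (minutes)")
ax.set_title("Departure delays of ATL-bound flights")
fig.suptitle("")  # drop the automatic "Boxplot grouped by ..." super-title
```

Calling plt.show() at the end of the chunk renders the figure. With seaborn, the equivalent call would be seaborn.boxplot(data=atl, x="name", y="dep_delay").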

Question 3: For each airline, 1) find the month in which the average departure delay is the highest in the year, and 2) make a visualization to show the results. Solve this question using your preferred language, R or Python.

```{}
# Do not forget to specify which language you are using.
# load libraries
# load data
# code here to find the months
# code here to make the visualization
```
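If Python is chosen, part 1 can be approached as a two-level aggregation: average per (carrier, month), then pick the row with the largest average within each carrier. The sketch below uses invented toy rows and assumes only the columns carrier, month, and dep_delay.

```python
import pandas as pd

# Toy stand-in for the flights table (values are made up).
flights = pd.DataFrame({
    "carrier": ["DL", "DL", "DL", "UA", "UA", "UA"],
    "month": [1, 1, 7, 3, 12, 12],
    "dep_delay": [4.0, 6.0, 20.0, 15.0, 30.0, 10.0],
})

# Average departure delay per (carrier, month).
monthly = flights.groupby(["carrier", "month"], as_index=False)["dep_delay"].mean()

# For each carrier, idxmax() returns the row label of the highest monthly average.
worst_rows = monthly.groupby("carrier")["dep_delay"].idxmax()
worst_month = monthly.loc[worst_rows].reset_index(drop=True)
print(worst_month)
```

The resulting worst_month frame has one row per carrier, so it can feed a bar chart directly for part 2, for example with the month on the x-axis labels and the average delay as bar height.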
