Question

In this assignment I will be using the dataset released by the Department of Transportation. This dataset lists flights that occurred in 2015, along with other information such as delays, flight time, etc.

In this assignment, I will show good practices for manipulating data with Python's most popular libraries, covering the following:

  • cleaning data with pandas
  • making specific changes with numpy
  • handling date-related values with datetime
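The numpy and datetime items in the list above can be sketched on a few invented rows. This is a minimal illustration, assuming the YEAR, MONTH, DAY and DEPARTURE_DELAY columns from the flights dataset; the values are hypothetical, and pandas' own datetime support stands in for the standard-library datetime module.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows using column names from the flights dataset
flights = pd.DataFrame({
    'YEAR': [2015, 2015, 2015],
    'MONTH': [1, 1, 2],
    'DAY': [1, 2, 14],
    'DEPARTURE_DELAY': [-3.0, 12.0, np.nan],
})

# datetime: assemble one date column from the YEAR, MONTH and DAY columns
flights['DATE'] = pd.to_datetime(flights[['YEAR', 'MONTH', 'DAY']])

# numpy: flag late departures; NaN > 0 is False, so a missing delay counts as on time
flights['DELAYED'] = np.where(flights['DEPARTURE_DELAY'] > 0, 1, 0)

print(flights[['DATE', 'DELAYED']])
```

Treating a missing delay as "not delayed" is a design choice here; dropping those rows instead would change the percentages.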

Note: consider only flights departing from BOS, JFK, SFO, and LAX.
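A minimal sketch of that filter, using invented rows (only the ORIGIN_AIRPORT and AIRLINE column names come from the dataset):

```python
import pandas as pd

# Hypothetical toy data; only ORIGIN_AIRPORT matters for the filter
flights = pd.DataFrame({
    'ORIGIN_AIRPORT': ['BOS', 'ORD', 'JFK', 'SFO', 'ATL', 'LAX'],
    'AIRLINE': ['B6', 'UA', 'DL', 'UA', 'DL', 'AA'],
})

airports = ['BOS', 'JFK', 'SFO', 'LAX']
# isin() builds a boolean mask; .copy() returns an independent frame that can be
# modified later without triggering a SettingWithCopyWarning
departing = flights[flights['ORIGIN_AIRPORT'].isin(airports)].copy()
print(departing)
```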

Flights data columns:

Index(['YEAR', 'MONTH', 'DAY', 'ORIGIN_AIRPORT', 'DESTINATION_AIRPORT', 'AIRLINE', 'FLIGHT_NUMBER', 'SCHEDULED_DEPARTURE', 'DEPARTURE_DELAY'], dtype='object')

Airlines data columns:

Index(['IATA_CODE', 'AIRLINE'], dtype='object')


Question 3

For this question, find the names of the three airlines with the highest number of flights and the lowest percentage of delays compared to other airlines. The result should be a dataframe with three columns: AIRLINE_NAME, NUM_FLIGHTS and PERC_DELAY.
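The question combines two criteria. One reading, which the tutor's answer also adopts, ranks airlines by flight count (descending) and breaks ties by delay percentage (ascending). A minimal sketch of that ranking on an invented summary table (all names and numbers are hypothetical):

```python
import pandas as pd

# Hypothetical per-airline summary; values are invented for illustration
stats = pd.DataFrame({
    'AIRLINE_NAME': ['Airline A', 'Airline B', 'Airline C', 'Airline D'],
    'NUM_FLIGHTS': [500, 500, 800, 300],
    'PERC_DELAY': [40.0, 25.0, 35.0, 10.0],
})

# Most flights first; equal flight counts are ordered by the lower delay percentage
ranked = stats.sort_values(by=['NUM_FLIGHTS', 'PERC_DELAY'], ascending=[False, True])

# reset_index(drop=True) renumbers the rows 0..n-1, so .loc[0, ...] is the top airline
top_three = ranked.head(3).reset_index(drop=True)
print(top_three)
```

Because exact ties in flight counts are rare, PERC_DELAY mostly acts as a tiebreaker under this reading; a weighted score would be a different interpretation of the question.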

Hint:

  • the percentage of delay for each airline can be obtained using the groupby and apply methods
  • merge flights_df with airlines_df to get the names of the top three airlines
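Both hints can be sketched on a few invented rows. This assumes "delayed" means DEPARTURE_DELAY > 0, which the question does not define. Note that both tables have an AIRLINE column (a code in one, a full name in the other), so renaming before the merge avoids AIRLINE_x / AIRLINE_y suffixes.

```python
import pandas as pd

# Hypothetical toy data with the column names of flights_df and airlines_df
flights = pd.DataFrame({
    'AIRLINE': ['UA', 'UA', 'UA', 'DL', 'DL'],
    'DEPARTURE_DELAY': [15.0, -2.0, 0.0, 30.0, None],
})
airlines = pd.DataFrame({
    'IATA_CODE': ['UA', 'DL'],
    'AIRLINE': ['United Air Lines Inc.', 'Delta Air Lines Inc.'],
})

# groupby + apply: percentage of each airline's flights with a positive delay
perc_delay = (
    flights.groupby('AIRLINE')['DEPARTURE_DELAY']
    .apply(lambda d: (d > 0).mean() * 100)
    .reset_index(name='PERC_DELAY')
)

# Rename the full-name column first so the merge does not produce AIRLINE_x / AIRLINE_y
names = airlines.rename(columns={'AIRLINE': 'AIRLINE_NAME'})
result = perc_delay.merge(names, left_on='AIRLINE', right_on='IATA_CODE')
print(result[['AIRLINE_NAME', 'PERC_DELAY']])
```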

TEST ON:

```python
top_three_airlines_df = top_three_airlines(flights_df_raw.copy(), airlines_df.copy())

assert sorted(list(top_three_airlines_df.columns)) == sorted(['NUM_FLIGHTS', 'PERC_DELAY', 'AIRLINE_NAME']), "Dataframe doesn't have required columns"
assert top_three_airlines_df.loc[0, 'AIRLINE_NAME'] == 'United Air Lines Inc.', "Top airline name doesn't match"
```

Answer from your tutor:

Explanation:

Based on the provided information and requirements, here's an example implementation using pandas that finds the top three airlines by number of flights, using the percentage of delayed departures to break ties.

```python
import pandas as pd

def top_three_airlines(flights_df, airlines_df):
    # Filter flights departing from BOS, JFK, SFO, and LAX.
    # .copy() gives an independent frame, so later changes don't raise SettingWithCopyWarning.
    airports = ['BOS', 'JFK', 'SFO', 'LAX']
    flights_df = flights_df[flights_df['ORIGIN_AIRPORT'].isin(airports)].copy()

    # Calculate the number of flights per airline
    num_flights = flights_df.groupby('AIRLINE').size().reset_index(name='NUM_FLIGHTS')

    # Calculate the percentage of delayed departures (delay > 0) for each airline.
    # Missing delays (e.g. cancelled flights) compare as False, i.e. not delayed.
    perc_delay = (
        flights_df.groupby('AIRLINE')['DEPARTURE_DELAY']
        .apply(lambda d: (d > 0).mean() * 100)
        .reset_index(name='PERC_DELAY')
    )

    # Combine both statistics, then merge with airlines_df to get airline names.
    # airlines_df's AIRLINE column holds the full name, so rename it first to
    # avoid AIRLINE_x / AIRLINE_y columns after the merge.
    stats = pd.merge(num_flights, perc_delay, on='AIRLINE')
    names = airlines_df.rename(columns={'AIRLINE': 'AIRLINE_NAME'})
    top_airlines = pd.merge(stats, names, left_on='AIRLINE', right_on='IATA_CODE')

    # Sort by number of flights (descending), breaking ties by delay percentage (ascending)
    top_airlines = top_airlines.sort_values(by=['NUM_FLIGHTS', 'PERC_DELAY'], ascending=[False, True])

    # Keep the top three rows and the required columns; reset the index so
    # that .loc[0, ...] refers to the top-ranked airline
    top_three = top_airlines.head(3)[['AIRLINE_NAME', 'NUM_FLIGHTS', 'PERC_DELAY']].reset_index(drop=True)

    return top_three

# Example usage
top_three_airlines_df = top_three_airlines(flights_df_raw.copy(), airlines_df.copy())

print(top_three_airlines_df)
```

Make sure to replace `flights_df_raw` with your raw flights dataset and `airlines_df` with your airlines dataset. The function `top_three_airlines` returns a dataframe containing the names of the top three airlines, their number of flights, and their percentage of delayed departures.

Please note that the above code assumes you have already loaded and prepared the flights and airlines datasets as pandas dataframes.
