Question
In this assignment I will be using the dataset released by The Department of Transportation. This dataset lists flights that occurred in 2015, along with other information such as delays, flight time etc.
In this assignment, I will be showing good practices to manipulate data using Python's most popular libraries to accomplish the following:
- cleaning data with pandas
- making specific changes with numpy
- handling date-related values with datetime
Note: please consider the flights departing from BOS, JFK, SFO and LAX.
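As a minimal sketch of how the three tasks above might look in practice, assuming a tiny frame whose column names mirror the flights data (the rows themselves are invented for illustration and are not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Invented sample rows; only the column names mirror the real flights data.
sample = pd.DataFrame({
    'YEAR': [2015, 2015, 2015, 2015],
    'MONTH': [1, 1, 2, 3],
    'DAY': [1, 2, 14, 30],
    'ORIGIN_AIRPORT': ['BOS', 'ORD', 'LAX', 'JFK'],
    'DEPARTURE_DELAY': [5.0, -3.0, np.nan, 0.0],
})

# pandas: keep only flights departing from the four airports of interest.
airports = ['BOS', 'JFK', 'SFO', 'LAX']
subset = sample[sample['ORIGIN_AIRPORT'].isin(airports)].copy()

# numpy: flag delayed flights. A missing delay compares as False,
# so it is not counted as delayed here (an assumption of this sketch).
subset['DELAYED'] = np.where(subset['DEPARTURE_DELAY'] > 0, 1, 0)

# datetime: assemble a date from the YEAR, MONTH and DAY columns.
date_parts = subset[['YEAR', 'MONTH', 'DAY']].rename(columns=str.lower)
subset['DATE'] = pd.to_datetime(date_parts)
```

The `.copy()` after filtering avoids pandas' chained-assignment warning when new columns are added to the filtered frame.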
Index(['YEAR', 'MONTH', 'DAY', 'ORIGIN_AIRPORT', 'DESTINATION_AIRPORT', 'AIRLINE', 'FLIGHT_NUMBER', 'SCHEDULED_DEPARTURE', 'DEPARTURE_DELAY'], dtype='object')
Index(['IATA_CODE', 'AIRLINE'], dtype='object')
Question 3
For this question, find the top three airlines that have a high number of flights and the lowest percentage of delays compared to other airlines. The result should be a dataframe with three columns: AIRLINE_NAME, NUM_FLIGHTS and PERC_DELAY.
Hint:
- percentage of delay for each airline is obtained using groupby and apply methods
- merge flights_df with airlines_df to get the names of top three airlines
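The two hints can be sketched on toy data as follows. The airline codes, names and delays below are invented placeholders, and treating a flight as delayed when DEPARTURE_DELAY is greater than 0 is an assumption, since the question does not define a threshold:

```python
import pandas as pd

# Invented sample rows; airline codes and names are placeholders.
flights = pd.DataFrame({
    'AIRLINE': ['AA', 'AA', 'AA', 'BB', 'BB'],
    'DEPARTURE_DELAY': [10.0, -2.0, 0.0, 15.0, 30.0],
})
airlines = pd.DataFrame({
    'IATA_CODE': ['AA', 'BB'],
    'AIRLINE': ['Airline A (hypothetical)', 'Airline B (hypothetical)'],
})

# Hint 1: groupby + apply gives the percentage of delayed flights per airline.
# Assumption: "delayed" means DEPARTURE_DELAY > 0.
perc_delay = (
    flights.groupby('AIRLINE')['DEPARTURE_DELAY']
    .apply(lambda delays: (delays > 0).mean() * 100)
    .reset_index(name='PERC_DELAY')
)

# Hint 2: merge on the airline code to attach the full name.
# Both frames have a column called AIRLINE, so rename the name column first.
named = perc_delay.merge(
    airlines.rename(columns={'AIRLINE': 'AIRLINE_NAME'}),
    left_on='AIRLINE',
    right_on='IATA_CODE',
)
```

Renaming before the merge avoids the `AIRLINE_x` / `AIRLINE_y` suffixes that pandas would otherwise add to the clashing column names.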
TEST ON:
top_three_airlines_df = top_three_airlines(flights_df_raw.copy(), airlines_df.copy())
assert sorted(list(top_three_airlines_df.columns)) == sorted(['NUM_FLIGHTS', 'PERC_DELAY', 'AIRLINE_NAME']), "Dataframe doesn't have required columns"
assert top_three_airlines_df.loc[0, 'AIRLINE_NAME'] == 'United Air Lines Inc.', "Top airline name doesn't match"
Answer from your tutor:
MateIron10283 (active 5 hours ago)
check explanation.
Explanation:
Based on the information and requirements provided, here is an example implementation that finds the top three airlines with the highest number of flights and the lowest percentage of delays. It relies on pandas only; numpy and datetime are not needed for this particular question.
```python
import pandas as pd

def top_three_airlines(flights_df, airlines_df):
    # Filter flights departing from BOS, JFK, SFO, and LAX
    airports = ['BOS', 'JFK', 'SFO', 'LAX']
    flights_df = flights_df[flights_df['ORIGIN_AIRPORT'].isin(airports)].copy()

    # Calculate the number of flights per airline
    num_flights = flights_df.groupby('AIRLINE').size().reset_index(name='NUM_FLIGHTS')

    # Calculate the percentage of delayed flights for each airline
    # (assumption: a flight counts as delayed when DEPARTURE_DELAY > 0)
    flights_df['DELAYED'] = flights_df['DEPARTURE_DELAY'] > 0
    perc_delay = (
        flights_df.groupby('AIRLINE')['DELAYED']
        .apply(lambda delayed: delayed.mean() * 100)
        .reset_index(name='PERC_DELAY')
    )

    # Combine both statistics, then merge with airlines_df to get airline names.
    # airlines_df also has a column called AIRLINE, so rename it before merging.
    stats = num_flights.merge(perc_delay, on='AIRLINE')
    names = airlines_df.rename(columns={'AIRLINE': 'AIRLINE_NAME'})
    top_airlines = stats.merge(names, left_on='AIRLINE', right_on='IATA_CODE')

    # Sort by number of flights (descending); percentage of delay (ascending)
    # only breaks ties. This is one reading of the question.
    top_airlines = top_airlines.sort_values(
        by=['NUM_FLIGHTS', 'PERC_DELAY'], ascending=[False, True]
    )

    # Select the top three airlines and only the required columns.
    # Reset the index so that .loc[0, ...] refers to the top-ranked row.
    top_three = top_airlines.head(3)[['AIRLINE_NAME', 'NUM_FLIGHTS', 'PERC_DELAY']]
    return top_three.reset_index(drop=True)

# Example usage
top_three_airlines_df = top_three_airlines(flights_df_raw.copy(), airlines_df.copy())
print(top_three_airlines_df)
```
Make sure to replace `flights_df_raw` with your raw flights dataset and `airlines_df` with your airlines dataset. The function `top_three_airlines` will return a dataframe containing the top three airline names, the number of flights, and the percentage of delays for each airline.
Please note that the above code assumes you have already loaded and prepared the flights and airlines datasets as pandas dataframes.
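One detail the test in the question depends on: `.loc[0, 'AIRLINE_NAME']` looks up the row whose index label is 0, which after sorting is not necessarily the first row. A small sketch with invented data (the airline names are placeholders) shows the difference:

```python
import pandas as pd

# Invented rows; the names are placeholders, not real airlines.
ranked = pd.DataFrame({
    'AIRLINE_NAME': [
        'Airline A (hypothetical)',
        'Airline B (hypothetical)',
        'Airline C (hypothetical)',
    ],
    'NUM_FLIGHTS': [10, 30, 20],
})

# Sorting reorders the rows but keeps the original index labels (1, 2, 0).
by_flights = ranked.sort_values('NUM_FLIGHTS', ascending=False)
label_zero = by_flights.loc[0, 'AIRLINE_NAME']  # the row labelled 0, now last

# After reset_index(drop=True) the labels are 0, 1, 2 in sorted order.
top = by_flights.reset_index(drop=True)
first_row = top.loc[0, 'AIRLINE_NAME']  # the top-ranked row
```

Here `label_zero` is the airline with the fewest flights, while `first_row` is the one with the most, so a result returned without resetting the index could fail the `.loc[0, ...]` assertion even when the ranking itself is right.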