
Question



This dataset lists flights that occurred in 2015, along with other information such as delays and flight times.

In this assignment, I will be showing good practices to manipulate data using Python's most popular libraries to accomplish the following:

cleaning data with pandas
making specific changes with numpy
handling date-related values with datetime

Note: please consider only the flights departing from BOS, JFK, SFO and LAX.

Each question is equally weighted for the total grade.

import os
import pandas as pd
import pandas.api.types as ptypes
import numpy as np
import datetime as dt

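# Build the paths with os.path.join: in a normal string literal '\a' and '\f' are
# escape sequences (bell, form feed), so 'assets\airlines.csv' is not the intended path.
airlines_df = pd.read_csv(os.path.join('assets', 'airlines.csv'))
airports_df = pd.read_csv(os.path.join('assets', 'airports.csv'))
flights_df_raw = pd.read_csv(os.path.join('assets', 'flights.csv'), low_memory=False)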

Question 1: Data Preprocessing
For this question, perform the following:

remove rows with missing values
keep flights departing from airports (ORIGIN_AIRPORT) that we want to look at (BOS, JFK, SFO and LAX)
filter out flights with a departure delay (DEPARTURE_DELAY) of more than one day
convert the FLIGHT_NUMBER column type to string
SCHEDULED_DEPARTURE is coded as a float in HHMM form: the last two digits are the minutes and the digits before them are the hour (e.g. 1435.0 is 14:35). Convert this column to datetime format by combining it with the existing DAY, MONTH and YEAR columns
add an IS_DELAYED column: any flight with a departure delay (DEPARTURE_DELAY) above 15 minutes is delayed, and any other flight is not delayed
remove YEAR, MONTH, DAY columns

def data_preprocess(flights_df):
    # YOUR CODE HERE
    #raise NotImplementedError()
    return df

flights_df = data_preprocess(flights_df_raw.copy())
assert len(flights_df) == 535744, "Q1: There should be 535744 observations in the flights dataframe"
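One possible way to fill in data_preprocess, following the Question 1 steps listed above, is sketched below. It assumes that "missing values" means a NaN in any column, that DEPARTURE_DELAY is measured in minutes (so one day is 1440 minutes), and that IS_DELAYED is stored as 1/0; the AIRPORTS constant is a helper introduced here, not part of the assignment. If the assets file contains columns that are only filled in for some flights, dropna() would need a subset= argument, and the row count checked by the assertion depends on that choice.

```python
import numpy as np
import pandas as pd

# Hypothetical helper: the origin airports the assignment asks us to keep.
AIRPORTS = ["BOS", "JFK", "SFO", "LAX"]


def data_preprocess(flights_df):
    # Assumption: "missing values" means a NaN in any column.
    df = flights_df.dropna()

    # Keep only departures from the four airports of interest.
    df = df[df["ORIGIN_AIRPORT"].isin(AIRPORTS)]

    # Assumption: DEPARTURE_DELAY is in minutes, so one day = 24 * 60 minutes.
    df = df[df["DEPARTURE_DELAY"] <= 24 * 60].copy()

    df["FLIGHT_NUMBER"] = df["FLIGHT_NUMBER"].astype(str)

    # SCHEDULED_DEPARTURE is HHMM stored as a float, e.g. 1435.0 -> 14:35, 5.0 -> 00:05.
    hhmm = df["SCHEDULED_DEPARTURE"].astype(int)
    # pd.to_datetime can assemble timestamps from year/month/day/hour/minute columns.
    df["SCHEDULED_DEPARTURE"] = pd.to_datetime(
        pd.DataFrame({
            "year": df["YEAR"],
            "month": df["MONTH"],
            "day": df["DAY"],
            "hour": hhmm // 100,
            "minute": hhmm % 100,
        })
    )

    # Assumption: IS_DELAYED is 1 for a delay above 15 minutes, otherwise 0.
    df["IS_DELAYED"] = np.where(df["DEPARTURE_DELAY"] > 15, 1, 0)

    return df.drop(columns=["YEAR", "MONTH", "DAY"])
```

Splitting HHMM with integer division and modulo avoids string padding tricks and handles early-morning times such as 5.0 (00:05), whose leading zeros a float cannot store.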

Question 2

NOTE: Merge the two dataframes on flights_df['ORIGIN_AIRPORT'] and airports_df['IATA_CODE']. Neither dataframe has a NUM_FLIGHTS column, so it has to be computed.

Merge the flights_df dataframe with the airports_df dataframe and return the number of departing flights (NUM_FLIGHTS) per airport (IATA_CODE) over the whole year.

def flights_per_airport(flights_df, airports_df):
    # YOUR CODE HERE
    raise NotImplementedError()
    return df

num_flights_df=flights_per_airport(flights_df_raw.copy(), airports_df.copy())

assert num_flights_df.shape==(4,1), "Shape of DataFrame should be (4,1)"
assert num_flights_df.columns[0]=='NUM_FLIGHTS', "DataFrame should have a column which is called NUM_FLIGHTS"
assert num_flights_df.loc["BOS", "NUM_FLIGHTS"] == 105276, "The NUM_FLIGHTS for BOS is wrong"

PLEASE MAKE SURE that the returned data frame has shape (4,1).
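A minimal sketch of flights_per_airport follows. It merges on the columns named in the note above and counts rows per IATA_CODE, producing a single NUM_FLIGHTS column indexed by airport code. It also filters to the four airports first (the AIRPORTS constant is introduced here); this is an assumption of the sketch, made because the call above passes flights_df_raw rather than the preprocessed flights_df, and without the filter the result would have one row for every origin airport present.

```python
import pandas as pd

# Hypothetical helper: the origin airports the assignment asks us to keep.
AIRPORTS = ["BOS", "JFK", "SFO", "LAX"]


def flights_per_airport(flights_df, airports_df):
    # Keep the four origin airports so the result has one row per airport,
    # even when the raw (unfiltered) flights are passed in.
    flights = flights_df[flights_df["ORIGIN_AIRPORT"].isin(AIRPORTS)]

    # Inner join: each flight picks up the row of its origin airport.
    merged = flights.merge(
        airports_df, left_on="ORIGIN_AIRPORT", right_on="IATA_CODE", how="inner"
    )

    # One row per IATA_CODE and a single NUM_FLIGHTS column -> shape (4, 1).
    return merged.groupby("IATA_CODE").size().to_frame("NUM_FLIGHTS")
```

Because the index is IATA_CODE, a lookup such as num_flights_df.loc["BOS", "NUM_FLIGHTS"] works directly, as the assertions above expect.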

Question 3

For this question, find the three airlines (by name) that combine a high number of flights with the lowest percentage of delayed flights compared to other airlines. The result should be a data frame with three columns: AIRLINE_NAME, NUM_FLIGHTS and PERC_DELAY.
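The question leaves open how "a high number of flights" and "the lowest percentage of delay" should be traded off. The sketch below makes one assumed choice: it keeps airlines whose flight count is at least the median across airlines, then returns the three with the lowest PERC_DELAY. It also assumes that flights_df is the Question 1 output (with a 0/1 IS_DELAYED column and an AIRLINE column holding the airline code) and that airlines_df has IATA_CODE and AIRLINE (full name) columns; top_airlines is a hypothetical function name, since the assignment gives no template for this question.

```python
import pandas as pd


def top_airlines(flights_df, airlines_df, n=3):
    # Per-airline flight count and share of delayed flights (IS_DELAYED is 0/1).
    stats = flights_df.groupby("AIRLINE").agg(
        NUM_FLIGHTS=("IS_DELAYED", "size"),
        PERC_DELAY=("IS_DELAYED", "mean"),
    )
    stats["PERC_DELAY"] = stats["PERC_DELAY"] * 100

    # Assumption: airlines_df maps IATA_CODE (airline code) -> AIRLINE (full name).
    names = airlines_df.set_index("IATA_CODE")["AIRLINE"]
    stats["AIRLINE_NAME"] = stats.index.map(names)

    # Assumed reading of "high number of flights": at least the median count.
    busy = stats[stats["NUM_FLIGHTS"] >= stats["NUM_FLIGHTS"].median()]

    # Lowest delay percentage first, then keep the top n.
    top = busy.sort_values("PERC_DELAY").head(n)
    return top[["AIRLINE_NAME", "NUM_FLIGHTS", "PERC_DELAY"]].reset_index(drop=True)
```

Ranking by NUM_FLIGHTS first and PERC_DELAY second would be another reasonable reading of the question and can give a different top three, so the threshold step is the part to adjust if the intended criterion differs.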

 
