Question
Apache Spark API

Description

Use Apache Spark to answer the following questions. If you prefer to use less data for convenience, you can pick any 2 or 3 files from the dataset.

- Show a sample of 5 records from the dataset.
- Read the data with data types.
- Make a new column, MonthStr, which holds the month in the form 01, 02, 03, ..., 12.
- Find the number of flights each airline made.
- Find the mean departure delay per origination airport.
- What is the average departure delay from each airport?
- Find the on-time (ArrTime - CRSArrTime <= 0) performance for each unique carrier.
- Define a UDF and make a new column using this UDF.
- How well does weather predict plane delays? Apply a machine learning model to answer this question.

Dataset

You will be working on the Airline On-Time Performance dataset. The dataset is available on my Google Drive. Use your school ID to access the folder. The data consists of flight arrival and departure details for all commercial flights within the USA, from October 1987 to April 2008. It has a total file size of 1.6 GB compressed (12 GB uncompressed) and contains a little over 123 million records. Each row represents an individual flight record with the details of that flight.
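The MonthStr, UDF, and on-time items in the description could be sketched roughly as follows. This is only one possible approach, not the expected solution: the helper names `month_str`, `is_on_time`, `add_columns`, and `summarize` are made up for illustration, as are the output column names `OnTime` and `MeanDepDelay`. The PySpark import is deferred so the plain-Python helpers can be checked without a Spark installation.

```python
def month_str(month):
    """Zero-pad a month number (1-12) to two characters, e.g. 3 -> "03"."""
    if month is None:
        return None
    return f"{int(month):02d}"


def is_on_time(arr_time, crs_arr_time):
    """On-time as defined in the question: ArrTime - CRSArrTime <= 0.

    Both values are hhmm clock times, so this is a plain numeric comparison,
    not a difference in minutes; flights crossing midnight are not handled.
    """
    if arr_time is None or crs_arr_time is None:
        return None
    return (arr_time - crs_arr_time) <= 0


def add_columns(df):
    """Add MonthStr (via a UDF) and OnTime (hypothetical name) to a flights DataFrame."""
    # Imported lazily so the helpers above work without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    month_udf = F.udf(month_str, StringType())
    return (
        df.withColumn("MonthStr", month_udf(F.col("Month")))
          .withColumn("OnTime", (F.col("ArrTime") - F.col("CRSArrTime")) <= 0)
    )


def summarize(df):
    """Return (flights per airline, mean departure delay per origin airport)."""
    from pyspark.sql import functions as F

    flights_per_airline = df.groupBy("UniqueCarrier").count()
    mean_dep_delay = df.groupBy("Origin").agg(F.avg("DepDelay").alias("MeanDepDelay"))
    return flights_per_airline, mean_dep_delay
```

A UDF is used for MonthStr only because the assignment asks for one; a built-in such as `F.lpad` would likely be faster on a dataset of this size.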
The fields are:

| No | Name | Data Type | Description |
|----|------|-----------|-------------|
| 1 | Year | int64 | 1987-2008 |
| 2 | Month | int64 | 1-12 |
| 3 | DayofMonth | int64 | 1-31 |
| 4 | DayOfWeek | int64 | 1 (Monday) - 7 (Sunday) |
| 5 | DepTime | float64 | actual departure time (local, hhmm) |
| 6 | CRSDepTime | float64 | scheduled departure time (local, hhmm) |
| 7 | ArrTime | float64 | actual arrival time (local, hhmm) |
| 8 | CRSArrTime | float64 | scheduled arrival time (local, hhmm) |
| 9 | UniqueCarrier | string | unique carrier code |
| 10 | FlightNum | float64 | flight number |
| 11 | TailNum | string | plane tail number |
| 12 | ActualElapsedTime | float64 | in minutes |
| 13 | CRSElapsedTime | float64 | in minutes |
| 14 | AirTime | float64 | in minutes |
| 15 | ArrDelay | float64 | arrival delay, in minutes |
| 16 | DepDelay | float64 | departure delay, in minutes |
| 17 | Origin | string | origin IATA airport code |
| 18 | Dest | string | destination IATA airport code |
| 19 | Distance | float64 | in miles |
| 20 | TaxiIn | float64 | taxi in time, in minutes |
| 21 | TaxiOut | float64 | taxi out time, in minutes |
| 22 | Cancelled | int64 | was the flight cancelled? |
| 23 | CancellationCode | string | reason for cancellation (A = carrier, B = weather, C = NAS, D = security) |
| 24 | Diverted | int64 | 1 = yes, 0 = no |
| 25 | CarrierDelay | float64 | in minutes |
| 26 | WeatherDelay | float64 | in minutes |
| 27 | NASDelay | float64 | in minutes |
| 28 | SecurityDelay | float64 | in minutes |
| 29 | LateAircraftDelay | float64 | in minutes |

You can find more information about this dataset on the Statistical Computing website. More information on Airline On-Time Performance Data is available from the Bureau of Transportation Statistics (BTS).

Attachment: AirlineInfo-20220119T014240Z-001.zip
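As a sketch of "read the data with data types", the field list above can be turned into a Spark DDL schema string. The type mapping (int64 to BIGINT, float64 to DOUBLE, string to STRING), the helper names `build_ddl` and `read_flights`, and the use of "NA" as the missing-value marker are assumptions for illustration; check them against the actual CSV files.

```python
# (name, data type) pairs copied from the field list above.
FIELDS = [
    ("Year", "int64"), ("Month", "int64"), ("DayofMonth", "int64"),
    ("DayOfWeek", "int64"), ("DepTime", "float64"), ("CRSDepTime", "float64"),
    ("ArrTime", "float64"), ("CRSArrTime", "float64"), ("UniqueCarrier", "string"),
    ("FlightNum", "float64"), ("TailNum", "string"), ("ActualElapsedTime", "float64"),
    ("CRSElapsedTime", "float64"), ("AirTime", "float64"), ("ArrDelay", "float64"),
    ("DepDelay", "float64"), ("Origin", "string"), ("Dest", "string"),
    ("Distance", "float64"), ("TaxiIn", "float64"), ("TaxiOut", "float64"),
    ("Cancelled", "int64"), ("CancellationCode", "string"), ("Diverted", "int64"),
    ("CarrierDelay", "float64"), ("WeatherDelay", "float64"), ("NASDelay", "float64"),
    ("SecurityDelay", "float64"), ("LateAircraftDelay", "float64"),
]

# Assumed mapping from the listed types to Spark SQL type names.
SPARK_TYPES = {"int64": "BIGINT", "float64": "DOUBLE", "string": "STRING"}


def build_ddl(fields=FIELDS):
    """Build a DDL schema string such as "Year BIGINT, Month BIGINT, ..."."""
    return ", ".join(f"{name} {SPARK_TYPES[dtype]}" for name, dtype in fields)


def read_flights(spark, path):
    """Read one or more CSV files with the explicit schema.

    `spark` is an existing SparkSession; `path` is a placeholder for wherever
    the files were downloaded. nullValue="NA" is an assumption about how
    missing values are written in the files.
    """
    return spark.read.csv(path, header=True, schema=build_ddl(), nullValue="NA")
```

With a DataFrame from `read_flights`, `df.printSchema()` would confirm the types and `df.show(5)` would display the 5-record sample the first task asks for.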