
Question


Apache Spark API

Description

Use Apache Spark to answer the following questions. If you prefer to use less data for convenience, you can pick any 2 or 3 files from the dataset.

- Show a sample of 5 records from the dataset.
- Read the data with data types.
- Make a new column MonthStr that holds the month in the form 01, 02, 03, ..., 12.
- Find the number of flights each airline made.
- Find the mean departure delay per origination airport.
- What is the average departure delay from each airport?
- Find the on-time (ArrTime - CRSArrTime <= 0) performance for each unique carrier.
- Define a UDF and make a new column using this UDF.
- How well does weather predict plane delays? Apply a machine learning model to answer this question.

Dataset

You will be working on the Airline On-Time Performance Dataset. The dataset is available on my Google Drive. Use your school ID to access the folder. The data consists of flight arrival and departure details for all commercial flights within the USA, from October 1987 to April 2008. Its total size is 1.6 GB compressed (12 GB uncompressed), with a little over 123 million records. Each row represents an individual flight, with the details of that flight in the row.
The columns are:

| No | Name | Data Type | Description |
|----|------|-----------|-------------|
| 1 | Year | int64 | 1987-2008 |
| 2 | Month | int64 | 1-12 |
| 3 | DayofMonth | int64 | 1-31 |
| 4 | DayOfWeek | int64 | 1 (Monday) - 7 (Sunday) |
| 5 | DepTime | float64 | actual departure time (local, hhmm) |
| 6 | CRSDepTime | float64 | scheduled departure time (local, hhmm) |
| 7 | ArrTime | float64 | actual arrival time (local, hhmm) |
| 8 | CRSArrTime | float64 | scheduled arrival time (local, hhmm) |
| 9 | UniqueCarrier | string | unique carrier code |
| 10 | FlightNum | float64 | flight number |
| 11 | TailNum | string | plane tail number |
| 12 | ActualElapsedTime | float64 | in minutes |
| 13 | CRSElapsedTime | float64 | in minutes |
| 14 | AirTime | float64 | in minutes |
| 15 | ArrDelay | float64 | arrival delay, in minutes |
| 16 | DepDelay | float64 | departure delay, in minutes |
| 17 | Origin | string | origin IATA airport code |
| 18 | Dest | string | destination IATA airport code |
| 19 | Distance | float64 | in miles |
| 20 | TaxiIn | float64 | taxi in time, in minutes |
| 21 | TaxiOut | float64 | taxi out time, in minutes |
| 22 | Cancelled | int64 | was the flight cancelled? |
| 23 | CancellationCode | string | reason for cancellation (A = carrier, B = weather, C = NAS, D = security) |
| 24 | Diverted | int64 | 1 = yes, 0 = no |
| 25 | CarrierDelay | float64 | in minutes |
| 26 | WeatherDelay | float64 | in minutes |
| 27 | NASDelay | float64 | in minutes |
| 28 | SecurityDelay | float64 | in minutes |
| 29 | LateAircraftDelay | float64 | in minutes |

You can find more information about this dataset on the Statistical Computing website. Further information on the Airline On-Time Performance Data is available from the Bureau of Transportation Statistics (BTS).
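As a starting point for the non-ML tasks, here is a minimal PySpark sketch, not a definitive solution. It assumes the files are CSVs with a header row and that missing values are written as "NA"; the function `analyze`, the helper `delay_bucket`, and the output column names `DelayBucket`, `MeanDepDelay` and `OnTimeRate` are hypothetical names chosen for illustration. The on-time test follows the definition given in the question (ArrTime - CRSArrTime <= 0).

```python
from typing import Optional

# Column order and types follow the schema table in the question.
ALL_COLS = [
    "Year", "Month", "DayofMonth", "DayOfWeek", "DepTime", "CRSDepTime",
    "ArrTime", "CRSArrTime", "UniqueCarrier", "FlightNum", "TailNum",
    "ActualElapsedTime", "CRSElapsedTime", "AirTime", "ArrDelay", "DepDelay",
    "Origin", "Dest", "Distance", "TaxiIn", "TaxiOut", "Cancelled",
    "CancellationCode", "Diverted", "CarrierDelay", "WeatherDelay",
    "NASDelay", "SecurityDelay", "LateAircraftDelay",
]
INT_COLS = ["Year", "Month", "DayofMonth", "DayOfWeek", "Cancelled", "Diverted"]
STR_COLS = ["UniqueCarrier", "TailNum", "Origin", "Dest", "CancellationCode"]


def month_str(month: Optional[int]) -> Optional[str]:
    """Zero-pad a month number: 1 becomes '01', 12 stays '12'."""
    return None if month is None else f"{int(month):02d}"


def delay_bucket(dep_delay: Optional[float]) -> str:
    """Hypothetical UDF body: label a departure delay given in minutes."""
    if dep_delay is None:
        return "unknown"
    if dep_delay <= 0:
        return "on time or early"
    return "short delay" if dep_delay <= 15 else "long delay"


def analyze(spark, path: str) -> None:
    """Run the basic tasks on the CSV file(s) at `path`."""
    # Imported lazily so the plain helpers above work without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import (DoubleType, LongType, StringType,
                                   StructField, StructType)

    def spark_type(name):
        if name in INT_COLS:
            return LongType()  # int64 in the table
        return StringType() if name in STR_COLS else DoubleType()

    schema = StructType([StructField(c, spark_type(c), True) for c in ALL_COLS])

    # Assumption: header row present, missing values encoded as "NA".
    df = spark.read.csv(path, header=True, schema=schema, nullValue="NA")
    df.show(5)  # sample of 5 records

    # MonthStr via the built-in lpad (month_str above is the same logic).
    df = df.withColumn("MonthStr", F.lpad(F.col("Month").cast("string"), 2, "0"))

    # New column from a UDF.
    bucket_udf = F.udf(delay_bucket, StringType())
    df = df.withColumn("DelayBucket", bucket_udf(F.col("DepDelay")))

    # Number of flights per airline.
    df.groupBy("UniqueCarrier").count().show()

    # Mean departure delay per origin airport (nulls are ignored by avg).
    df.groupBy("Origin").agg(F.avg("DepDelay").alias("MeanDepDelay")).show()

    # Share of on-time arrivals per carrier, using the question's definition.
    on_time = F.when(F.col("ArrTime") - F.col("CRSArrTime") <= 0, 1).otherwise(0)
    (df.where(F.col("ArrTime").isNotNull())
       .groupBy("UniqueCarrier")
       .agg(F.avg(on_time).alias("OnTimeRate"))
       .show())
```

To try it, create a session with `SparkSession.builder.getOrCreate()` and call `analyze(spark, path)` with the location of one or more of the downloaded files. The built-in `lpad` is used for MonthStr because built-in column functions generally run faster than Python UDFs; the UDF is kept only for the task that asks for one.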

AirlineInfo-20220119T014240Z-001.zip


