Year Month DayofMonth DayOfWeek DepTime CRSDepTime ArrTime CRSArrTime UniqueCarrier FlightNum TailNum ActualElapsedTime CRSElapsedTime AirTime ArrDelay DepDelay Origin Dest Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted CarrierDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay
PS NA NA SAN SFO NA NA NA NA NA NA NA NA
PS NA NA SAN SFO NA NA NA NA NA NA NA NA
PS NA NA SAN SFO NA NA NA NA NA NA NA NA
PS NA NA SAN SFO NA NA NA NA NA NA NA NA
PS NA NA SAN SFO NA NA NA NA NA NA NA NA
PS NA NA SAN SFO NA NA NA NA NA NA NA NA
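Every job in the workflow first has to turn raw lines like these into named fields. The helper below is a minimal sketch of that step. It assumes the comma-separated layout of the Data Expo yearly files, the 29-column header shown above, and `NA` (or an empty field) for missing values. `parse_records` and `to_int` are hypothetical names, not part of Hadoop or Oozie.

```python
import csv
from typing import Dict, Iterable, Iterator, Optional

# Column names in file order, taken from the header line above (29 columns).
HEADER = (
    "Year Month DayofMonth DayOfWeek DepTime CRSDepTime ArrTime CRSArrTime "
    "UniqueCarrier FlightNum TailNum ActualElapsedTime CRSElapsedTime AirTime "
    "ArrDelay DepDelay Origin Dest Distance TaxiIn TaxiOut Cancelled "
    "CancellationCode Diverted CarrierDelay WeatherDelay NASDelay "
    "SecurityDelay LateAircraftDelay"
).split()
MISSING = {"NA", ""}  # assumed markers for "no value"


def parse_records(lines: Iterable[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per data row; header rows and rows of the wrong width are skipped."""
    for row in csv.reader(lines):
        if len(row) != len(HEADER) or row[0] == "Year":
            continue  # each yearly file starts with its own header line
        yield {name: (None if value in MISSING else value) for name, value in zip(HEADER, row)}


def to_int(value: Optional[str]) -> Optional[int]:
    """Convert a numeric field to int, passing None (missing) through unchanged."""
    return None if value is None else int(value)
```

Keeping missing values as None lets each job decide whether to skip a row or count it, instead of failing on int("NA").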
This is the content of the data file. Please give me MapReduce job code that works with this data.
In this project, you will develop an Oozie workflow to process and analyze a large volume of flight data.
Instructions:
Students will be automatically placed in groups of for this project.
Install Hadoop and Oozie on your AWS VMs.
Download the Airline On-time Performance data set (flight data from the period of October to April) from the following website: Data Expo: Airline on-time data.
Design, implement, and run an Oozie workflow to find out the following:
o the airlines with the highest and lowest probability, respectively, of being on schedule;
o the airports with the longest and shortest average taxi time per flight, for both taxi-in and taxi-out; and
o the most common reason for flight cancellations.
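The first question can be answered in a single pass over the data. The sketch below is written for Hadoop Streaming in Python, which is an assumption (the assignment may expect Java MapReduce). It also assumes that "on schedule" means an arrival delay (ArrDelay) under 15 minutes and that rows with no ArrDelay, such as cancelled or diverted flights, are skipped. The mapper emits a 0/1 flag per flight keyed by carrier, and the reducer turns each carrier's flags into a fraction.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

CARRIER, ARR_DELAY = 8, 14  # column indices in the 29-column header
ON_TIME_LIMIT = 15          # minutes; assumed threshold for "on schedule"


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'carrier<TAB>1' for an on-schedule flight and 'carrier<TAB>0' otherwise."""
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) < 29 or fields[0] == "Year" or fields[ARR_DELAY] == "NA":
            continue  # header line, malformed row, or no arrival delay recorded
        on_time = 1 if int(fields[ARR_DELAY]) < ON_TIME_LIMIT else 0
        yield f"{fields[CARRIER]}\t{on_time}"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Input arrives sorted by key (Hadoop's shuffle does this); emit 'carrier<TAB>probability'."""
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for carrier, group in groupby(pairs, key=lambda kv: kv[0]):
        total = on_time = 0
        for _, value in group:  # running counts, so no per-carrier list is kept in memory
            total += 1
            on_time += int(value)
        yield f"{carrier}\t{on_time / total:.4f}"


def extremes(reduced: Iterable[str]) -> Tuple[str, str]:
    """Post-processing: return (most punctual, least punctual) carrier."""
    probs: List[Tuple[float, str]] = []
    for line in reduced:
        carrier, p = line.split("\t")
        probs.append((float(p), carrier))
    return max(probs)[1], min(probs)[1]


if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    # Hadoop Streaming would run this script as "... map" for the mapper and "... reduce" for the reducer.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

extremes is a small post-processing step. With more than one reduce task, the per-carrier results would first have to be gathered in one place (for example by a single-reducer follow-up step) before picking the best and worst carrier.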
Requirements:
Your workflow must contain at least three MapReduce jobs that run in fully distributed mode.
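A second job, for the taxi-time question, could follow the same Hadoop Streaming pattern. This sketch assumes that TaxiIn belongs to the arrival airport (Dest) and TaxiOut to the departure airport (Origin), and that taxi-in and taxi-out averages are reported separately. Keys are prefixed with IN| or OUT| (a hypothetical convention) so that one job computes both.

```python
import sys
from itertools import groupby
from typing import Dict, Iterable, Iterator, List, Tuple

ORIGIN, DEST, TAXI_IN, TAXI_OUT = 16, 17, 19, 20  # column indices in the 29-column header


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'IN|<Dest><TAB>minutes' and 'OUT|<Origin><TAB>minutes', skipping NA values."""
    for line in lines:
        f = line.rstrip("\n").split(",")
        if len(f) < 29 or f[0] == "Year":
            continue  # header line or malformed row
        if f[TAXI_IN] != "NA":
            yield f"IN|{f[DEST]}\t{f[TAXI_IN]}"
        if f[TAXI_OUT] != "NA":
            yield f"OUT|{f[ORIGIN]}\t{f[TAXI_OUT]}"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Input sorted by key; emit 'IN|airport<TAB>average' or 'OUT|airport<TAB>average'."""
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = count = 0
        for _, minutes in group:
            total += int(minutes)
            count += 1
        yield f"{key}\t{total / count:.2f}"


def extremes(reduced: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Return {'IN': (longest, shortest), 'OUT': (longest, shortest)} airports by average."""
    by_direction: Dict[str, List[Tuple[float, str]]] = {"IN": [], "OUT": []}
    for line in reduced:
        key, avg = line.split("\t")
        direction, airport = key.split("|")
        by_direction[direction].append((float(avg), airport))
    return {d: (max(v)[1], min(v)[1]) for d, v in by_direction.items() if v}


if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```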
Run your workflow to analyze the entire data set (total years from to) at one time, first on two VMs, and then gradually increase the system scale to the maximum allowed number of VMs in at least increment steps, measuring the workflow execution time at each scale.
Run your workflow on the maximum allowed number of VMs to analyze the data progressively, in increments of year, i.e., the first year, the first years, the first years, and the total years, measuring the workflow execution time for each run.
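The scaling runs and the progressive runs differ only in which years are fed in and how many VMs execute the workflow. The helpers below are a hypothetical sketch for planning those runs and comparing the measured times. The year bounds and step size are parameters because the assignment text leaves them out; the HDFS directory /user/hadoop/airline and the one-file-per-year naming are placeholders, and the comma-separated input list assumes the jobs accept their input paths that way.

```python
from typing import Dict, List


def progressive_year_ranges(first_year: int, last_year: int, step: int = 1) -> List[List[int]]:
    """Year lists for each run: the first `step` years, the first 2*step years, ...,
    always ending with the full range."""
    years = list(range(first_year, last_year + 1))
    cuts = list(range(step, len(years), step)) + [len(years)]
    return [years[:n] for n in cuts]


def input_paths(years: List[int], base: str = "/user/hadoop/airline") -> str:
    """Comma-separated input list for one run (hypothetical layout: one <year>.csv per year)."""
    return ",".join(f"{base}/{year}.csv" for year in years)


def speedups(seconds_by_vms: Dict[int, float]) -> Dict[int, float]:
    """Speedup of each cluster size relative to the smallest one measured (e.g. 2 VMs)."""
    baseline = seconds_by_vms[min(seconds_by_vms)]
    return {vms: baseline / secs for vms, secs in sorted(seconds_by_vms.items())}
```

Speedup here is the time on the smallest cluster divided by the time on each larger one, so a value close to the ratio of VM counts indicates near-linear scaling.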
Milestone Submission:
A project report in PDF that includes:
A diagram that shows the structure of your Oozie workflow
A detailed description of the algorithm you designed to solve each of the problems
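For the workflow diagram, one possible structure is start -> fork -> three MapReduce actions -> join -> end, with each action's error transition going to a kill node (not modelled here). The sketch below represents that graph in Python, with hypothetical node names, and uses the standard library's graphlib (Python 3.9+) to group nodes into stages that can run concurrently. It assumes the three analyses are independent of each other, which is what allows the fork/join layout.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# node -> set of nodes it waits for (hypothetical names mirroring Oozie fork/join)
WORKFLOW: Dict[str, Set[str]] = {
    "start": set(),
    "fork-analyses": {"start"},
    "mr-on-schedule": {"fork-analyses"},
    "mr-taxi-time": {"fork-analyses"},
    "mr-cancellations": {"fork-analyses"},
    "join-analyses": {"mr-on-schedule", "mr-taxi-time", "mr-cancellations"},
    "end": {"join-analyses"},
}


def execution_stages(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group nodes into stages; all nodes within one stage can run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for i, stage in enumerate(execution_stages(WORKFLOW)):
        print(f"stage {i}: {' | '.join(stage)}")
```

Printed as stages, this gives a text version of the diagram: start, the fork, the three MapReduce actions side by side, the join, and end.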
Give me a diagram of the Oozie workflow structure and a detailed description of how each MapReduce job solves its problem.
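The cancellation question is a classic counting job. The sketch below (Hadoop Streaming in Python, an assumption) counts cancelled flights per CancellationCode, using the Data Expo codes A = carrier, B = weather, C = NAS, D = security, and skips cancelled rows that carry no code.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, Tuple

CANCELLED, CODE = 21, 22  # column indices in the 29-column header
REASONS = {"A": "carrier", "B": "weather", "C": "NAS", "D": "security"}


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'code<TAB>1' for every cancelled flight that has a known cancellation code."""
    for line in lines:
        f = line.rstrip("\n").split(",")
        if len(f) < 29 or f[0] == "Year":
            continue  # header line or malformed row
        if f[CANCELLED] == "1" and f[CODE] in REASONS:
            yield f"{f[CODE]}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Input sorted by key; sum the counts per code and emit 'code<TAB>count'."""
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for code, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{code}\t{sum(int(v) for _, v in group)}"


def most_common(reduced: Iterable[str]) -> Tuple[str, int]:
    """Return (reason name, count) for the code with the highest count."""
    counts = [(int(n), code) for code, n in (line.split("\t") for line in reduced)]
    count, code = max(counts)
    return REASONS[code], count


if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

Because the reducer emits the same code<TAB>count shape it reads, it could also serve as a combiner to cut shuffle traffic.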