
Question


Data: for this homework we will be using the flight data from /ds410/flightdata/2010-summary.csv in the HDFS file system. Every line contains an ORIGIN COUNTRY NAME, a DEST COUNTRY NAME, and a count, which is the number of daily flights from the origin country to the destination country.

Query: for each pair of countries (A, B), we are interested in how many ways there are of going from A to B in exactly 2 flights. For example, if the flight data looks like this:

DEST COUNTRY NAME    ORIGIN COUNTRY NAME    count
Canada               United States          1
Mexico               United States          2
Germany              Mexico                 3
Germany              Canada                 4
France               Germany                6
Germany              United States          5

then the output would look like:

United States    France     30
United States    Germany    10
Mexico           France     18
Canada           France     24

Because, for example, you can go from United States to Germany using exactly two flights in the following ways:

- 2 flights from United States to Mexico and 3 flights from Mexico to Germany, giving 2 x 3 = 6 choices
- 1 flight from United States to Canada and 4 flights from Canada to Germany, giving 1 x 4 = 4 choices

So overall, there are 6 + 4 = 10 possibilities.

Assignment: implement this in MapReduce. It involves a join (between the flight data and itself) and possibly 2 steps (2 map/reduce pairs). Hint: think of how you would do it using 1 or maybe 2 SQL queries (what exactly do you need to join on?) and then do it in MapReduce.

Upload: you will be producing 2 files:

1. Your MapReduce code. Call it flight.py.
2. Your output. Use the "cat" command to save the entire output locally (hdfs dfs -cat my output directory > result.txt). This will create a file called "result.txt" that has your answers. Upload this to Canvas.
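One way to read the hint is as a self-join on the intermediate country: the destination of the first flight must equal the origin of the second. The sketch below simulates that idea as two map/reduce passes in plain Python, with the shuffle done in memory; it is not tied to Hadoop Streaming or any MapReduce library, and every function name in it is hypothetical. It assumes rows arrive as (dest, origin, count), which is the column order of the example table. The sample rows mirror the example in the question; the two counts not spelled out in its explanation (Germany to France, and United States to Germany) are taken as 6 and 5, which is what the stated outputs 18, 24 and 30 imply.

```python
from collections import defaultdict

# Example rows as (dest, origin, count). The column order is an assumption,
# and the counts 6 and 5 are inferred from the expected output.
ROWS = [
    ("Canada", "United States", 1),
    ("Mexico", "United States", 2),
    ("Germany", "Mexico", 3),
    ("Germany", "Canada", 4),
    ("France", "Germany", 6),
    ("Germany", "United States", 5),
]


def run_step(records, mapper, reducer):
    """Simulate one map/shuffle/reduce pass in memory."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def join_mapper(row):
    """Step 1 map: key every flight by the country it could connect through."""
    dest, origin, count = row
    yield dest, ("in", origin, count)    # a possible first leg, arriving at dest
    yield origin, ("out", dest, count)   # a possible second leg, leaving origin


def join_reducer(via, values):
    """Step 1 reduce: pair each flight into `via` with each flight out of it."""
    first_legs = [(a, n) for tag, a, n in values if tag == "in"]
    second_legs = [(b, n) for tag, b, n in values if tag == "out"]
    for a, n_in in first_legs:
        for b, n_out in second_legs:
            yield (a, b), n_in * n_out


def sum_mapper(pair_and_product):
    """Step 2 map: pass ((A, B), product) through unchanged."""
    yield pair_and_product


def sum_reducer(pair, products):
    """Step 2 reduce: add up the products over all intermediate countries."""
    yield pair, sum(products)


def two_flight_counts(rows):
    partial = run_step(rows, join_mapper, join_reducer)
    return dict(run_step(partial, sum_mapper, sum_reducer))


RESULT = two_flight_counts(ROWS)
```

The second pass is needed because the products for one pair (A, B) are produced under different intermediate keys in the first pass (Mexico and Canada for United States to Germany), so they only meet once the data is regrouped by the pair itself.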

