Question
Data: for this homework we will be using the flight data from /ds410/flightdata/2010-summary.csv in the HDFS file system. Every line contains an ORIGIN COUNTRY NAME, a DEST COUNTRY NAME and a count, which is the number of daily flights from the origin country to the destination country.

Query: for each pair of countries (A, B), we are interested in how many ways there are of going from A to B in exactly 2 flights. For example, if the flight data looks like this:

DEST COUNTRY NAME    ORIGIN COUNTRY NAME    count
Canada               United States          1
Mexico               United States          2
Germany              Mexico                 3
Germany              Canada                 4
France               Germany                6
Germany              United States          5

then the output would look like:

United States    France     30
United States    Germany    10
Mexico           France     18
Canada           France     24

Because, for example, you can go from United States to Germany using exactly two flights in the following ways:

- 2 flights from United States to Mexico and 3 flights from Mexico to Germany, giving 2 x 3 = 6 choices
- 1 flight from United States to Canada and 4 flights from Canada to Germany, giving 1 x 4 = 4 choices

So overall, there are 6 + 4 = 10 possibilities.

Assignment: implement this in mapreduce. It involves a join (between the flight data and itself) and possibly 2 steps (2 map/reduce pairs).

Hint: think of how you would do it using 1 or maybe 2 SQL queries (what exactly do you need to join on?) and then do it in mapreduce.

Upload: you will be producing 2 files:

1. Your mapreduce code. Call it flight.py
2. Your output. Use the "cat" command to save the entire output locally (hdfs dfs -cat my output directory > result.txt). This will create a file called "result.txt" that has your answers. Upload this to canvas.
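The SQL hint can be made concrete with a small sketch. The following uses Python's built-in sqlite3 module on the example rows, joining the flight table to itself where the destination of the first flight equals the origin of the second. The table name flights, the column names dest, origin and n, and the function two_hop_sql are all made up for illustration. The counts 6 (Germany to France) and 5 (United States to Germany) are an assumption: they are the values that make the stated outputs 18, 24 and 30 come out right.

```python
import sqlite3

# Example rows as (dest, origin, count).
# Assumption: the last two counts (6 and 5) are inferred from the stated
# outputs 18, 24 and 30; the other four come from the worked explanation.
ROWS = [
    ("Canada", "United States", 1),
    ("Mexico", "United States", 2),
    ("Germany", "Mexico", 3),
    ("Germany", "Canada", 4),
    ("France", "Germany", 6),
    ("Germany", "United States", 5),
]


def two_hop_sql(rows):
    """Count two-flight routes with a self-join in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flights (dest TEXT, origin TEXT, n INTEGER)")
    conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", rows)
    query = """
        SELECT f1.origin, f2.dest, SUM(f1.n * f2.n)
        FROM flights AS f1
        JOIN flights AS f2 ON f1.dest = f2.origin  -- shared middle country
        GROUP BY f1.origin, f2.dest
    """
    result = {(a, b): total for a, b, total in conn.execute(query)}
    conn.close()
    return result


sql_result = two_hop_sql(ROWS)
for (a, b), total in sorted(sql_result.items()):
    print(a, b, total)
```

The join condition answers the question in the hint: the join key is the intermediate country. The join produces one product per route through a middle country, and the GROUP BY adds those products up per (A, B) pair, which is why two map/reduce steps are a natural fit.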
Step by Step Solution
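One possible way to structure the two map/reduce pairs is sketched below as a pure-Python, in-memory simulation, not as a Hadoop job. The helper run_step and the mapper and reducer names are hypothetical. The example counts 6 and 5 are inferred from the stated outputs rather than given directly. A real flight.py would parse CSV lines from HDFS instead of taking tuples, and would use whichever mapreduce library the course prescribes.

```python
from collections import defaultdict

# Example rows as (dest, origin, count).
# Assumption: the counts 6 and 5 are inferred from the stated outputs.
EXAMPLE_ROWS = [
    ("Canada", "United States", 1),
    ("Mexico", "United States", 2),
    ("Germany", "Mexico", 3),
    ("Germany", "Canada", 4),
    ("France", "Germany", 6),
    ("Germany", "United States", 5),
]


def run_step(records, mapper, reducer):
    """Simulate one map, shuffle and reduce pass in memory."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def join_mapper(record):
    """Step 1 map: key every flight by the country that could be the stopover."""
    dest, origin, count = record
    yield dest, ("IN", origin, count)    # a first leg arriving at dest
    yield origin, ("OUT", dest, count)   # a second leg departing origin


def join_reducer(middle, values):
    """Step 1 reduce: pair each inbound leg with each outbound leg."""
    inbound = [(country, n) for tag, country, n in values if tag == "IN"]
    outbound = [(country, n) for tag, country, n in values if tag == "OUT"]
    for a, n_in in inbound:
        for b, n_out in outbound:
            yield (a, b), n_in * n_out   # routes from a to b via middle


def sum_mapper(record):
    """Step 2 map: pass ((a, b), product) through unchanged."""
    yield record


def sum_reducer(pair, products):
    """Step 2 reduce: add up the routes over all stopover countries."""
    yield pair, sum(products)


def two_hop_counts(rows):
    joined = run_step(rows, join_mapper, join_reducer)
    return dict(run_step(joined, sum_mapper, sum_reducer))


result = two_hop_counts(EXAMPLE_ROWS)
for (a, b), total in sorted(result.items()):
    print(a, b, total)
```

Step 1 performs the self-join by emitting each flight twice, once under its destination and once under its origin, so that all legs touching the same stopover country meet in one reducer. Step 2 is needed because the same (A, B) pair can be produced by different stopover countries, as with United States to Germany via Mexico and via Canada.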