Question
How many neighbors do you have?
In this project, you are asked to work with the MapReduce framework. As covered in the lecture, which you should refer to, MapReduce is one of the important techniques for solving Big Data problems. It has two main phases, the Map phase and the Reduce phase, and each of them has sub-phases. Briefly, on a cluster of nodes/cores, the cluster nodes running the map program emit key-value pairs from their split chunks of the input file during the Map phase. These key-value pairs are then consumed by the cluster nodes running the reduce program. The reduce component usually summarizes the data received from the map phase and produces the final output after combining the output coming from several nodes.
Objectives:
1- Understanding the conceptual basis of the MapReduce framework
2- Solving a problem by applying the MapReduce framework
3- Gaining hands-on experience in developing and running MapReduce programs
4- Gaining the skills not only for developing the code, but also for:
a. creating executable jar files
b. moving data back and forth between the local system and the Hadoop file system
c. configuring a MapReduce job
d. submitting the job for execution
e. obtaining the results
You will be given a network file that contains the network you will work with. After downloading the file from the link given below, extract the compressed file (using WinRAR, for instance). Notepad or Notepad++ is not a good choice for opening the network file because it is too large for a simple text editor; a free editor such as PilotLite can open it. The file needs one very simple cleaning step: remove the first 4 lines so that only the network remains.
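If opening a 1.64 GB file in an editor is impractical, the cleaning step can also be done programmatically. The following is a minimal sketch, not part of the assignment: the class name `StripHeaderSketch`, the method `stripHeader`, and the choice of UTF-8 are assumptions made for illustration. It streams the file line by line, so the whole file never has to fit in memory.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: copies a file while skipping its first few lines.
class StripHeaderSketch {

    // Skips `headerLines` lines of `in` and writes the rest to `out`.
    // Returns the number of lines written.
    // Assumption: the file is plain ASCII/UTF-8 text.
    static long stripHeader(Path in, Path out, int headerLines) throws IOException {
        long kept = 0;
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            // Discard the header; stop early if the file is shorter than the header.
            for (int i = 0; i < headerLines; i++) {
                if (reader.readLine() == null) {
                    return kept;
                }
            }
            // Copy every remaining line unchanged.
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine();
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path of the raw network file, args[1]: path of the cleaned copy.
        long kept = stripHeader(Path.of(args[0]), Path.of(args[1]), 4);
        System.out.println("edge lines kept: " + kept);
    }
}
```

The number printed at the end can be compared with the edge count given in the data description as a quick sanity check.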
Task One: (20 points)
In this project you will solve the problem of counting the neighboring nodes of each node in a network. If two nodes share a link/edge, they are considered neighbors. You can think of the network as a friendship network (nodes are friends and links represent friendship relations), a co-authorship network (nodes are authors and links represent common work), etc.
Bonus Task: (15 points)
In this part, you are only asked to report the top 30 nodes with the largest number of neighbors in the network. If several nodes have the same number of neighbors, then you can break these ties randomly.
If you decide to submit the bonus part, you must submit separate files named Bonus.java and Bonus.jar.
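One common way to select the top 30 from per-node counts is a bounded min-heap. The sketch below is plain Java and only illustrates the selection logic; the class name `TopKSketch` and the method `topK` are hypothetical, and how the counts reach this step (for example, inside a single reducer) is left open. Ties are resolved by whichever entry happens to remain in the heap, which is consistent with the rule that ties may be broken arbitrarily.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: keeps the k entries with the largest neighbor counts.
class TopKSketch {

    // Returns at most k (nodeId, count) entries, largest count first.
    static List<Map.Entry<Long, Long>> topK(Map<Long, Long> counts, int k) {
        Comparator<Map.Entry<Long, Long>> byCount =
            Comparator.comparingLong((Map.Entry<Long, Long> e) -> e.getValue());

        // Min-heap ordered by count: the smallest retained count sits at the head.
        PriorityQueue<Map.Entry<Long, Long>> heap = new PriorityQueue<>(byCount);
        for (Map.Entry<Long, Long> e : counts.entrySet()) {
            heap.offer(Map.entry(e.getKey(), e.getValue()));
            if (heap.size() > k) {
                heap.poll(); // evict the current smallest, so only k entries stay
            }
        }

        // Sort the survivors in descending order of count for reporting.
        List<Map.Entry<Long, Long>> result = new ArrayList<>(heap);
        result.sort(byCount.reversed());
        return result;
    }

    public static void main(String[] args) {
        Map<Long, Long> counts = Map.of(1L, 3L, 2L, 5L, 5L, 1L, 6L, 1L);
        for (Map.Entry<Long, Long> e : topK(counts, 2)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

The heap never holds more than k + 1 entries, so memory use stays small even with about three million nodes.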
For the bonus part, the code has to produce accurate/correct results. No partial credit will be given. For more information please see below:
1. General Information: Data Description
The network has
Nodes: 3072441
Edges: 117185083
The uncompressed file size: 1.64 GB
d. The data in the file has the following format:
1 11
1 12
1 13
2 5
2 6
2 15
2 60
2 62
The numbers represent node ids
Each line shows a link/edge in the network
Within a line, the two node ids are separated by the tab character (\t)
For instance, in the snapshot given in point d., the neighbors of node 1 are 11, 12, and 13. The neighbor count is 3 (this is the number you need to report for each node)
For example, the output for the snapshot network given in point d. would look like:
1 3
2 5
5 1
6 1
11 1
12 1
13 1
15 1
60 1
62 1
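The counting logic behind this example can be sketched in plain Java without any Hadoop dependency. This is an illustration of the idea only, not the required solution: the class `NeighborCountSketch` and its methods are hypothetical names, and the sketch assumes each edge appears once in the file, as in the snapshot (if both directions of an edge were listed, the counts would double). In a real Hadoop job the `map` logic would typically go in a Mapper subclass and the summing in a Reducer subclass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the map and reduce steps (names are illustrative).
class NeighborCountSketch {

    // Map step: an edge line "u<TAB>v" yields the pairs (u, 1) and (v, 1),
    // because an edge makes each endpoint a neighbor of the other.
    static List<long[]> map(String line) {
        List<long[]> pairs = new ArrayList<>();
        String[] ids = line.trim().split("\t");
        if (ids.length != 2) {
            return pairs; // skip blank or malformed lines
        }
        pairs.add(new long[] {Long.parseLong(ids[0]), 1L});
        pairs.add(new long[] {Long.parseLong(ids[1]), 1L});
        return pairs;
    }

    // Shuffle and reduce step: group by node id and sum the emitted 1s.
    // A TreeMap keeps the node ids in ascending order for printing.
    static Map<Long, Long> reduce(List<long[]> pairs) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long[] pair : pairs) {
            counts.merge(pair[0], pair[1], Long::sum);
        }
        return counts;
    }

    // Runs the map step on every line, then reduces all emitted pairs.
    static Map<Long, Long> countNeighbors(List<String> lines) {
        List<long[]> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        // The snapshot from point d., with tab-separated node ids.
        List<String> edges = List.of(
            "1\t11", "1\t12", "1\t13",
            "2\t5", "2\t6", "2\t15", "2\t60", "2\t62");
        countNeighbors(edges).forEach((node, n) -> System.out.println(node + "\t" + n));
    }
}
```

Run on the snapshot, this prints the same ten lines as the expected output above: node 1 with 3 neighbors, node 2 with 5, and every other node with 1.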