How many neighbors do you have?

In this project, you are asked to work with the MapReduce framework. As covered in the lecture, which you should refer to, MapReduce is one of the important techniques for solving Big Data problems. It has two main phases, the Map phase and the Reduce phase, and each of them has sub-phases. Briefly, on a cluster of nodes/cores, the nodes running the map program emit key-value pairs based on the split chunks of the input file. These key-value pairs are then consumed by the nodes running the reduce program. The reduce component usually summarizes the data received from the map phase and produces the final output after combining the output coming from several nodes.
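The flow described above can be sketched in plain Java, without any Hadoop dependency. This is only an in-memory illustration under stated assumptions: the class `MapReduceSketch`, its method names, and the toy records "a", "b", "c" are hypothetical and do not come from the assignment. Each inner list plays the role of one split chunk handled by one map node.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of the Map -> shuffle -> Reduce flow.
class MapReduceSketch {
    // Map phase: every record in a chunk is emitted as a (key, 1) pair.
    static List<Map.Entry<String, Integer>> mapChunk(List<String> chunk) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String record : chunk) {
            pairs.add(new AbstractMap.SimpleEntry<String, Integer>(record, 1));
        }
        return pairs;
    }

    // Shuffle: collect the values emitted by all map nodes under their key.
    static Map<String, List<Integer>> shuffle(List<List<Map.Entry<String, Integer>>> mapOutputs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> output : mapOutputs) {
            for (Map.Entry<String, Integer> pair : output) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        return grouped;
    }

    // Reduce phase: summarize the values of each key, here by summing them.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            totals.put(e.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Two toy chunks, as if the input file had been split across two map nodes.
        List<List<String>> chunks = Arrays.asList(
                Arrays.asList("a", "b", "a"),
                Arrays.asList("b", "c"));
        List<List<Map.Entry<String, Integer>>> outputs = new ArrayList<>();
        for (List<String> chunk : chunks) {
            outputs.add(mapChunk(chunk));
        }
        System.out.println(reduce(shuffle(outputs))); // prints {a=2, b=2, c=1}
    }
}
```

In a real Hadoop job the framework performs the splitting and the shuffle itself; only the map and reduce logic would be written by hand.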

Objectives:

1- Understanding the conceptual basis of the Map-Reduce framework

2- Solving a problem by applying the Map-Reduce framework

3- Having the hands-on experience of developing and running Map-Reduce programs

4- Gaining the skills for not only developing the code, but also:

a. creating executable jar files,

b. moving data back and forth between the local system and the Hadoop file system,

c. configuring a Map-Reduce job,

d. submitting the job for execution, and

e. obtaining the results.

You will be given a network file that contains the network you will work with. After you download the file from the link given below, you need to extract it; you can extract the compressed file using WinRAR, for instance. To open the network file, Notepad or Notepad++ will not be a good choice because the file is too large for a simple text editor. However, you can open it using a free editor such as PilotLite. The file needs a very simple cleaning step: remove the first 4 lines so that only the network remains.
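As an alternative to opening the file in an editor, the cleaning step can be done with a small streaming program. The following is a minimal sketch, assuming the first 4 lines are ordinary text lines; the class name `HeaderStripper` and the command-line arguments are hypothetical. It reads one line at a time, so the whole file never has to fit in memory.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: copies a file while dropping its first lines.
class HeaderStripper {
    // Copies `in` to `out`, skipping the first `skip` lines.
    // Returns the number of lines written.
    static long stripHeader(Path in, Path out, int skip) throws IOException {
        long written = 0;
        // ISO-8859-1 maps every byte to a character, so decoding never fails
        // and the ASCII node ids pass through unchanged.
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.ISO_8859_1);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.ISO_8859_1)) {
            String line;
            int lineNo = 0;
            while ((line = reader.readLine()) != null) {
                if (lineNo++ < skip) {
                    continue; // header line, drop it
                }
                writer.write(line);
                writer.write('\n');
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: HeaderStripper <raw-file> <clean-file>");
            return;
        }
        long n = stripHeader(Paths.get(args[0]), Paths.get(args[1]), 4);
        System.out.println("Wrote " + n + " edge lines");
    }
}
```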

Task One: (20 points)

In this project you will solve the problem of counting the neighboring nodes of each node in a network. If two nodes share a link/edge, they are considered neighbors. You can think of the network as a friendship network (nodes are friends and links represent friendship relations), a co-authorship network (nodes are authors and links represent common work), etc.

Bonus Task: (15 points)

In this part, you are only asked to report the top 30 nodes with the largest number of neighbors in the network. If several nodes have the same number of neighbors, then you can break these ties randomly.
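One common way to select the top entries without sorting every node is a bounded min-heap. The sketch below shows only that selection logic in plain Java, assuming the per-node neighbor counts are already available as a map; the class `TopK` and the method `topK` are hypothetical names, and how this step would be wired into a Map-Reduce job (for example a single reducer or a second job) is left open. Ties are resolved arbitrarily, which the task permits.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical helper: picks the k nodes with the largest neighbor counts.
class TopK {
    // Returns up to k (node, count) entries, largest count first.
    static List<Map.Entry<Long, Integer>> topK(Map<Long, Integer> counts, int k) {
        // Min-heap ordered by count: the root is the weakest of the current top k.
        Comparator<Map.Entry<Long, Integer>> byCount = Map.Entry.comparingByValue();
        PriorityQueue<Map.Entry<Long, Integer>> heap = new PriorityQueue<>(byCount);
        for (Map.Entry<Long, Integer> e : counts.entrySet()) {
            heap.offer(new AbstractMap.SimpleImmutableEntry<Long, Integer>(e.getKey(), e.getValue()));
            if (heap.size() > k) {
                heap.poll(); // evict the smallest count seen so far
            }
        }
        List<Map.Entry<Long, Integer>> result = new ArrayList<>(heap);
        result.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return result;
    }

    public static void main(String[] args) {
        // Toy counts for illustration only; the real input has about 3 million nodes.
        Map<Long, Integer> counts = new TreeMap<>();
        counts.put(1L, 3);
        counts.put(2L, 5);
        counts.put(5L, 1);
        counts.put(6L, 1);
        for (Map.Entry<Long, Integer> e : topK(counts, 2)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For the bonus task, `k` would be 30. The heap never holds more than `k + 1` entries, so memory use stays small no matter how many nodes are scanned.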

If you decide to submit the bonus part, you must submit separate files named Bonus.java and Bonus.jar.

For the bonus part, the code has to produce accurate/correct results. No partial credit will be given. For more information please see below:

1. General Information: Data Description

The network has

Nodes: 3072441

Edges: 117185083

The uncompressed file size: 1.64 GB

The data in the file has the following format:

d.

1 11

1 12

1 13

2 5

2 6

2 15

2 60

2 62

The numbers represent node ids

Each line shows a link/edge in the network

On each line, the two node ids are separated by the tab character (\t)

For instance, in the snapshot given in point d., the neighbors of node 1 are: 11, 12, and 13. The neighbors count is 3 (this is the number you need to report for each node)

For example, the output of the previous snapshot network given in d. would look like:

1 3

2 5

5 1

6 1

11 1

12 1

13 1

15 1

60 1

62 1
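The counting logic behind this output can be sketched in plain Java, independent of Hadoop. This is a minimal sketch, not the required solution, and it assumes that each edge appears on exactly one line and is never repeated in the reverse direction; if the real file lists both directions, the counts would double. The class `NeighborCount` and its methods are hypothetical names. The map step emits a 1 for both endpoints of an edge, and the reduce step sums the 1s per node.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory version of the neighbor-count map and reduce logic.
class NeighborCount {
    // Map step: the edge "u<TAB>v" makes u and v neighbors of each other,
    // so it emits (u, 1) and (v, 1).
    static List<Map.Entry<Long, Integer>> map(String line) {
        String[] ids = line.split("\t");
        long u = Long.parseLong(ids[0].trim());
        long v = Long.parseLong(ids[1].trim());
        return Arrays.<Map.Entry<Long, Integer>>asList(
                new AbstractMap.SimpleEntry<Long, Integer>(u, 1),
                new AbstractMap.SimpleEntry<Long, Integer>(v, 1));
    }

    // Reduce step: sum the emitted 1s per node id (kept sorted by node id).
    static Map<Long, Integer> countNeighbors(List<String> lines) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // tolerate blank lines
            }
            for (Map.Entry<Long, Integer> pair : map(line)) {
                counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The snapshot from point d.
        List<String> edges = Arrays.asList(
                "1\t11", "1\t12", "1\t13",
                "2\t5", "2\t6", "2\t15", "2\t60", "2\t62");
        for (Map.Entry<Long, Integer> e : countNeighbors(edges).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Run on the snapshot, this prints the same ten node/count pairs listed above, from `1 3` through `62 1`.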
