
Question


You will need to submit the following:

1.) the code for the final sorting stage (30 points)

2.) the code showing the change you have made to handle the spider trap (no need to submit the classes/functions that you have not changed, but do submit the whole class/function where the change is made) (10 points)

3.) a screen shot showing the command you used to start your job (5 points)

4.) a screen shot showing the result of your submitted job, showing the correct output of the pageRank calculation (5 points)


// Sample code to start with

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class HadoopPageRank extends Configured implements Tool {

    // args: <input path> <output base path> <number of iterations>
    @Override
    public int run(String[] args) throws Exception {

        // housekeeping: start from an empty output base directory
        int iteration = 0;
        Path inputPath = new Path(args[0]);
        Path basePath = new Path(args[1]);
        FileSystem fs = FileSystem.get(getConf());
        fs.delete(basePath, true);
        fs.mkdirs(basePath);

        // configure initial job
        Job initJob = Job.getInstance(getConf(), "pageRank");
        initJob.setJarByClass(HadoopPageRank.class);
        initJob.setMapperClass(HadoopPageRankInitMapper.class);
        initJob.setReducerClass(HadoopPageRankInitReducer.class);
        initJob.setOutputKeyClass(Text.class);
        initJob.setOutputValueClass(Text.class);
        initJob.setInputFormatClass(TextInputFormat.class);
        Path outputPath = new Path(basePath, "iteration_" + iteration);
        FileInputFormat.addInputPath(initJob, inputPath);
        FileOutputFormat.setOutputPath(initJob, outputPath);

        // let initJob run and wait for it to finish
        if (!initJob.waitForCompletion(true)) {
            return -1;
        }

        // calculate the page ranks
        int totalIterations = Integer.parseInt(args[2]);
        while (iteration < totalIterations) {
            iteration++;
            inputPath = outputPath; // new input is the old output
            outputPath = new Path(basePath, "iteration_" + iteration);

            Job mainJob = Job.getInstance(getConf(), "Iteration " + iteration);
            mainJob.setJarByClass(HadoopPageRank.class);
            mainJob.setMapperClass(HadoopPageRankMainJobMapper.class);
            mainJob.setReducerClass(HadoopPageRankMainJobReducer.class);
            mainJob.setOutputKeyClass(Text.class);
            mainJob.setOutputValueClass(Text.class);
            mainJob.setInputFormatClass(TextInputFormat.class);
            FileInputFormat.setInputPaths(mainJob, inputPath);
            FileOutputFormat.setOutputPath(mainJob, outputPath);

            if (!mainJob.waitForCompletion(true)) {
                return -1;
            }
        }

        // collect the result, highest rank first
        Job resultJob = Job.getInstance(getConf(), "final result");
        resultJob.setJarByClass(HadoopPageRank.class);
        resultJob.setMapperClass(HadoopPageRankResultMapper.class);
        resultJob.setOutputKeyClass(DoubleWritable.class);
        resultJob.setOutputValueClass(Text.class);
        resultJob.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.setInputPaths(resultJob, outputPath);
        FileOutputFormat.setOutputPath(resultJob, new Path(basePath, "result"));

        if (!resultJob.waitForCompletion(true)) {
            return -1;
        }

        return 0;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new HadoopPageRank(), args);
        System.exit(exitCode);
    }
}

// reference:

// https://stackoverflow.com/questions/5029285/implementing-pagerank-using-mapreduce/13568286#13568286

// https://stackoverflow.com/questions/5029285/implementing-pagerank-using-mapreduce

// https://www.youtube.com/watch?v=9e3geIYFOF4

// https://github.com/danielepantaleone/hadoop-pagerank

// https://gdfm.me/2011/06/10/iterative-algorithms-in-hadoop/

// https://stackoverflow.com/questions/8925051/how-to-iterate-when-calculating-pagerank-with-mapreduce
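The `resultJob` in the listing above emits the rank as a `DoubleWritable` key, and Hadoop sorts such keys in ascending order by default, so "highest rank first" needs a reversed ordering. The sketch below shows only that ordering in plain Java, without Hadoop on the classpath. The class `DescendingRankSketch` and the record type `RankedPage` are hypothetical names made up for illustration and are not part of the course code. In a real job, one possible way to apply the same idea is a comparator registered through `Job.setSortComparatorClass`; another common trick is to emit the negated rank as the key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class DescendingRankSketch {

    /** Hypothetical record of the final stage: a rank and the page it belongs to. */
    public static final class RankedPage {
        public final double rank;
        public final String page;

        public RankedPage(double rank, String page) {
            this.rank = rank;
            this.page = page;
        }
    }

    /** Highest rank first; ties are broken by page name so the order is deterministic. */
    public static final Comparator<RankedPage> HIGHEST_FIRST = (a, b) -> {
        int byRank = Double.compare(b.rank, a.rank); // arguments swapped: descending
        return byRank != 0 ? byRank : a.page.compareTo(b.page);
    };

    /** Returns the page names ordered from highest to lowest rank. */
    public static List<String> sortedPages(List<RankedPage> records) {
        List<RankedPage> copy = new ArrayList<>(records);
        copy.sort(HIGHEST_FIRST);
        List<String> pages = new ArrayList<>();
        for (RankedPage p : copy) {
            pages.add(p.page);
        }
        return pages;
    }

    public static void main(String[] args) {
        List<RankedPage> records = Arrays.asList(
                new RankedPage(0.15, "C"),
                new RankedPage(0.47, "A"),
                new RankedPage(0.38, "B"));
        System.out.println(sortedPages(records)); // prints [A, B, C]
    }
}
```

Swapping the arguments of `Double.compare` is the whole trick; a Hadoop comparator for `DoubleWritable` keys would reverse the comparison in the same way.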

Question 1 (50 points)

In BDP-4-pageRank, we have discussed the pageRank implementation using Hadoop (you can download the code from our course website, but do note that the code you download does not have the final stage; it is only meant to give you a starting point for this homework).

The implementation we have discussed in the class does not include the final sorting stage. In addition, it does not take dead ends and spider traps into account. Your task is to change the code so that

1.) it adds a final sorting stage,

2.) it can handle spider traps by using the following teleporting schema:

[formula image not transcribed]

Note: [symbol not transcribed] and the other parameters are explained in BDP-4-pageRank, and you can use 0.85 as its value.
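The formula for the teleporting schema did not survive in the text above, so the following is only a sketch under a stated assumption: it uses the commonly taught random-teleport update, newRank(j) = beta * sum over links i -> j of rank(i) / outDegree(i) + (1 - beta) / N, with beta = 0.85. The class `TeleportSketch`, the method `iterate`, and the 3-page graph are hypothetical and run in memory rather than as a MapReduce job. Check the formula against BDP-4-pageRank before relying on it.

```java
import java.util.Arrays;

public class TeleportSketch {

    /**
     * One PageRank iteration with random teleporting (assumed formula, see lead-in).
     * links[i] lists the pages that page i links to. Every page is assumed to have
     * at least one out-link; dead ends are not handled in this sketch.
     */
    public static double[] iterate(int[][] links, double[] rank, double beta) {
        int n = rank.length;
        double[] next = new double[n];
        Arrays.fill(next, (1.0 - beta) / n);             // teleport share for every page
        for (int i = 0; i < n; i++) {
            double share = beta * rank[i] / links[i].length;
            for (int j : links[i]) {
                next[j] += share;                        // link-following share
            }
        }
        return next;
    }

    /** Runs a fixed number of iterations from a uniform starting vector. */
    public static double[] run(int[][] links, double beta, int iterations) {
        double[] rank = new double[links.length];
        Arrays.fill(rank, 1.0 / links.length);
        for (int k = 0; k < iterations; k++) {
            rank = iterate(links, rank, beta);
        }
        return rank;
    }

    public static void main(String[] args) {
        // Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 2 (page 2 is a spider trap).
        int[][] links = { {1, 2}, {2}, {2} };
        System.out.println("beta = 1.00: " + Arrays.toString(run(links, 1.0, 50)));
        System.out.println("beta = 0.85: " + Arrays.toString(run(links, 0.85, 50)));
    }
}
```

On this toy graph, running without teleporting (beta = 1) lets the trap absorb all of the rank, while beta = 0.85 leaves pages 0 and 1 with about 0.05 and 0.07125 and the trap with about 0.87875. In the MapReduce version, the same two terms would presumably be combined where the reducer sums the incoming contributions for a page.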
