Question

Posted on Sep 22, 2024

You will need to submit the following:

1.) the code for the final sorting stage (30 points)

2.) the code showing the change you have made to handle the spider trap (no need to submit the classes/functions that you have not changed, but do submit the whole class/function where the change is made) (10 points)

3.) a screenshot showing the command you used to start your job (5 points)

4.) a screenshot showing the result of your submitted job, showing the correct output of the PageRank calculation (5 points)

// Sample code to start with

import org.apache.hadoop.conf.Configured;

import org.apache.hadoop.fs.FileSystem;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.io.DoubleWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Job;

import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import org.apache.hadoop.util.Tool;

import org.apache.hadoop.util.ToolRunner;

public class HadoopPageRank extends Configured implements Tool {

public int run(String[] args) throws Exception {

// housekeeping: parse the arguments and reset the output directory

int iteration = 0;

Path inputPath = new Path(args[0]);

Path basePath = new Path(args[1]);

FileSystem fs = FileSystem.get(getConf());

fs.delete(basePath, true);

fs.mkdirs(basePath);

// configure initial job

Job initJob = Job.getInstance(getConf(),"pageRank");

initJob.setJarByClass(HadoopPageRank.class);

initJob.setMapperClass(HadoopPageRankInitMapper.class);

initJob.setReducerClass(HadoopPageRankInitReducer.class);

initJob.setOutputKeyClass(Text.class);

initJob.setOutputValueClass(Text.class);

initJob.setInputFormatClass(TextInputFormat.class);

Path outputPath = new Path(basePath, "iteration_" + iteration);

FileInputFormat.addInputPath(initJob, inputPath);

FileOutputFormat.setOutputPath(initJob, outputPath);

// let initJob run and wait for finish

if ( !initJob.waitForCompletion(true) ) {

return -1;

}

// calculate the page ranks

int totalIterations = Integer.parseInt(args[2]);

while ( iteration < totalIterations ) {

iteration++;

inputPath = outputPath; // new input is the old output

outputPath = new Path(basePath, "iteration_" + iteration);

Job mainJob = Job.getInstance(getConf(),"Iteration " + iteration);

mainJob.setJarByClass(HadoopPageRank.class);

mainJob.setMapperClass(HadoopPageRankMainJobMapper.class);

mainJob.setReducerClass(HadoopPageRankMainJobReducer.class);

mainJob.setOutputKeyClass(Text.class);

mainJob.setOutputValueClass(Text.class);

mainJob.setInputFormatClass(TextInputFormat.class);

FileInputFormat.setInputPaths(mainJob, inputPath);

FileOutputFormat.setOutputPath(mainJob, outputPath);

if ( !mainJob.waitForCompletion(true) ) {

return -1;

}

}

// collect the result, highest rank first

Job resultJob = Job.getInstance(getConf(),"final result");

resultJob.setJarByClass(HadoopPageRank.class);

resultJob.setMapperClass(HadoopPageRankResultMapper.class);

resultJob.setOutputKeyClass(DoubleWritable.class);

resultJob.setOutputValueClass(Text.class);

resultJob.setInputFormatClass(TextInputFormat.class);

FileInputFormat.setInputPaths(resultJob, outputPath);

FileOutputFormat.setOutputPath(resultJob,new Path(basePath, "result"));

if ( !resultJob.waitForCompletion(true) ) {

return -1;

}

return 0;

}

public static void main(String[] args) throws Exception {

int exitCode = ToolRunner.run(new HadoopPageRank(), args);

System.exit(exitCode);

}

}

// reference:

// https://stackoverflow.com/questions/5029285/implementing-pagerank-using-mapreduce/13568286#13568286

// https://stackoverflow.com/questions/5029285/implementing-pagerank-using-mapreduce

// https://www.youtube.com/watch?v=9e3geIYFOF4

// https://github.com/danielepantaleone/hadoop-pagerank

// https://gdfm.me/2011/06/10/iterative-algorithms-in-hadoop/

// https://stackoverflow.com/questions/8925051/how-to-iterate-when-calculating-pagerank-with-mapreduce
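The result job in the starter code keys its output by rank (DoubleWritable) and is commented "highest rank first", but Hadoop sorts map output keys in ascending order by default, so the final stage needs a descending order on the rank. The plain-Java sketch below only illustrates that ordering logic outside Hadoop. The class name RankSortSketch, the holder PageRankEntry, and the line layout "page TAB rank TAB outlinks" are assumptions made for illustration; the actual output format of the last iteration is not shown in the question. In a real job the same ordering would typically come from a descending comparator on the key, registered for example through Job.setSortComparatorClass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RankSortSketch {

    /** Hypothetical holder for one page and its final rank. */
    static final class PageRankEntry {
        final String page;
        final double rank;

        PageRankEntry(String page, double rank) {
            this.page = page;
            this.rank = rank;
        }
    }

    /** Parses a line ASSUMED to look like: page TAB rank [TAB outlinks]. */
    static PageRankEntry parse(String line) {
        String[] fields = line.split("\t");
        return new PageRankEntry(fields[0], Double.parseDouble(fields[1]));
    }

    /** Returns the entries ordered by rank, highest first. */
    static List<PageRankEntry> sortDescending(List<String> lines) {
        List<PageRankEntry> entries = new ArrayList<>();
        for (String line : lines) {
            entries.add(parse(line));
        }
        // Swapping the arguments of Double.compare yields descending order.
        entries.sort((a, b) -> Double.compare(b.rank, a.rank));
        return entries;
    }

    public static void main(String[] args) {
        // Made-up sample lines, not real job output.
        List<String> lines = Arrays.asList("A\t0.2\tB,C", "B\t0.5\tA", "C\t0.3\tA");
        for (PageRankEntry entry : sortDescending(lines)) {
            System.out.println(entry.rank + "\t" + entry.page);
        }
    }
}
```

Emitting the rank as the key and the page as the value, as the sketch prints them, mirrors the key/value classes already configured on resultJob.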

Question 1 (50 points)

In BDP-4-pageRank, we discussed the PageRank implementation using Hadoop (you can download the code from our course website, but note that the code you download does not have the final stage; it is only meant to give you a starting point for this homework).

The implementation we discussed in class does not include the final sorting stage. In addition, it does not take dead ends and spider traps into account. Your task is to change the code so that

1.) it has a final sorting stage, and

2.) it can handle spider traps by using the following teleporting scheme:

[formula shown as an image in the original; not transcribed]

Note that the parameter in the formula (its symbol was lost in transcription) and the other parameters are explained in BDP-4-pageRank, and you can use 0.85 as its value.
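The teleporting formula itself did not survive transcription, so the following is only a sketch under an assumption: that the scheme is the commonly used form new_rank = beta * (sum of incoming contributions) + (1 - beta) / N, with beta = 0.85 and N the number of pages. If the course formula differs, the update line would need to change accordingly. The class name TeleportSketch and the tiny three-page graph are made up for illustration; the graph contains a spider trap (page 2 links only to itself) and no dead ends.

```java
import java.util.Arrays;

public class TeleportSketch {

    /**
     * Runs a fixed number of PageRank iterations on an adjacency list.
     * ASSUMPTION: every page has at least one out-link (no dead ends),
     * and the teleport update is beta * sum + (1 - beta) / n.
     */
    static double[] pageRank(int[][] outLinks, double beta, int iterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            // "Map" side: each page splits its rank equally among its out-links.
            for (int page = 0; page < n; page++) {
                for (int target : outLinks[page]) {
                    next[target] += rank[page] / outLinks[page].length;
                }
            }
            // "Reduce" side: damp the summed contributions and add the teleport share.
            for (int page = 0; page < n; page++) {
                next[page] = beta * next[page] + (1.0 - beta) / n;
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Page 0 links to 1 and 2, page 1 links to 0, page 2 links only to itself.
        int[][] graph = { {1, 2}, {0}, {2} };
        System.out.println("with teleport:    " + Arrays.toString(pageRank(graph, 0.85, 50)));
        System.out.println("without teleport: " + Arrays.toString(pageRank(graph, 1.0, 50)));
    }
}
```

With beta = 1.0 (no teleporting) the self-linking page absorbs essentially all of the rank, while with beta = 0.85 the other pages keep a non-trivial share. In the Hadoop version, the equivalent change would presumably sit in the main job's reducer, which also needs to know the total page count N.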