Question
You will need to submit the following:
1.) the code for the final sorting stage (30 points)
2.) the code showing the change you made to handle the spider trap (no need to submit
classes/functions you have not changed, but do submit the whole class/function
where the change is made) (10 points)
3.) a screenshot showing the command you used to start your job (5 points)
4.) a screenshot showing the result of your submitted job, with the correct output of the pageRank calculation (5 points)
// Sample code to start from
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class HadoopPageRank extends Configured implements Tool {

    public int run(String[] args) throws Exception {

        // housekeeping: args[0] = input path, args[1] = base output path, args[2] = iteration count
        int iteration = 0;
        Path inputPath = new Path(args[0]);
        Path basePath = new Path(args[1]);
        FileSystem fs = FileSystem.get(getConf());
        fs.delete(basePath, true);
        fs.mkdirs(basePath);

        // configure initial job
        Job initJob = Job.getInstance(getConf(), "pageRank");
        initJob.setJarByClass(HadoopPageRank.class);
        initJob.setMapperClass(HadoopPageRankInitMapper.class);
        initJob.setReducerClass(HadoopPageRankInitReducer.class);
        initJob.setOutputKeyClass(Text.class);
        initJob.setOutputValueClass(Text.class);
        initJob.setInputFormatClass(TextInputFormat.class);
        Path outputPath = new Path(basePath, "iteration_" + iteration);
        FileInputFormat.addInputPath(initJob, inputPath);
        FileOutputFormat.setOutputPath(initJob, outputPath);

        // let initJob run and wait for it to finish
        if (!initJob.waitForCompletion(true)) {
            return -1;
        }

        // calculate the page ranks, one MapReduce job per iteration
        int totalIterations = Integer.parseInt(args[2]);
        while (iteration < totalIterations) {
            iteration++;
            inputPath = outputPath; // new input is the old output
            outputPath = new Path(basePath, "iteration_" + iteration);
            Job mainJob = Job.getInstance(getConf(), "Iteration " + iteration);
            mainJob.setJarByClass(HadoopPageRank.class);
            mainJob.setMapperClass(HadoopPageRankMainJobMapper.class);
            mainJob.setReducerClass(HadoopPageRankMainJobReducer.class);
            mainJob.setOutputKeyClass(Text.class);
            mainJob.setOutputValueClass(Text.class);
            mainJob.setInputFormatClass(TextInputFormat.class);
            FileInputFormat.setInputPaths(mainJob, inputPath);
            FileOutputFormat.setOutputPath(mainJob, outputPath);
            if (!mainJob.waitForCompletion(true)) {
                return -1;
            }
        }

        // collect the result, highest rank first
        Job resultJob = Job.getInstance(getConf(), "final result");
        resultJob.setJarByClass(HadoopPageRank.class);
        resultJob.setMapperClass(HadoopPageRankResultMapper.class);
        resultJob.setOutputKeyClass(DoubleWritable.class);
        resultJob.setOutputValueClass(Text.class);
        resultJob.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.setInputPaths(resultJob, outputPath);
        FileOutputFormat.setOutputPath(resultJob, new Path(basePath, "result"));
        if (!resultJob.waitForCompletion(true)) {
            return -1;
        }
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new HadoopPageRank(), args);
        System.exit(exitCode);
    }
}
// reference:
// https://stackoverflow.com/questions/5029285/implementing-pagerank-using-mapreduce/13568286#13568286
// https://stackoverflow.com/questions/5029285/implementing-pagerank-using-mapreduce
// https://www.youtube.com/watch?v=9e3geIYFOF4
// https://github.com/danielepantaleone/hadoop-pagerank
// https://gdfm.me/2011/06/10/iterative-algorithms-in-hadoop/
// https://stackoverflow.com/questions/8925051/how-to-iterate-when-calculating-pagerank-with-mapreduce
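The result job above emits the rank as a `DoubleWritable` key, and Hadoop's shuffle sorts such keys in ascending order by default, so "highest rank first" needs a reversed ordering. The following is a minimal, Hadoop-free sketch of that ordering logic only. The class `RankSortSketch`, its methods, and the line layout `page<TAB>rank<TAB>outlinks` are assumptions made for illustration; the actual output format of the iteration jobs is not shown in the sample code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: models what the result stage should produce, without Hadoop.
class RankSortSketch {

    /** A (rank, page) pair, mirroring the DoubleWritable key and Text value of resultJob. */
    static final class RankedPage {
        final double rank;
        final String page;

        RankedPage(double rank, String page) {
            this.rank = rank;
            this.page = page;
        }
    }

    /** Parses one line. ASSUMED layout: page<TAB>rank<TAB>outlinks. */
    static RankedPage parse(String line) {
        String[] fields = line.split("\t");
        return new RankedPage(Double.parseDouble(fields[1]), fields[0]);
    }

    /** Descending order on rank: the arguments are swapped relative to the natural order. */
    static final Comparator<RankedPage> HIGHEST_FIRST =
            (a, b) -> Double.compare(b.rank, a.rank);

    /** Turns raw lines into (rank, page) pairs and sorts them, highest rank first. */
    static List<RankedPage> sortHighestFirst(List<String> lines) {
        List<RankedPage> pages = new ArrayList<>();
        for (String line : lines) {
            pages.add(parse(line));
        }
        pages.sort(HIGHEST_FIRST);
        return pages;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("A\t0.2\tB,C", "B\t0.5\tA", "C\t0.3\tA");
        for (RankedPage p : sortHighestFirst(lines)) {
            System.out.println(p.rank + "\t" + p.page);
        }
    }
}
```

In the Hadoop job itself, the same swap of arguments would typically live in a comparator class registered through `Job.setSortComparatorClass`, with the mapper doing the work of `parse`. A single reducer is assumed so that the output file is totally ordered.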
Question 1 (50 points)
In BDP-4-pageRank, we discussed the pageRank implementation using Hadoop (you can download the code from our course website, but note that the downloaded code does not have the final stage; it is only meant to give you a starting point for this homework).
The implementation we discussed in class does not include the final sorting stage. In addition, it does not take dead ends and spider traps into account. Your task is to change the code so that
1.) it has a final sorting stage, and
2.) it can handle spider traps by using the following teleporting scheme: [formula not preserved]. Note that the parameters of the scheme are explained in BDP-4-pageRank, and you can use 0.85 as the value of the teleporting parameter.
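The teleporting update asked for in task 2 could look roughly like the sketch below. Because the formula did not survive in this copy of the question, the sketch assumes the commonly used form new rank = β · (sum of inbound contributions) + (1 − β) / N, with β = 0.85 and N the total number of pages. The class `TeleportSketch`, the method `teleportedRank`, and the way contributions are passed in are hypothetical; check them against the formula in BDP-4-pageRank before relying on this.

```java
// Hypothetical sketch of the per-page update a reducer would perform.
class TeleportSketch {

    /** Teleporting parameter; 0.85 is the value suggested in the question. */
    static final double BETA = 0.85;

    /**
     * ASSUMED formula: BETA * sum(contributions) + (1 - BETA) / totalPages.
     * Each contribution is rank(i) / outDegree(i) for a page i linking to this page.
     */
    static double teleportedRank(double[] contributions, long totalPages) {
        if (totalPages <= 0) {
            throw new IllegalArgumentException("totalPages must be positive");
        }
        double sum = 0.0;
        for (double c : contributions) {
            sum += c;
        }
        // The second term gives every page a share of the teleported rank,
        // so rank can no longer accumulate indefinitely inside a spider trap.
        return BETA * sum + (1.0 - BETA) / totalPages;
    }

    public static void main(String[] args) {
        // A page receiving contributions 0.1 and 0.2 in a graph of 4 pages.
        System.out.println(teleportedRank(new double[] {0.1, 0.2}, 4));
    }
}
```

In a MapReduce setting the reducer needs N at run time; one option, again an assumption, is to count the pages in the init job and pass the count to the iteration jobs through the job `Configuration`.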