Question
Goal: In this assignment, we will compute PageRank scores for the web dataset that Google provided for a programming contest in 2002.
Input Format: The datasets are given as txt files. The file format is:
- Rows 1 to 4: Metadata. They give information about the dataset and are self-explanatory.
- Following rows: each row consists of 2 values and represents a link from the web page in the 1st column to the web page in the 2nd column. For example, if the row is 0 11342, there is a directed link from page id 0 to page id 11342.
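The file format described above could be read along the following lines. This is a minimal sketch, not the assignment's reference reader: the function name `parse_edges`, the assumption that the two values are separated by whitespace, and the made-up sample rows (other than `0 11342`) are all illustrative.

```python
from typing import Iterable, List, Tuple

def parse_edges(lines: Iterable[str], n_meta: int = 4) -> List[Tuple[int, int]]:
    """Return (source, target) page-id pairs, skipping the metadata rows."""
    edges = []
    for row_no, line in enumerate(lines):
        if row_no < n_meta:
            continue  # rows 1 to 4 are metadata
        parts = line.split()  # assumption: whitespace-separated columns
        if len(parts) != 2:
            continue  # tolerate blank or malformed rows
        edges.append((int(parts[0]), int(parts[1])))
    return edges

# Hypothetical miniature file: four metadata rows, then two links.
sample = [
    "metadata row 1",
    "metadata row 2",
    "metadata row 3",
    "metadata row 4",
    "0 11342",
    "11342 0",
]
edges = parse_edges(sample)
```

For a real file, the same function can be fed an open file handle, for example `parse_edges(open("web-Google_10k.txt"))`.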
There are two datasets that we will work with in this assignment.
- web-Google_10k.txt: This dataset contains 10,000 web pages and 78,323 links. The dataset can be downloaded from here. DO NOT assume that page ids run from 0 to 10,000.
- web-Google.txt: This dataset contains 875,713 web pages and 5,105,039 links. The dataset can be downloaded from here. DO NOT assume that page ids run from 0 to 875,713.
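Because page ids cannot be assumed to form a contiguous range, one common approach is to map every id that occurs in the file to a dense index first. The sketch below assumes the links are already available as (source, target) pairs; `build_index` is a hypothetical helper name. Note that a page that appears in no link at all cannot be discovered this way.

```python
def build_index(edges):
    """Map each page id seen in the links to a dense index 0..n-1."""
    ids = sorted({page for edge in edges for page in edge})
    index = {page_id: i for i, page_id in enumerate(ids)}
    return ids, index

# Made-up links with sparse ids, for illustration only.
demo_edges = [(0, 11342), (11342, 7)]
ids, index = build_index(demo_edges)
dense_edges = [(index[s], index[t]) for s, t in demo_edges]
```

Here `ids[i]` recovers the original page id for dense index `i`, which is needed again when the scores are written out.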
Also, it's helpful to test your algorithm with this toy dataset.
Output Format: the output format for each question will be specified below.
There are two questions in this assignment, worth 50 points in total.
Question 2 (30 points): Implement the PageRank algorithm for both datasets. The taxation parameter for both datasets is 0.85 and the number of PageRank iterations is T = 10.
- (15 points) Run your algorithm on the web-Google_10k.txt dataset. For full score, your algorithm must run in less than 30 seconds. The output must be written to a file named PR_10k.tsv.
- (15 points) Run your algorithm on the web-Google.txt dataset. For full score, your algorithm must run in less than 2 minutes. The output must be written to a file named PR_800k.tsv.
The output format for Question 2 is two-column:
- The first column is the PageRank score.
- The second column is the corresponding web page id.
The output must be sorted in descending order of PageRank score. Here is a sample output for the toy dataset above.
PageRank	Ids
0.32454706832136704	0
0.3002013029682813	5
0.24391355866172854	4
0.22515097722621097	3
0.22515097722621097	2
0.22515097722621097	1
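Writing the two-column output in descending score order could be sketched as follows. The helper name `write_ranks`, the tab separator (suggested by the .tsv file names), and the inclusion of a header row copied from the sample are assumptions. Sorting the (score, id) pairs in reverse also places tied scores in descending id order, which happens to agree with the sample above.

```python
import io

def write_ranks(scores, ids, out):
    """Write 'score<TAB>page id' rows, highest score first."""
    rows = sorted(zip(scores, ids), reverse=True)  # ties: larger id first
    out.write("PageRank\tIds\n")  # assumption: header as in the sample
    for score, page_id in rows:
        out.write(f"{score}\t{page_id}\n")

# Made-up scores and ids, written to an in-memory buffer for illustration.
buf = io.StringIO()
write_ranks([0.2, 0.5, 0.3, 0.2], [10, 20, 30, 40], buf)
text = buf.getvalue()
```

For the actual submission, `out` would be a file opened for writing, for example `open("PR_10k.tsv", "w")`.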