
Question



Goal: In this assignment, we will compute PageRank scores for the web dataset provided by Google in a programming contest in 2002.

Input Format: The datasets are given as .txt files. The file format is:

  • Rows 1 to 4: metadata. They describe the dataset and are self-explanatory.
  • Following rows: each row consists of two values representing a link from the web page in the 1st column to the web page in the 2nd column. For example, the row 0 11342 means there is a directed link from page id 0 to page id 11342.
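A minimal sketch of reading this format, assuming the first four rows are always metadata (as stated above) and that the two ids on each edge row are separated by whitespace. The helper name `parse_edges` and the constant `METADATA_ROWS` are hypothetical. It stores ids in two parallel `array` objects rather than a list of tuples, because the larger dataset has over five million links.

```python
from array import array
from typing import Iterable, Tuple

METADATA_ROWS = 4  # assumption: rows 1-4 of every file are metadata


def parse_edges(lines: Iterable[str], skip: int = METADATA_ROWS) -> Tuple[array, array]:
    """Parse 'source target' rows into two parallel integer arrays (hypothetical helper)."""
    src, dst = array("q"), array("q")  # compact 64-bit storage for millions of ids
    for row_no, line in enumerate(lines):
        if row_no < skip:
            continue  # metadata row, not an edge
        parts = line.split()  # accepts tabs or spaces between the two ids
        if len(parts) < 2:
            continue  # ignore blank or trailing lines
        src.append(int(parts[0]))  # page the link starts from
        dst.append(int(parts[1]))  # page the link points to
    return src, dst
```

Usage: `with open("web-Google_10k.txt") as f: src, dst = parse_edges(f)`. If the toy dataset has no metadata rows, pass `skip=0`.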

There are two datasets that we will work with in this assignment.

  1. web-Google_10k.txt: This dataset contains 10,000 web pages and 78,323 links. The dataset can be downloaded from here. DO NOT assume that page ids are from 0 to 10,000.
  2. web-Google.txt: This dataset contains 875,713 web pages and 5,105,039 links. The dataset can be downloaded from here. DO NOT assume that page ids are from 0 to 875,713.
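Because page ids are not guaranteed to run from 0 to N, one common approach is to map each distinct id to a dense index 0..n-1 so that scores can live in plain arrays, and to keep the reverse mapping for the output. The sketch below is one way to do that; `remap_ids` is a hypothetical name, and n here counts only pages that appear in at least one link.

```python
from array import array
from typing import Dict, List, Sequence, Tuple


def remap_ids(src: Sequence[int], dst: Sequence[int]) -> Tuple[array, array, List[int]]:
    """Map arbitrary page ids to dense indices 0..n-1 (hypothetical helper).

    Returns the remapped source/target arrays and the list of original ids,
    where ids[i] is the original page id for dense index i.
    """
    index: Dict[int, int] = {}
    ids: List[int] = []

    def lookup(page_id: int) -> int:
        # assign the next free index the first time a page id is seen
        i = index.get(page_id)
        if i is None:
            i = len(ids)
            index[page_id] = i
            ids.append(page_id)
        return i

    new_src = array("q", (lookup(s) for s in src))
    new_dst = array("q", (lookup(d) for d in dst))
    return new_src, new_dst, ids
```

For example, `remap_ids([10, 10, 42], [42, 7, 10])` gives indices `[0, 0, 1]` and `[1, 2, 0]` with `ids == [10, 42, 7]`, so `len(ids)` is the number of pages.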

Also, it's helpful to test your algorithm with this toy dataset.

Output Format: the output format for each question is specified below. There are two questions in this assignment, worth 50 points in total.

Question 2 (30 points): Implement the PageRank algorithm for both datasets. The taxation parameter for both datasets is 0.85, and the number of PageRank iterations is T = 10.
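A minimal pure-Python sketch of PageRank by power iteration with taxation, v' = beta * M v + (1 - beta) / n, where M is the column-stochastic link matrix and n the number of pages. It assumes pages are already numbered 0..n-1 and starts from the uniform vector 1/n; both the name `pagerank` and that starting vector are assumptions, not taken from the assignment. Dead ends (pages with no out-links) simply leak their score here, as in the plain taxation formula; redistributing that mass is a common alternative the assignment does not specify. Note that the sample scores below sum to about 1.54, so the expected output may use a different starting vector or scaling than this sketch; check against the toy dataset.

```python
from typing import List, Sequence


def pagerank(n: int, src: Sequence[int], dst: Sequence[int],
             beta: float = 0.85, iterations: int = 10) -> List[float]:
    """Taxed power iteration: v' = beta * M v + (1 - beta) / n (hypothetical helper)."""
    out_degree = [0] * n
    for s in src:
        out_degree[s] += 1

    v = [1.0 / n] * n  # assumption: start from the uniform distribution
    teleport = (1.0 - beta) / n  # taxed mass spread evenly over all pages
    for _ in range(iterations):
        # share[i] is the score page i passes along each of its out-links;
        # dead ends pass nothing, so their score leaks out of the system
        share = [v[i] / out_degree[i] if out_degree[i] else 0.0 for i in range(n)]
        incoming = [0.0] * n
        for s, d in zip(src, dst):
            incoming[d] += share[s]
        v = [beta * x + teleport for x in incoming]
    return v
```

The inner loop touches every link once per iteration (about 51 million updates for web-Google.txt with T = 10). If that is too slow for the 2-minute limit, the same step maps onto NumPy as `np.bincount(dst, weights=share[src], minlength=n)` with `share` and the index arrays stored as NumPy arrays; timings are not measured here.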

  1. (15 points) Run your algorithm on the web-Google_10k.txt dataset. For full score, your algorithm must run in less than 30 seconds. The output must be written to a file named PR_10k.tsv.
  2. (15 points) Run your algorithm on the web-Google.txt dataset. For full score, your algorithm must run in less than 2 minutes. The output must be written to a file named PR_800k.tsv.
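To check a run against the 30-second and 2-minute limits, a small wall-clock wrapper around the whole pipeline is enough. The sketch below uses `time.perf_counter`; the helper name `run_with_limit` is hypothetical, and it only measures the run on your own machine, which may differ from the grading environment.

```python
import time
from typing import Any, Callable, Tuple


def run_with_limit(fn: Callable[[], Any], limit_seconds: float) -> Tuple[Any, float, bool]:
    """Run fn() and return (result, elapsed seconds, whether it beat the limit)."""
    start = time.perf_counter()  # monotonic, high-resolution timer
    result = fn()
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed < limit_seconds
```

Usage, with a hypothetical `my_pipeline` function that reads, ranks, and writes one dataset: `_, secs, ok = run_with_limit(lambda: my_pipeline("web-Google_10k.txt", "PR_10k.tsv"), 30)`.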

The output for Question 2 has two columns:

  • The first column is the PageRank score.
  • The second column is the corresponding web page id.

The output must be sorted in descending order of PageRank score. Here is a sample output for the toy dataset above.

PageRank              Ids
0.32454706832136704   0
0.3002013029682813    5
0.24391355866172854   4
0.22515097722621097   3
0.22515097722621097   2
0.22515097722621097   1
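A minimal sketch of producing this two-column output, assuming tab-separated columns (suggested by the .tsv extension) and a header row copied from the sample. The assignment does not state a tie-breaking rule; sorting (score, id) pairs in reverse breaks ties by descending id, which matches the order of the three tied rows in the sample. The names `format_ranking` and `write_ranking` are hypothetical.

```python
from typing import List, Sequence


def format_ranking(scores: Sequence[float], ids: Sequence[int]) -> List[str]:
    """Return 'score<TAB>page id' lines sorted by descending score (hypothetical helper)."""
    # reverse sort on (score, id) pairs: higher scores first, and tied
    # scores fall back to descending id, as the tied rows in the sample do
    rows = sorted(zip(scores, ids), reverse=True)
    header = "PageRank\tIds"  # assumption: header copied from the sample output
    return [header] + [f"{score}\t{page_id}" for score, page_id in rows]


def write_ranking(path: str, scores: Sequence[float], ids: Sequence[int]) -> None:
    """Write the ranking to a TSV file such as PR_10k.tsv."""
    with open(path, "w") as f:
        f.write("\n".join(format_ranking(scores, ids)) + "\n")
```

Here `scores[i]` and `ids[i]` must refer to the same page; Python's `str` of a float gives the shortest round-tripping form, which is why the sample shows up to 17 significant digits.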
