
Question


  1. The Internet Archive is a digital archive that makes snapshots (local copies) of a huge number of Web pages, archiving the Internet for future generations. It has currently collected more than 400 billion snapshots of Web pages; very dynamic pages, such as the front page of the New York Times website or the Volkskrant website, are crawled several times per day. Other Web pages are crawled only once a year. Assume you are tasked by the Internet Archive to provide the following services for their data (25p):

    * All snapshots crawled within the past 30 days should be directly accessible to users: it should take merely seconds between a user submitting a URL and the Internet Archive returning the list of snapshots taken during the past 30 days.

    ** Requesting older snapshots requires more patience: a user can submit a list of URLs and a time frame of interest (e.g. …html between March 1, 2009 and March 15, 2009), and within a few hours or at most days the Internet Archive should return the requested snapshots.

    *** On the website of the Internet Archive, live statistics should be shown, indicating the number of URL requests issued by users today and in the past month, the number of snapshots crawled today, and the number of older snapshot requests currently being processed.

    Given your knowledge of big data (and small-data) technologies, discuss which technologies you would use to provide these three services.
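One way to read the three requirements is as three different access patterns: a low-latency lookup keyed by URL over a 30-day window, a queue of slow batch requests over the full archive, and a set of continuously updated counters. The Python sketch below only illustrates those patterns in memory. It is not the approved answer and deliberately names no particular technology. The class `ArchiveSketch`, its method names, and the reading of "past month" as the last 30 days are all assumptions made for illustration.

```python
from bisect import bisect_left
from collections import Counter, defaultdict
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(days=30)  # assumption: "past month" is read as 30 days


class ArchiveSketch:
    """Hypothetical in-memory model of the three services (illustration only)."""

    def __init__(self):
        self.hot_index = defaultdict(list)  # URL -> sorted crawl timestamps
        self.batch_queue = []               # pending requests for older snapshots
        self.stats = Counter()              # (kind, date) -> count

    def record_crawl(self, url, crawled_at):
        # Keep each URL's timestamps sorted so range lookups stay cheap.
        # A real hot store would also evict entries older than the window.
        timestamps = self.hot_index[url]
        timestamps.insert(bisect_left(timestamps, crawled_at), crawled_at)
        self.stats[("crawled", crawled_at.date())] += 1

    def recent_snapshots(self, url, now):
        # Service 1: interactive lookup of the last 30 days for one URL.
        self.stats[("url_requests", now.date())] += 1
        timestamps = self.hot_index.get(url, [])
        return timestamps[bisect_left(timestamps, now - HOT_WINDOW):]

    def request_older(self, urls, start, end):
        # Service 2: only enqueue the request; a batch job would answer it later.
        self.batch_queue.append((tuple(urls), start, end))
        return len(self.batch_queue)

    def live_stats(self, now):
        # Service 3: counters read directly, with no scan over snapshot data.
        today = now.date()
        past_month = sum(
            count
            for (kind, day), count in self.stats.items()
            if kind == "url_requests" and today - day < HOT_WINDOW
        )
        return {
            "url_requests_today": self.stats[("url_requests", today)],
            "url_requests_past_month": past_month,
            "snapshots_crawled_today": self.stats[("crawled", today)],
            "older_requests_pending": len(self.batch_queue),
        }
```

The point of the split is that only the first and third paths have to answer within seconds, so only they need data kept in a form that supports fast keyed reads. The second path can afford a full scan of the archive, which is why the sketch simply records the request and returns.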

