Question

Using Python 3.7 in PyCharm

This assignment requires you to develop a topical (focused) crawler that crawls 500 pages from Wikipedia on a topic of your choice. You need to specify:

1) the topic,
2) at least 10 related terms (single words or phrases), and
3) at least 2 seed URLs.

During crawling, you need to determine whether each page is relevant to the topic by checking that it contains at least 2 different related terms from your list before saving it into the crawled collection. The page-relevance check should be case-insensitive.

For example, if the topic is Information Retrieval, the related terms might be: Information Retrieval, Crawler, Search Engine, tf-idf, Mean Average Precision, Precision, Recall, Relevance Feedback, Query Expansion, Retrieval Models, Boolean Model, Vector Space Model, and Language Model.

You can use any programming language that you are comfortable with, and you are free to reference and customize code found online.

Prepare a file folder that contains 2 sub-folders:

1. The first sub-folder has all the crawled pages.
2. The second sub-folder has the source code and a report. The report must include the following:
   2a. The topic of your choice, at least 10 related terms, and at least 2 seed URLs.
   2b. How the crawler is implemented, the number of pages crawled, and the URLs of all crawled pages.
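One possible way to approach the requirements above is sketched below, using only the Python 3.7 standard library. It is a minimal sketch under several assumptions, not a definitive solution: the term list is taken from the example in the question, the names `TERMS`, `LinkParser`, `is_relevant`, `extract_links`, `fetch`, and `crawl` are all made up for illustration, the User-Agent string is a placeholder, only links on en.wikipedia.org article pages are followed, and only pages judged relevant are expanded. The relevance check is a plain case-insensitive substring match on the raw HTML, so a short term such as "precision" is also counted inside "mean average precision"; stripping markup or matching whole phrases would be a stricter alternative. The sketch does not consult robots.txt.

```python
import os
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen

# Hypothetical term list, borrowed from the example topic in the question.
TERMS = ["information retrieval", "crawler", "search engine", "tf-idf",
         "precision", "recall", "relevance feedback", "query expansion",
         "vector space model", "language model"]


class LinkParser(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def is_relevant(text, terms, min_terms=2):
    """True if text contains at least min_terms different terms, ignoring case."""
    lowered = text.lower()
    distinct = set(term.lower() for term in terms)
    return sum(1 for term in distinct if term in lowered) >= min_terms


def extract_links(html, base_url):
    """Return absolute en.wikipedia.org article URLs, without fragments."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.links:
        url = urldefrag(urljoin(base_url, href))[0]
        parts = urlparse(url)
        title = parts.path[len("/wiki/"):]
        # A ":" in the title marks namespaces such as File: or Special:.
        if (parts.netloc == "en.wikipedia.org"
                and parts.path.startswith("/wiki/")
                and title and ":" not in title and url not in links):
            links.append(url)
    return links


def fetch(url):
    """Download a page as text; the User-Agent value is a placeholder."""
    request = Request(url, headers={"User-Agent": "focused-crawler-coursework/0.1"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seeds, terms, limit=500, fetch=fetch, delay=1.0, out_dir=None):
    """Breadth-first crawl; returns the URLs of the relevant pages, in order."""
    queue, seen, saved = deque(seeds), set(seeds), []
    while queue and len(saved) < limit:
        url = queue.popleft()
        time.sleep(delay)  # pause between requests to stay polite
        try:
            html = fetch(url)
        except Exception:  # sketch only: skip any page that fails to load
            continue
        if not is_relevant(html, terms):
            continue  # irrelevant pages are neither saved nor expanded
        saved.append(url)
        if out_dir is not None:
            os.makedirs(out_dir, exist_ok=True)
            path = os.path.join(out_dir, "page_{:03d}.html".format(len(saved)))
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(html)
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return saved
```

A run might then look like `urls = crawl(["https://en.wikipedia.org/wiki/Information_retrieval", "https://en.wikipedia.org/wiki/Web_crawler"], TERMS, out_dir="crawled_pages")`, where the two seed URLs and the folder name are only examples. The returned list can be written into the report to cover the number of pages crawled and their URLs. Passing `fetch` as a parameter makes it possible to try the crawl logic on canned pages without touching the network.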
