
Question



Implement in Python the simplified search engine described below for the pages of a small Web site (5-10 pages). Use all the words in the pages of the site as index terms, excluding stop words such as articles, prepositions, and pronouns. Then implement ranking of the results.
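The word-indexing and ranking steps requested here can be sketched as follows. This is a minimal sketch under stated assumptions. `STOP_WORDS` is a short hypothetical sample rather than a standard list. Each page is given as plain text keyed by a hypothetical file name. The question does not say how to rank, so the score used here (the total number of times the query terms occur on a page) is only one simple choice; TF-IDF weighting is a common alternative. For brevity, the per-page term counts are kept in plain Python dictionaries rather than in a trie.

```python
import re
from collections import Counter

# Hypothetical stop-word sample (articles, prepositions, pronouns and a few
# other very common words). A real engine would use a much longer list.
STOP_WORDS = {
    "a", "an", "the",
    "in", "on", "at", "of", "to", "for", "with", "by", "from", "about",
    "i", "you", "he", "she", "it", "we", "they", "this", "that", "our",
    "and", "or", "is", "are", "was",
}


def index_terms(text):
    """Lower-case the text, split it into alphanumeric words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_term_counts(pages):
    """Map each page name to a Counter of how often each index term occurs on it."""
    return {name: Counter(index_terms(text)) for name, text in pages.items()}


def rank(query, term_counts):
    """Return names of pages containing every query term, most relevant first.

    Assumed scoring: the sum of the query terms' frequencies on the page.
    Ties are broken alphabetically by page name.
    """
    terms = index_terms(query)
    if not terms:
        return []
    hits = [
        (sum(counts[t] for t in terms), name)
        for name, counts in term_counts.items()
        if all(t in counts for t in terms)
    ]
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [name for _, name in hits]


if __name__ == "__main__":
    pages = {
        "index.html": "Welcome to the home page of our small Web site.",
        "about.html": "About the site: a small site about search engines and tries.",
        "trie.html": "A compressed trie stores the index terms of the search engine.",
    }
    counts = build_term_counts(pages)
    print(rank("small site", counts))  # ['about.html', 'index.html']
```

Here "about.html" ranks first because "site" appears on it twice, while "trie.html" is excluded because it does not contain "small".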

The core information stored by a search engine is a dictionary, called an inverted index or inverted file, that stores key-value pairs (w, L), where w is a word and L is a collection of references to pages containing w. The keys (words) in this dictionary are called index terms and should form as large a set of vocabulary entries and proper nouns as possible. The values in this dictionary are called occurrence lists and should cover as many Web pages as possible.

An inverted index can be implemented efficiently with a data structure consisting of two parts: an array storing the occurrence lists of the terms (in no particular order), and a compressed trie for the set of index terms, where each external node stores the index of the occurrence list of the associated term. The occurrence lists are kept outside the trie so that the trie stays small enough to fit in internal memory; the occurrence lists themselves, because of their large total size, have to be stored on disk.

With this data structure, a query for a single keyword is similar to a word-matching query on a standard trie: we find the keyword in the trie and return the associated occurrence list. When multiple keywords are given and the desired output is the pages containing all of them, we retrieve the occurrence list of each keyword using the trie and return their intersection. To facilitate the intersection computation, each occurrence list should be implemented as a sequence sorted by address or as a dictionary, which allows a simple intersection algorithm similar to merging sorted sequences.

Beyond the basic task of returning the pages that contain the given keywords, search engines provide an important additional service: ranking the returned pages by relevance.
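The index layout described above (a compressed trie whose term nodes point into an array of sorted occurrence lists, plus merge-style intersection) can be sketched in Python as follows. This is a minimal sketch, not a reference implementation. The names `InvertedIndex`, `_Node`, and `intersect` are hypothetical. Page ids are assumed to be integers. Words are assumed to be already lower-cased and stop-word-filtered. The occurrence lists live in an in-memory Python list standing in for the on-disk array. One deviation from the description: an index term can be a prefix of another (for example "engine" and "engines"), so a term may end at an internal node, and any node, not only an external one, may hold an occurrence-list index.

```python
import bisect


class _Node:
    """A node of the compressed trie; edges are labelled with strings, not single characters."""

    def __init__(self):
        self.children = {}      # first character of an edge label -> (label, child node)
        self.list_index = None  # index into the occurrence-list array if a term ends here


def intersect(a, b):
    """Intersect two sorted lists of page ids by walking them in step, as in merging."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


class InvertedIndex:
    """Hypothetical inverted index: compressed trie over index terms + array of occurrence lists."""

    def __init__(self):
        self.root = _Node()
        self.occurrences = []   # array of occurrence lists, each sorted by page id

    def _find_or_insert(self, word):
        """Walk down the trie along `word`, splitting edges as needed; return the term's node."""
        node, rest = self.root, word
        while rest:
            edge = node.children.get(rest[0])
            if edge is None:
                # No edge starts with this character: hang the rest of the word off a new leaf.
                leaf = _Node()
                node.children[rest[0]] = (rest, leaf)
                node = leaf
                break
            label, child = edge
            k = 0  # length of the common prefix of label and rest
            while k < len(label) and k < len(rest) and label[k] == rest[k]:
                k += 1
            if k < len(label):
                # The word diverges inside this edge: split the edge at position k.
                mid = _Node()
                mid.children[label[k]] = (label[k:], child)
                node.children[rest[0]] = (label[:k], mid)
                child = mid
            node, rest = child, rest[k:]
        if node.list_index is None:
            # New index term: give it a fresh, empty occurrence list in the array.
            node.list_index = len(self.occurrences)
            self.occurrences.append([])
        return node

    def add(self, word, page_id):
        """Record that `word` occurs on page `page_id`, keeping the list sorted and duplicate-free."""
        occ = self.occurrences[self._find_or_insert(word).list_index]
        i = bisect.bisect_left(occ, page_id)
        if i == len(occ) or occ[i] != page_id:
            occ.insert(i, page_id)

    def lookup(self, word):
        """Single-keyword query: return a copy of the word's occurrence list ([] if absent)."""
        node, rest = self.root, word
        while rest:
            edge = node.children.get(rest[0])
            if edge is None or not rest.startswith(edge[0]):
                return []
            label, node = edge
            rest = rest[len(label):]
        return [] if node.list_index is None else list(self.occurrences[node.list_index])

    def query(self, words):
        """Multi-keyword query: pages containing all `words`, via pairwise intersection."""
        if not words:
            return []
        result = self.lookup(words[0])
        for w in words[1:]:
            result = intersect(result, self.lookup(w))
        return result
```

For example, after calling `add(word, page_id)` for every word of the pages "search engine trie" (id 0), "search engines rank pages" (id 1), and "compressed trie search" (id 2), `query(["search", "trie"])` returns `[0, 2]`. Keeping each occurrence list sorted (via `bisect`) is what lets `intersect` run in time linear in the combined length of the two lists, mirroring the merge step of merge sort.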
