Question

Posted on Sep 03, 2024

Mini Search Engine: Given a file of documents where each document occupies one line of the file, you are to build a data structure (called an inverse index) that allows you to identify the documents containing a given word. We will identify the documents by document number: the document represented by the first line of the file is document number 0, the one represented by the second line is document number 1, and so on. Make matching case-insensitive (e.g., Wall and wall are the same word).

Note that a period is considered part of a substring. To make this easier, we have a file of documents in which punctuation marks are separated from words by spaces. Often one wants to iterate through the elements of a list while keeping track of the indices of the elements. Python provides enumerate(L) for this purpose.
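As a small illustration of the two points above, the sketch below lowercases each line, splits it on whitespace, and uses enumerate to pair every line with its document number. The sample lines and the helper name tokenize are made up for this illustration and are not part of the assignment.

```python
# Hypothetical sample lines; punctuation is already separated by spaces.
documents = ["Wall Street .", "the wall"]


def tokenize(line):
    """Lowercase a line and split it on whitespace (assumed tokenization)."""
    return line.lower().split()


# enumerate yields (index, element) pairs, so the index is the document number.
for doc_number, line in enumerate(documents):
    print(doc_number, tokenize(line))
```

Because the period is separated by a space, it comes out as its own token rather than being glued to "street".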

For example, start with a tiny test file --- doc0.txt which contains only three lines of data:

A bb Ccc , .

D ee A bb

FFF

The inverse_index should be a dictionary whose keys are the words and whose values list the numbers of the lines in which each word occurs:

{'a': [0, 1], 'bb': [0, 1], 'ccc': [0], ',': [0], '.': [0], 'd': [1], 'ee': [1], 'fff': [2]}

So, a appears in both lines 0 and 1, while fff appears only in line 2. Your assignment is first to write Python code that generates such an inverse_index dict, and then to use it to perform the queries below.

Write a procedure make_inverse_index(strlist) that, given a list of strings (documents), returns a dictionary that maps each word to the set consisting of the document numbers of documents in which that word appears. This dictionary is called an inverse index. (Hint: use enumerate.)
Write a procedure or_search(inverseIndex, query) which takes an inverse index and a list of words query, and returns the set of document numbers specifying all documents that contain any of the words in query.
Write a procedure and_search(inverseIndex, query) which takes an inverse index and a list of words query, and returns the set of document numbers specifying all documents that contain all of the words in query.
Try out your procedures on stories.txt. Run or_search and and_search with the queries "united states" and "wall street" on the documents in stories.txt, and save the resulting lists (two lists of document IDs per query) in a file named results.txt.
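One possible way to approach the four tasks above is sketched below. It is not the posted expert solution, only a minimal version under stated assumptions: words are obtained by lowercasing and splitting on whitespace, query words are lowercased as well, and_search returns an empty set for an empty query, stories.txt is UTF-8 text, and the helper run_queries and the line format written to results.txt are invented for illustration.

```python
def make_inverse_index(strlist):
    """Map each lowercased word to the set of document numbers containing it."""
    inverse_index = {}
    for doc_number, document in enumerate(strlist):
        for word in document.lower().split():
            inverse_index.setdefault(word, set()).add(doc_number)
    return inverse_index


def or_search(inverseIndex, query):
    """Return documents that contain at least one of the query words."""
    result = set()
    for word in query:
        result |= inverseIndex.get(word.lower(), set())
    return result


def and_search(inverseIndex, query):
    """Return documents that contain every query word."""
    if not query:
        return set()  # assumption: an empty query matches nothing
    postings = [inverseIndex.get(word.lower(), set()) for word in query]
    return set.intersection(*postings)


def run_queries(in_path="stories.txt", out_path="results.txt"):
    """Hypothetical driver: index the file and save both result lists per query."""
    with open(in_path, encoding="utf-8") as f:
        documents = f.read().splitlines()
    index = make_inverse_index(documents)
    with open(out_path, "w", encoding="utf-8") as out:
        for query in (["united", "states"], ["wall", "street"]):
            out.write(f"or_search {query}: {sorted(or_search(index, query))}\n")
            out.write(f"and_search {query}: {sorted(and_search(index, query))}\n")


# Quick check against the three lines of doc0.txt from the problem statement.
doc0 = ["A bb Ccc , .", "D ee A bb", "FFF"]
index = make_inverse_index(doc0)
print(sorted(index["a"]))                         # [0, 1]
print(sorted(or_search(index, ["ccc", "fff"])))   # [0, 2]
print(sorted(and_search(index, ["a", "d"])))      # [1]
```

Sets are used for the values because the task statement asks for sets of document numbers; calling sorted on a set gives the list form shown in the doc0.txt example. Calling run_queries() with stories.txt in the working directory would write four lines to results.txt, two per query.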
