Question

An inverted index is a mapping of words to their location in a set of documents. Most modern search engines utilize some form of an inverted index to process user-submitted queries.

Below is code for an inverted index that supports queries. To support them, you need a positional inverted index, which maps each word to its locations in a set of documents.
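To make the structure concrete, here is a minimal sketch of what a positional inverted index can look like in Python. The document names and positions are made up for illustration; the nested layout (word, then document, then list of positions) is an assumption that matches the listing given later in the question.

```python
# Hypothetical positional inverted index for two made-up documents.
# Assumed layout: word -> {document name -> [1-based word positions]}
index = {
    "weather": {"doc1.txt": [2], "doc2.txt": [1, 7]},
    "amazing": {"doc1.txt": [4]},
}

# The documents containing a word are the keys of its entry.
docs_with_weather = sorted(index["weather"])

# The positions of a word inside one document are the values.
positions = index["weather"]["doc2.txt"]

print(docs_with_weather)
print(positions)
```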

This inverted index should have these characteristics:

All words in the index should be lower case.

No punctuation, numbers, or symbols should be represented in the index.

These stopwords should not be included in the index: and, but, is, the, to. You may use any method you want to support this.
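The three characteristics above can be sketched as a single clean-up step applied to each token before it is indexed. This is only one possible approach; the function name normalize and the choice to keep alphabetic characters only are assumptions, not part of the question.

```python
STOP_WORDS = {"and", "but", "is", "the", "to"}

def normalize(token):
    """Return the cleaned token, or None if it should not be indexed."""
    # Lower-case, then keep letters only: this drops punctuation,
    # digits and symbols in one pass.
    word = "".join(ch for ch in token.lower() if ch.isalpha())
    # Tokens that end up empty, or that are stopwords, are skipped.
    if not word or word in STOP_WORDS:
        return None
    return word

tokens = "The weather is AMAZING, isn't it? 42 @".split()
cleaned = [normalize(t) for t in tokens]
print(cleaned)
```

Returning None rather than an empty string lets the caller skip a token while still counting its position in the document.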

The code for an inverted index is given below. Write a query program in Python that queries this inverted index. It should support the following:

Boolean search queries, which return the documents that satisfy the specified condition. You should support AND and OR, as well as combinations of the two.

For any word provided by a user, return the files that contain the word and, for each occurrence of the word in a file, its position within that file.
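One way to approach the two query types is sketched below. It is a starting point under stated assumptions, not a definitive solution: the function names are hypothetical, operators must be written in upper case, AND binds tighter than OR, adjacent words are treated as an implicit AND, and parentheses are not supported. The small index is made-up sample data in the word -> {file -> [positions]} layout.

```python
def docs_for(index, word):
    """Set of files that contain the word (empty set if absent)."""
    return set(index.get(word.lower(), {}))

def boolean_query(index, query):
    """Evaluate a query such as 'weather AND amazing OR mangos'."""
    result = set()  # union of all finished AND-groups
    group = None    # files matching the current AND-group
    for token in query.split():
        if token == "OR":
            # Close the current AND-group and start a new one.
            if group is not None:
                result |= group
            group = None
        elif token == "AND":
            continue  # the next word is intersected into the group
        else:
            docs = docs_for(index, token)
            group = docs if group is None else group & docs
    if group is not None:
        result |= group
    return result

def positions_for(index, word):
    """Map each file containing the word to its list of positions."""
    return dict(index.get(word.lower(), {}))

# Made-up sample index, for illustration only.
index = {
    "weather": {"filea.txt": [2], "fileb.txt": [5]},
    "amazing": {"filea.txt": [4]},
    "mangos": {"fileb.txt": [1, 9]},
}

print(boolean_query(index, "weather AND amazing"))
print(boolean_query(index, "amazing OR mangos"))
print(positions_for(index, "mangos"))
```

With the indexing script from the question, the index could instead be loaded from file.json using json.load, since that script writes the same nested layout.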

=================================================================================

import json
import pprint
import string

stop_words = ['and', 'but', 'is', 'to', 'the']
files_list = ['filea.txt']  # files in directory
inverted_index = {}  # word -> {file name -> [positions]}

# translation table that strips punctuation and digits
strip_table = str.maketrans('', '', string.punctuation + string.digits)

for file in files_list:
    with open(file) as f:
        words = f.read().split()
    # handle punctuation, numbers and lower case
    words = [word.lower().translate(strip_table) for word in words]
    # positions are 1-based and count every token, including skipped ones
    for position, word in enumerate(words, start=1):
        # skip stopwords and tokens that were only punctuation or digits
        if not word or word in stop_words:
            continue
        # make inverted index
        inverted_index.setdefault(word, {}).setdefault(file, []).append(position)

with open('file.json', 'w') as outfile:
    json.dump(inverted_index, outfile)

print('Inverted Index is : ')  # show results
pprint.pprint(inverted_index)

Here, filea.txt is a text file with any text in it.

For example, create a text file with the following content:

The weather is amazing.

I Love mangos

@
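As a sanity check, the sample text above can be traced in memory without creating any file. The sketch below assumes the same conventions as the listing: positions are 1-based, every whitespace-separated token counts toward the position (including stopwords and the lone @), and the file name filea.txt is used as the document key.

```python
import string

text = "The weather is amazing.\n\nI Love mangos\n\n@"
stop_words = {"and", "but", "is", "the", "to"}
table = str.maketrans("", "", string.punctuation)

index = {}
for position, token in enumerate(text.split(), start=1):
    word = token.lower().translate(table)
    # Skip stopwords and tokens that were punctuation only.
    if word and word not in stop_words:
        index.setdefault(word, {}).setdefault("filea.txt", []).append(position)

print(index)
# {'weather': {'filea.txt': [2]}, 'amazing': {'filea.txt': [4]},
#  'i': {'filea.txt': [5]}, 'love': {'filea.txt': [6]},
#  'mangos': {'filea.txt': [7]}}
```

Note that "the" and "is" occupy positions 1 and 3 but are not indexed, which is why the recorded positions start at 2 and skip 3.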
