
Question




In this project we will build an efficient boolean search engine.

Part (a): Inverted Index

You are asked to implement a simple inverted index that will enable us to make our search engine very efficient. Please implement in Python the function:

    create_index(corpus)

The function will accept a corpus (a list of documents/strings) as a parameter and return an inverted index that contains all terms and, for each term, the documents where that term occurs. (Hint: you can implement the inverted index as a Python dictionary.)

Part (b): Boolean search (OR/AND)

You are asked to implement a boolean search function that allows us to search an indexed corpus using OR/AND operators. In our search engine, a space " " will represent OR ("apple ipad" means apple OR ipad). AND will be represented by the "&" symbol with no spaces ("apple&ipad" means apple AND ipad). The AND operator will have priority over the OR operator:

    samsung sony&tv will mean samsung OR (sony AND tv)
    samsung&sony tv will mean (samsung AND sony) OR tv

You can assume that the AND operator "&" will always appear with no spaces before/after it. (You don't need to worry about cases such as "apple & ipad".)

Please implement in Python the function:

    boolean_search(query, index)

The function will accept a query (a string) and an index (the inverted index from part (a)) as parameters and return a list containing the results of searching the index for the query.

Part (c): Experiments

Create an inverted index of the news dataset (you need to parse the JSON file first) using the function you implemented in part (a). The news dataset is available on LMS. Try searching for five different queries and report the number of results returned by each query and the time the search process took.
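Parts (a) and (b) can be sketched as follows. This is one possible approach, not a reference solution. It assumes documents are identified by their position in the corpus list and that terms are lowercased alphanumeric runs; the `tokenize` helper is hypothetical and not part of the assignment. Query terms are lowercased but not otherwise normalized.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Assumed tokenizer: lowercase the text and keep alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def create_index(corpus):
    """Map each term to the set of document ids (list positions) containing it."""
    index = defaultdict(set)
    for doc_id, document in enumerate(corpus):
        for term in tokenize(document):
            index[term].add(doc_id)
    return dict(index)


def boolean_search(query, index):
    """Evaluate a query where a space means OR and '&' means AND.

    Splitting on whitespace first and on '&' second gives AND
    priority over OR, as the assignment requires.
    """
    results = set()
    for group in query.split():  # each group is one OR operand
        terms = [t.lower() for t in group.split("&") if t]
        if not terms:
            continue
        # Intersect the posting sets of all terms joined by '&'.
        postings = [index.get(t, set()) for t in terms]
        results |= set.intersection(*postings)  # union across OR operands
    return sorted(results)


# Small made-up corpus for illustration only (not the news dataset).
corpus = [
    "Samsung TV review",
    "Sony TV and Sony camera",
    "Apple iPad launch",
    "Samsung phone",
]
index = create_index(corpus)
print(boolean_search("samsung sony&tv", index))  # [0, 1, 3]
print(boolean_search("samsung&sony tv", index))  # [0, 1]
print(boolean_search("apple&ipad", index))       # [2]
```

Storing postings as sets keeps AND (intersection) and OR (union) as single built-in operations. Returning document ids rather than the documents themselves is a design choice made here for brevity.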
You can measure the time (in microseconds) it takes a function to execute in Python as follows:

    import time
    start = time.time()
    ## function here
    end = time.time()
    print((end - start) * 10**6)

Deliverables

There are two deliverables for this assignment:

1) You should submit a .py file containing all the code you implemented.
2) You should submit a PDF deck/presentation showing:
   A. The results of the part (c) experiments
   B. Some of the design choices you had to make and why
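The timing pattern from the assignment can be wrapped in a small helper so that each of the five queries is measured the same way. This is a sketch under assumptions: `time_call` is a hypothetical name, and it swaps in `time.perf_counter()` for `time.time()` because the former is a monotonic, higher-resolution clock intended for measuring short intervals. The assignment's own `time.time()` version works the same way.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed time in microseconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    end = time.perf_counter()
    return result, (end - start) * 10**6


# Illustration with a built-in function; in the experiments, func would be
# the search function and args the query and the index.
result, elapsed_us = time_call(sorted, [3, 1, 2])
print(result, f"{elapsed_us:.1f} microseconds")
```

A single run of a fast function is noisy, so repeating each query several times and reporting the average or minimum may give more stable numbers.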
