
Question


In this exercise, you will create a simplified Lucene index. To get partial credit in case of miscalculations, please give detailed solutions.

Given the following documents:

D1: You say "goodbye", I say "hello, hello, hello"

D2: You say stop, I say go.

D3: "Hello, hello, hello," you say "goodbye".

D4: I say yes, you say no

1. (4 points) Build the inverted index for the documents.

a. Dictionary file. For example:

Term     DocFreq
hello    2
I        3
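A minimal Python sketch of building the dictionary file. It assumes a tokenizer that lowercases the text and keeps only runs of letters, so "Hello," and "hello" become one term and punctuation is dropped. The names `DOCS`, `tokenize` and `build_dictionary` are illustrative, not part of Lucene. Because of the lowercasing, the term the example writes as "I" appears here as "i".

```python
import re

# The four documents from the exercise, keyed by document number.
DOCS = {
    1: 'You say "goodbye", I say "hello, hello, hello"',
    2: "You say stop, I say go.",
    3: '"Hello, hello, hello," you say "goodbye".',
    4: "I say yes, you say no",
}


def tokenize(text):
    # Assumed analyzer: lowercase, then keep runs of the letters a-z.
    return re.findall(r"[a-z]+", text.lower())


def build_dictionary(docs):
    """Map each term to its document frequency (number of documents containing it)."""
    doc_freq = {}
    for text in docs.values():
        for term in set(tokenize(text)):  # set(): count each document once per term
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return dict(sorted(doc_freq.items()))


if __name__ == "__main__":
    print("Term     DocFreq")
    for term, df in build_dictionary(DOCS).items():
        print(f"{term:<8} {df}")
```

Under these assumptions the sketch lists nine terms; hello has document frequency 2 and i has 3, matching the example rows.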

b. Posting file (terms are implicit). For example:

Doc #    Frequency
1        3
3        3
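With the same assumed tokenizer (lowercase, letters only), the posting file can be sketched as one list of (doc #, frequency) pairs per term, kept in document order. `build_postings` is a hypothetical helper; it models only the simplified layout of this exercise, not Lucene's on-disk format.

```python
import re
from collections import Counter

# The four documents from the exercise, keyed by document number.
DOCS = {
    1: 'You say "goodbye", I say "hello, hello, hello"',
    2: "You say stop, I say go.",
    3: '"Hello, hello, hello," you say "goodbye".',
    4: "I say yes, you say no",
}


def tokenize(text):
    # Assumed analyzer: lowercase, then keep runs of the letters a-z.
    return re.findall(r"[a-z]+", text.lower())


def build_postings(docs):
    """Map each term to a list of (doc #, term frequency) pairs in document order."""
    postings = {}
    for doc_id in sorted(docs):
        for term, freq in Counter(tokenize(docs[doc_id])).items():
            postings.setdefault(term, []).append((doc_id, freq))
    return postings


if __name__ == "__main__":
    for term, plist in sorted(build_postings(DOCS).items()):
        print(term, plist)
```

For hello this yields [(1, 3), (3, 3)], the two rows of the example.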

c. Position file (terms are implicit from the dictionary file; use the absolute position of each term in the document). For example:

D1       D2    D3       D4
6,7,8    0     1,2,3    0
4        4     0        1
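A sketch of the position file, assuming 1-based absolute positions (which the example implies: hello sits at positions 6, 7 and 8 of D1) and writing 0 where a term does not occur in a document, as the example does for hello in D2 and D4. The tokenizer is the same assumed lowercase, letters-only one; `build_positions` and `position_row` are hypothetical names.

```python
import re

# The four documents from the exercise, keyed by document number.
DOCS = {
    1: 'You say "goodbye", I say "hello, hello, hello"',
    2: "You say stop, I say go.",
    3: '"Hello, hello, hello," you say "goodbye".',
    4: "I say yes, you say no",
}


def tokenize(text):
    # Assumed analyzer: lowercase, then keep runs of the letters a-z.
    return re.findall(r"[a-z]+", text.lower())


def build_positions(docs):
    """Map each term to {doc #: [1-based absolute positions of the term]}."""
    positions = {}
    for doc_id in sorted(docs):
        for pos, term in enumerate(tokenize(docs[doc_id]), start=1):
            positions.setdefault(term, {}).setdefault(doc_id, []).append(pos)
    return positions


def position_row(positions, term, doc_ids):
    """One row of the position file: comma-joined positions per document, '0' if absent."""
    return [",".join(map(str, positions[term].get(d, []))) or "0" for d in doc_ids]


if __name__ == "__main__":
    pos = build_positions(DOCS)
    print("Term     D1     D2     D3     D4")
    for term in sorted(pos):
        cells = position_row(pos, term, sorted(DOCS))
        print(f"{term:<8}", "  ".join(f"{c:<5}" for c in cells))
```

The rows for hello and i come out as 6,7,8 / 0 / 1,2,3 / 0 and 4 / 4 / 0 / 1, the same as the example.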

d. For the query Q: "say goodbye", describe the process of searching the inverted index.
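One possible search procedure, sketched under the same assumptions: tokenize the query with the same rules as the documents, look up each term in the dictionary, intersect the posting lists starting from the term with the smallest document frequency, and, if the query is treated as a phrase, use the position file to check that "goodbye" occurs exactly one position after "say". The exercise does not say whether Q is a Boolean AND or a phrase query, so the sketch supports both; `search` is a hypothetical function and ranking of the matches is left out.

```python
import re

# The four documents from the exercise, keyed by document number.
DOCS = {
    1: 'You say "goodbye", I say "hello, hello, hello"',
    2: "You say stop, I say go.",
    3: '"Hello, hello, hello," you say "goodbye".',
    4: "I say yes, you say no",
}


def tokenize(text):
    # Assumed analyzer: lowercase, then keep runs of the letters a-z.
    return re.findall(r"[a-z]+", text.lower())


def build_positions(docs):
    """Map each term to {doc #: [1-based absolute positions]}; len() of the inner dict is DocFreq."""
    positions = {}
    for doc_id in sorted(docs):
        for pos, term in enumerate(tokenize(docs[doc_id]), start=1):
            positions.setdefault(term, {}).setdefault(doc_id, []).append(pos)
    return positions


def search(positions, query, phrase=False):
    """Return the doc #s containing all query terms (in order and adjacent if phrase=True)."""
    terms = tokenize(query)
    # 1. Dictionary lookup: a missing term means no document can contain all terms.
    if not terms or any(t not in positions for t in terms):
        return []
    # 2. Intersect posting lists, rarest term (smallest doc frequency) first.
    by_df = sorted(set(terms), key=lambda t: len(positions[t]))
    candidates = set(positions[by_df[0]])
    for t in by_df[1:]:
        candidates &= set(positions[t])
    if not phrase:
        return sorted(candidates)
    # 3. Phrase check: the k-th query term must occur at position start + k.
    hits = []
    for d in sorted(candidates):
        if any(all(start + k in positions[t][d] for k, t in enumerate(terms))
               for start in positions[terms[0]][d]):
            hits.append(d)
    return hits


if __name__ == "__main__":
    index = build_positions(DOCS)
    print(search(index, "say goodbye"))               # Boolean AND
    print(search(index, "say goodbye", phrase=True))  # phrase query
```

For Q both modes return D1 and D3: in D1 "say" is at position 2 and "goodbye" at 3, and in D3 they are at 5 and 6.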

2. (2 points)

a. Estimate the total size of the inverted index files in bytes. Numbers and characters are counted as 4 bytes each. A string counts as its number of characters multiplied by 4 bytes; for example, the string "hello" takes 5 * 4 = 20 bytes.
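A sketch of the 2a arithmetic. The file layout is an assumption, since the exercise does not fix it: the dictionary stores each term string (4 bytes per character) plus one document-frequency number, the posting file stores one (doc #, frequency) pair per term-document combination, and the position file stores one number per token. The optional `zero_placeholders` flag also counts the 0 entries that the example position file shows for absent terms. The tokenizer is the same assumed lowercase, letters-only one; `index_size` is a hypothetical name.

```python
import re

BYTES = 4  # every number and every character counts as 4 bytes

# The four documents from the exercise, keyed by document number.
DOCS = {
    1: 'You say "goodbye", I say "hello, hello, hello"',
    2: "You say stop, I say go.",
    3: '"Hello, hello, hello," you say "goodbye".',
    4: "I say yes, you say no",
}


def tokenize(text):
    # Assumed analyzer: lowercase, then keep runs of the letters a-z.
    return re.findall(r"[a-z]+", text.lower())


def index_size(docs, zero_placeholders=False):
    """Return (dictionary, posting, position) file sizes in bytes under the assumed layout."""
    token_lists = [tokenize(text) for text in docs.values()]
    vocab = {t for toks in token_lists for t in toks}
    n_postings = sum(len(set(toks)) for toks in token_lists)  # (term, doc) pairs
    n_positions = sum(len(toks) for toks in token_lists)      # one entry per token
    # Dictionary: the term string plus one doc-frequency number per term.
    dictionary = sum(len(t) * BYTES + BYTES for t in vocab)
    # Posting file: a doc # and a frequency for every (term, doc) pair.
    postings = n_postings * 2 * BYTES
    if zero_placeholders:
        # Also count the "0" written for each term that is absent from a document.
        n_positions += len(vocab) * len(docs) - n_postings
    positions = n_positions * BYTES
    return dictionary, postings, positions


if __name__ == "__main__":
    sizes = index_size(DOCS)
    print(sizes, sum(sizes))
```

With the assumed tokenizer there are 9 terms totalling 30 characters, 19 postings and 26 positions, so the sketch gives 156 + 152 + 104 = 412 bytes, or 480 bytes if the 0 placeholders are counted.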

b. Compare the result of 2a with the total size of the documents in bytes.
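For 2b, a sketch that measures the raw documents with the same 4-bytes-per-character rule. It assumes every character counts, including spaces, quotation marks and other punctuation. The index side uses an assumed sparse layout (term string plus doc frequency per term, a (doc #, frequency) pair per posting, one number per token position, no 0 placeholders) and the same hypothetical lowercase, letters-only tokenizer.

```python
import re

BYTES = 4  # every number and every character counts as 4 bytes

# The four documents from the exercise (D1..D4).
DOCS = [
    'You say "goodbye", I say "hello, hello, hello"',
    "You say stop, I say go.",
    '"Hello, hello, hello," you say "goodbye".',
    "I say yes, you say no",
]


def tokenize(text):
    # Assumed analyzer: lowercase, then keep runs of the letters a-z.
    return re.findall(r"[a-z]+", text.lower())


def docs_size(docs):
    # Assumed: every character of the raw text counts, including spaces and punctuation.
    return sum(len(text) * BYTES for text in docs)


def index_size(docs):
    # Assumed sparse layout: term chars + doc freq per term, 2 numbers per posting,
    # 1 number per token position.
    token_lists = [tokenize(text) for text in docs]
    vocab = {t for toks in token_lists for t in toks}
    n_postings = sum(len(set(toks)) for toks in token_lists)
    n_tokens = sum(len(toks) for toks in token_lists)
    return (sum(len(t) for t in vocab) + len(vocab) + 2 * n_postings + n_tokens) * BYTES


if __name__ == "__main__":
    idx, raw = index_size(DOCS), docs_size(DOCS)
    print(f"index {idx} B, documents {raw} B, ratio {idx / raw:.2f}")
```

Under these assumptions the documents take 131 characters, i.e. 524 bytes, against 412 bytes for the index, a ratio of about 0.79.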
