Question

In Java

Develop expertise in constructing positional indexes and processing phrase and proximity queries.

1. Problem Statement: Given a text corpus, develop a positional index. Process phrase and proximity queries using the positional index.

2. Text Corpus

You will be provided with a large document corpus (about 50,000 documents) in ASCII file format. Each document comes in a separate physical file. You are responsible for normalizing the text.
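Because every document sits in its own file, the first practical step is mapping files to document IDs. The sketch below is one possible loader, assuming all documents live directly in a single directory and Java 17 or later (records, `Stream.toList`). The class and method names (`CorpusLoader`, `load`) are hypothetical. Assigning IDs in sorted file-name order keeps postings naturally sorted by docID later on.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// HYPOTHETICAL helper: loads every regular file in one directory as a document.
final class CorpusLoader {
    record Document(int id, String name, String text) {}

    static List<Document> load(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> s = Files.list(dir)) {
            // Sorting by path gives stable, reproducible docIDs 0, 1, 2, ...
            files = s.filter(Files::isRegularFile).sorted().toList();
        }
        List<Document> docs = new ArrayList<>();
        for (Path f : files) {
            // The corpus is said to be ASCII; ISO-8859-1 maps every byte to a char,
            // so a stray non-ASCII byte will not abort the whole run.
            String text = Files.readString(f, StandardCharsets.ISO_8859_1);
            docs.add(new Document(docs.size(), f.getFileName().toString(), text));
        }
        return docs;
    }
}
```

With about 50,000 files, it may be worth streaming documents one at a time into the indexer rather than holding every raw text in memory; this sketch keeps them all for simplicity.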

3. Solution Steps

The following are high-level solution steps. In each step you may need to make several decisions about low-level implementation details. Think about alternatives, articulate their pros and cons, reason about their algorithmic correctness and efficiency, and make informed decisions. You are strongly encouraged to hold one or two brainstorming sessions with your team members to strategize a solution before you delve into code-level details.

Normalize the text: address punctuation characters, stemming/lemmatization, and lowercasing. Do not throw away stop words.
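A minimal normalizer might look like the following sketch. It lowercases, drops apostrophes, splits on all other non-alphanumeric characters, and keeps stop words as required. The `stem` method is a deliberately crude suffix stripper used only as a stand-in; it is not the Porter stemmer or a lemmatizer, and the names `Normalizer`, `tokenize`, and `stem` are hypothetical. Whatever normalizer you choose, queries must be run through the same one as documents, or query terms will not match dictionary entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class Normalizer {
    // Lowercases, removes apostrophes (so "don't" -> "dont"), splits on every other
    // non-alphanumeric character, and stems each token. Stop words are kept.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        String cleaned = text.toLowerCase(Locale.ROOT).replace("'", "");
        for (String raw : cleaned.split("[^a-z0-9]+")) {
            if (!raw.isEmpty()) {   // split() can yield a leading empty string
                tokens.add(stem(raw));
            }
        }
        return tokens;
    }

    // HYPOTHETICAL crude stemmer: strips a few common English suffixes.
    // A real solution would more likely use the Porter stemmer or a lemmatizer.
    static String stem(String w) {
        if (w.length() > 4 && w.endsWith("ies")) return w.substring(0, w.length() - 3) + "y";
        if (w.length() > 4 && w.endsWith("ing")) return w.substring(0, w.length() - 3);
        if (w.length() > 3 && w.endsWith("ed")) return w.substring(0, w.length() - 2);
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) return w.substring(0, w.length() - 1);
        return w;
    }
}
```

For example, `Normalizer.tokenize("The United States' enraged actions!")` yields `[the, unit, state, enrag, action]`. Splitting on punctuation turns "U.S." into two tokens, which is one of the trade-offs the assignment asks you to reason about.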

Extract tokens and identify vocabulary for the dictionary.
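Once each document is a list of normalized tokens, the dictionary is simply the set of distinct terms. The sketch below (hypothetical name `Vocabulary.build`) assumes tokens are already normalized. It uses a `TreeMap` so terms come out sorted and records each term's document frequency, which is handy for ordering query terms by selectivity.

```java
import java.util.HashSet;
import java.util.List;
import java.util.TreeMap;

final class Vocabulary {
    // Maps each distinct term to the number of documents that contain it.
    static TreeMap<String, Integer> build(List<List<String>> docsTokens) {
        TreeMap<String, Integer> docFreq = new TreeMap<>();
        for (List<String> tokens : docsTokens) {
            // A HashSet ensures each term is counted at most once per document.
            for (String term : new HashSet<>(tokens)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        return docFreq;
    }
}
```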

Scan the corpus and build a positional index.
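A positional index stores, for every term, the documents it occurs in and the sorted positions within each document. One in-memory layout, sketched below under the assumption that the whole index fits in RAM, is term → (docID → positions). The class name `PositionalIndex` is hypothetical, and positions are 0-based token offsets after normalization. A `TreeMap` keeps docIDs ordered, which merge-style intersection relies on.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PositionalIndex {
    // term -> (docID -> sorted list of positions)
    private final Map<String, TreeMap<Integer, List<Integer>>> index = new HashMap<>();

    // Adds one document's normalized tokens. Positions are appended in
    // increasing order, so each positions list stays sorted without extra work.
    void addDocument(int docId, List<String> tokens) {
        for (int pos = 0; pos < tokens.size(); pos++) {
            index.computeIfAbsent(tokens.get(pos), t -> new TreeMap<>())
                 .computeIfAbsent(docId, d -> new ArrayList<>())
                 .add(pos);
        }
    }

    // Returns the postings for a term, or an empty map if the term is unknown.
    TreeMap<Integer, List<Integer>> postings(String term) {
        return index.getOrDefault(term, new TreeMap<>());
    }
}
```

If the corpus does not fit in memory, a block-based approach (build partial indexes, write them to disk, then merge) is the usual alternative; that is a design decision left open here.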

Implement the algorithm for processing phrase/proximity queries (page 39 of the textbook, Figure 2.12). However, if you choose to, you may use a different algorithm.
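Assuming the course textbook is Manning, Raghavan and Schütze's Introduction to Information Retrieval, whose Figure 2.12 gives POSITIONALINTERSECT, the sketch below is a Java rendering of that idea. For each pair of postings it slides a window of positions from the second list that lie within k of the current position in the first list, and reports {docID, pos1, pos2} triples with |pos1 − pos2| ≤ k. One deliberate deviation: common docIDs are found with a `TreeMap` lookup instead of the textbook's linear merge, which gives the same result. The class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PositionalIntersect {
    // p1, p2: docID -> sorted positions. Returns {docID, pos1, pos2} with |pos1 - pos2| <= k.
    static List<int[]> intersect(TreeMap<Integer, List<Integer>> p1,
                                 TreeMap<Integer, List<Integer>> p2, int k) {
        List<int[]> answer = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> e : p1.entrySet()) {
            List<Integer> pp2 = p2.get(e.getKey());
            if (pp2 == null) continue;              // docID not in both lists
            Deque<Integer> window = new ArrayDeque<>();
            int j = 0;                              // never moves backwards
            for (int pos1 : e.getValue()) {
                // Add pp2 positions within k of pos1; stop once pp2 is past pos1 + k.
                // Positions more than k before pos1 are skipped for good.
                while (j < pp2.size()) {
                    int pos2 = pp2.get(j);
                    if (Math.abs(pos1 - pos2) <= k) {
                        window.addLast(pos2);
                    } else if (pos2 > pos1) {
                        break;
                    }
                    j++;
                }
                // Evict window entries that are now more than k behind pos1.
                while (!window.isEmpty() && Math.abs(window.peekFirst() - pos1) > k) {
                    window.removeFirst();
                }
                for (int pos2 : window) {
                    answer.add(new int[] {e.getKey(), pos1, pos2});
                }
            }
        }
        return answer;
    }
}
```

Note that this test is symmetric: it accepts the second term before or after the first. A phrase query needs an ordered check (pos2 − pos1 = 1), so the condition would have to be tightened for phrase matching.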

Proximity queries are limited to only one parameter type: the maximum distance between words. For example, consider the query: united /0 states /3 enraged /2 actions.
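The assignment does not pin down exactly what "/k" counts, so the sketch below makes an explicit assumption: "a /k b" matches when b occurs after a with at most k words in between, that is 1 ≤ pos(b) − pos(a) ≤ k + 1, which makes "/0" mean adjacency and lets a plain phrase be treated as a chain of "/0" links. Evaluation proceeds left to right, keeping per document the positions where a partial match currently ends. This chained evaluation is an alternative to Figure 2.12, not the textbook algorithm. The names `ProximityQuery`, `parse`, and `evaluate` are hypothetical, and query terms are only lowercased here rather than fully normalized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class ProximityQuery {
    final List<String> terms = new ArrayList<>();
    final List<Integer> gaps = new ArrayList<>();   // gaps.get(i) links terms i and i+1

    static ProximityQuery parse(String query) {
        ProximityQuery q = new ProximityQuery();
        for (String p : query.trim().split("\\s+")) {
            if (p.startsWith("/")) {
                q.gaps.add(Integer.parseInt(p.substring(1)));
            } else {
                // Two words with no "/k" between them are treated as a phrase ("/0").
                if (!q.terms.isEmpty() && q.gaps.size() < q.terms.size()) q.gaps.add(0);
                q.terms.add(p.toLowerCase(Locale.ROOT));
            }
        }
        if (q.terms.isEmpty() || q.gaps.size() != q.terms.size() - 1) {
            throw new IllegalArgumentException("malformed query: " + query);
        }
        return q;
    }

    // index: term -> (docID -> sorted positions). Returns the matching docIDs.
    Set<Integer> evaluate(Map<String, TreeMap<Integer, List<Integer>>> index) {
        // docID -> positions where the chain matched so far currently ends
        Map<Integer, List<Integer>> ends = new TreeMap<>(index.getOrDefault(terms.get(0), new TreeMap<>()));
        for (int i = 1; i < terms.size() && !ends.isEmpty(); i++) {
            int k = gaps.get(i - 1);
            TreeMap<Integer, List<Integer>> next = index.getOrDefault(terms.get(i), new TreeMap<>());
            Map<Integer, List<Integer>> newEnds = new TreeMap<>();
            for (Map.Entry<Integer, List<Integer>> e : ends.entrySet()) {
                List<Integer> cand = next.get(e.getKey());
                if (cand == null) continue;
                List<Integer> kept = new ArrayList<>();
                for (int p2 : cand) {
                    for (int p1 : e.getValue()) {
                        int d = p2 - p1;
                        if (d >= 1 && d <= k + 1) { kept.add(p2); break; }
                    }
                }
                if (!kept.isEmpty()) newEnds.put(e.getKey(), kept);
            }
            ends = newEnds;
        }
        return new TreeSet<>(ends.keySet());
    }
}
```

The inner double loop is quadratic in the positions of one document; a two-pointer merge over the two sorted lists would make it linear, in the spirit of Figure 2.12.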

Proximity queries in this assignment are not required to handle parameters such as "within the same sentence" or "within the same paragraph."

Develop a simple interface for users to specify phrase/proximity queries. The interface can be as simple as prompting the user for a phrase/proximity query (i.e., a text string). You may also read a phrase/proximity query through command-line arguments.
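Both input styles can share one entry point: use the command-line arguments if any are present, otherwise prompt on standard input. The sketch below assumes that convention; `QueryPrompt` and `readQuery` are hypothetical names, and the call that would actually evaluate the query is left as a marked placeholder.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

final class QueryPrompt {
    // Returns the query from args if present, otherwise one line read from `in`
    // (trimmed), or null when the input is exhausted.
    static String readQuery(String[] args, BufferedReader in) throws IOException {
        if (args.length > 0) {
            return String.join(" ", args);   // e.g. united /0 states -> "united /0 states"
        }
        System.out.print("query> ");
        System.out.flush();
        String line = in.readLine();
        return line == null ? null : line.trim();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String query = readQuery(args, in);
        if (query == null || query.isEmpty()) {
            System.out.println("No query given.");
            return;
        }
        // HYPOTHETICAL: parse `query` and evaluate it against the positional index here.
        System.out.println("Query: " + query);
    }
}
```

Taking a `BufferedReader` parameter instead of reading `System.in` directly keeps `readQuery` testable with a `StringReader`.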

Design test cases and execute them. Document execution results.
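One way to design test cases, offered as a suggestion rather than something the assignment requires, is to compare index-based answers against a brute-force scan of the raw token lists on small documents. The oracle sketched below (hypothetical name `BruteForceOracle`) assumes that "a /k b" means b appears 1 to k + 1 positions after a, so "/0" is adjacency; adjust it if your team settles on a different reading. Any query where the index and the oracle disagree is a bug in one of them.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class BruteForceOracle {
    // docs.get(d) is the normalized token list of document d; gaps.get(i) links
    // terms i and i+1. Returns the indexes of documents that contain a full match.
    static Set<Integer> match(List<List<String>> docs, List<String> terms, List<Integer> gaps) {
        Set<Integer> hits = new TreeSet<>();
        for (int d = 0; d < docs.size(); d++) {
            List<String> tokens = docs.get(d);
            for (int start = 0; start < tokens.size(); start++) {
                if (tokens.get(start).equals(terms.get(0)) && extend(tokens, terms, gaps, 1, start)) {
                    hits.add(d);
                    break;
                }
            }
        }
        return hits;
    }

    // Tries every placement of terms[i] within the allowed window after `prev`.
    private static boolean extend(List<String> tokens, List<String> terms,
                                  List<Integer> gaps, int i, int prev) {
        if (i == terms.size()) return true;
        int maxPos = Math.min(tokens.size() - 1, prev + gaps.get(i - 1) + 1);
        for (int p = prev + 1; p <= maxPos; p++) {
            if (tokens.get(p).equals(terms.get(i)) && extend(tokens, terms, gaps, i + 1, p)) {
                return true;
            }
        }
        return false;
    }
}
```

Edge cases worth including in such tests: a term at the very start or end of a document, a repeated term (for example "to be or not to be"), reversed word order, distances exactly at and one past the limit, and terms missing from the dictionary.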
