
Question

e message, and then the following menu will appear that will enable the user to benefit from the system.
When the program starts, it will start a high-precision clock that measures the index creation time, that is, the time spent building the index data structure. The index creation duration will be printed within the title bar of the menu.
Simple Document Retrieval System
(30 Minutes)
1. Enter a single keyword to list the document(s) (file names)
2. Print the top 10 words that appeared most frequently
3. Print the top 10 words that appeared least frequently
4. Exit
Option:
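The timing and menu behaviour described above can be sketched as follows. This is a minimal sketch under several assumptions, not the assignment's reference solution: it uses Python dictionaries instead of linked lists to keep the example short, it inlines the six sample documents instead of reading files, it treats "frequency" as the total number of occurrences across all documents, and it assumes the "(30 Minutes)" line of the menu is where the measured duration is shown. The names `build_index`, `top_words`, and `run_menu` are hypothetical.

```python
import re
import time
from collections import Counter

# Sample documents inlined so the sketch runs without files on disk.
DOCS = {
    "1.txt": "Pease porridge hot, pease porridge cold.",
    "2.txt": "Pease porridge in the pot.",
    "3.txt": "Nine days old.",
    "4.txt": "Some like it hot, some like it cold.",
    "5.txt": "Some like it in the pot.",
    "6.txt": "Nine days old.",
}

def build_index(docs):
    """Return (word -> list of file names, word -> total occurrences)."""
    files, counts = {}, Counter()
    for name, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            counts[word] += 1
            names = files.setdefault(word, [])
            if not names or names[-1] != name:  # record each file once per word
                names.append(name)
    return files, counts

def top_words(counts, n=10, least=False):
    """The n most (or least) frequent words; ties are broken alphabetically."""
    sign = 1 if least else -1
    return sorted(counts.items(), key=lambda kv: (sign * kv[1], kv[0]))[:n]

def run_menu(files, counts, elapsed, read=input, write=print):
    """Show the menu until the user picks option 4."""
    while True:
        write("Simple Document Retrieval System")
        write(f"(index created in {elapsed:.6f} seconds)")
        write("1. Enter a single keyword to list the document(s) (file names)")
        write("2. Print the top 10 words that appeared most frequently")
        write("3. Print the top 10 words that appeared least frequently")
        write("4. Exit")
        option = read("Option: ").strip()
        if option == "1":
            word = read("Keyword: ").strip().lower()
            write(", ".join(files.get(word, [])) or "No documents found.")
        elif option == "2":
            for word, count in top_words(counts):
                write(f"{word}: {count}")
        elif option == "3":
            for word, count in top_words(counts, least=True):
                write(f"{word}: {count}")
        elif option == "4":
            break
        else:
            write("Invalid option.")

# Measure index creation with a high-resolution monotonic clock.
start = time.perf_counter()
FILES, COUNTS = build_index(DOCS)
ELAPSED = time.perf_counter() - start

# Scripted, non-interactive run: look up "hot", then exit.
answers = iter(["1", "hot", "4"])
run_menu(FILES, COUNTS, ELAPSED, read=lambda prompt: next(answers))
```

Passing `read` and `write` as parameters is only a convenience for running the menu without a keyboard; calling `run_menu(FILES, COUNTS, ELAPSED)` uses `input` and `print` directly.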
Design Guidelines
While reading text files (or documents):
Lowercase all words.
Extract all words, where a word is a string of alpha characters terminated by a non-alpha character (white space
is not alpha). The alpha characters are defined to be [a-z]. For example, the character sequence
apple+78&^+orange yields the words apple and orange.
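The word-extraction rule above can be illustrated with a short sketch. It assumes a regular expression is acceptable for the task and that a word at the very end of the text counts even without a terminating character; the function name `tokenize` is hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text, then return every maximal run of the letters a-z."""
    return re.findall(r"[a-z]+", text.lower())

print(tokenize("apple+78&^+orange"))  # ['apple', 'orange']
print(tokenize("Pease porridge hot, pease porridge cold."))
```

Digits, punctuation, and white space all act as separators, so no separate splitting step is needed.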
Limitations and Assumptions
1. The collection of documents is closed (content and number of documents are fixed and will never change).
2. Each document is stored in a single text file. Hence, if there are 10,000 documents, there are 10,000 text files.
(The collection of documents is provided online.)
Figure 1 and Figure 2 summarize the aim of this project.
File Name (or Document)   Content of File
1.txt                     Pease porridge hot, pease porridge cold.
2.txt                     Pease porridge in the pot.
3.txt                     Nine days old.
4.txt                     Some like it hot, some like it cold.
5.txt                     Some like it in the pot.
6.txt                     Nine days old.
Figure 1: Small set of files and their content.
Figure 2: Text files (documents) are indexed by their word contents.
Because the number of distinct words across all documents cannot easily be estimated in advance, one possible
solution is to maintain the words in a linked list. Each word also needs its own list of files; since the number of
documents (or files) containing a word cannot be estimated either, a linked list is suggested for that as well.
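The two-level linked list suggested above can be sketched as follows. This is one possible layout under stated assumptions, not the required design: the class names `FileNode`, `WordNode`, and `Index` are hypothetical, words are kept in alphabetical order, each word node stores a total occurrence count (an addition meant to support the frequency options of the menu), and documents are assumed to be indexed one at a time so that repeated words from the same file arrive consecutively.

```python
import re

class FileNode:
    """One entry in a word's list of files."""
    def __init__(self, name):
        self.name = name
        self.next = None

class WordNode:
    """One entry in the list of words, owning its own list of files."""
    def __init__(self, word):
        self.word = word
        self.count = 0          # total occurrences (assumption, for options 2 and 3)
        self.files_head = None
        self.files_tail = None
        self.next = None

class Index:
    def __init__(self):
        self.head = None        # first WordNode, in alphabetical order

    def add(self, word, file_name):
        # Walk to the first node whose word is >= the new word.
        prev, node = None, self.head
        while node is not None and node.word < word:
            prev, node = node, node.next
        if node is None or node.word != word:
            new = WordNode(word)
            new.next = node
            if prev is None:
                self.head = new
            else:
                prev.next = new
            node = new
        node.count += 1
        # Append the file unless it is already the last one recorded.
        if node.files_tail is None:
            node.files_head = node.files_tail = FileNode(file_name)
        elif node.files_tail.name != file_name:
            node.files_tail.next = FileNode(file_name)
            node.files_tail = node.files_tail.next

    def lookup(self, word):
        """Return the file names containing `word` (empty list if none)."""
        node = self.head
        while node is not None and node.word < word:
            node = node.next
        names = []
        if node is not None and node.word == word:
            f = node.files_head
            while f is not None:
                names.append(f.name)
                f = f.next
        return names

# Build the index from the six sample documents.
docs = {
    "1.txt": "Pease porridge hot, pease porridge cold.",
    "2.txt": "Pease porridge in the pot.",
    "3.txt": "Nine days old.",
    "4.txt": "Some like it hot, some like it cold.",
    "5.txt": "Some like it in the pot.",
    "6.txt": "Nine days old.",
}
index = Index()
for name, text in docs.items():
    for word in re.findall(r"[a-z]+", text.lower()):
        index.add(word, name)

print(index.lookup("cold"))   # ['1.txt', '4.txt']
```

Keeping the word list sorted lets a lookup stop early, but every insertion still walks the list, so index creation time grows quickly on a large collection; the timing requirement in the assignment would make that cost visible.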
Term       File Name
cold       1,4
days       3,6
hot        1,4
in         2,5
it         4,5
like       4,5
nine       3,6
old        3,6
pease      1,2
porridge   1,2
pot        2,5
some       4,5
the        2,5
[Figure 2 diagram: a linked list starting at Head holds the word nodes (shown: cold, days, hot, like, porridge); each word node points to its own linked list of file numbers: cold -> 1 -> 4, days -> 3 -> 6, hot -> 1 -> 4, like -> 4 -> 5, porridge -> 1 (remainder cut off).]
