Question

1- Assume that the user provides a list of words, one per line, in a text file (for example "words.txt" or "mots.txt"). Your application must answer the following questions when it is run on the command line like this: "java MonLucene mots.txt", where the file "mots.txt" is in the current directory.

- The first concept to find is the one grouping the most documents and having at least one word in the list provided by your user. If there is more than one, find them all. (Give only the words defining the concept. Do not include the list of documents.)

- The second concept to find is the one that uses two words from the list and covers the largest number of documents. Again, if there is more than one, you have to find them all. These concepts yield pairs of words that are related because they frequently appear together in the same document. (Give only the sets of words defining a concept. Do not include the list of documents.)

- Finally, the last concept to find is the one using three words appearing in the largest number of documents. If there is more than one, find them all. These concepts result in triplets of words that are related because they frequently appear together in the same document. (Give only the sets of words defining a concept. Do not include the list of documents.)

Hint. For the first problem, run a search with each of the words. Then find the most popular word (or words), i.e. the word appearing in the most documents. Then repeat with the word pairs and, finally, with the word triplets.
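The procedure in the hint can be sketched as follows. This is only an illustration under stated assumptions: it replaces the Lucene index with a plain in-memory map from each word to the set of documents containing it, it performs no truncation of words, and the class name ConceptSketch, its method names, and the four sample documents are all hypothetical.

```java
import java.util.*;

// Hypothetical sketch: an in-memory stand-in for the Lucene index.
class ConceptSketch {

    // Maps each lowercased token to the ids of the documents that contain it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String token : docs.get(id).toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    index.computeIfAbsent(token, key -> new HashSet<>()).add(id);
                }
            }
        }
        return index;
    }

    // Returns every k-word combination tied for the largest number of shared documents.
    static List<List<String>> best(Map<String, Set<Integer>> index, List<String> words, int k) {
        List<List<String>> winners = new ArrayList<>();
        int[] max = {0};
        search(index, words, k, 0, new ArrayList<>(), null, winners, max);
        return winners;
    }

    // Enumerates combinations recursively, intersecting document sets along the way.
    private static void search(Map<String, Set<Integer>> index, List<String> words, int k,
                               int start, List<String> chosen, Set<Integer> docs,
                               List<List<String>> winners, int[] max) {
        if (chosen.size() == k) {
            int count = docs.size();
            if (count > max[0]) {          // strictly better: discard earlier winners
                max[0] = count;
                winners.clear();
            }
            if (count > 0 && count == max[0]) {
                winners.add(new ArrayList<>(chosen));
            }
            return;
        }
        for (int i = start; i < words.size(); i++) {
            Set<Integer> next = new HashSet<>(index.getOrDefault(words.get(i), Collections.emptySet()));
            if (docs != null) {
                next.retainAll(docs);      // keep only documents containing all chosen words
            }
            chosen.add(words.get(i));
            search(index, words, k, i + 1, chosen, next, winners, max);
            chosen.remove(chosen.size() - 1);
        }
    }

    public static void main(String[] args) {
        // Made-up sample documents, for illustration only.
        List<String> docs = Arrays.asList(
                "the man and the woman", "a man at home",
                "the woman has a dog", "man woman home");
        List<String> words = Arrays.asList("man", "woman", "home", "dog");
        Map<String, Set<Integer>> index = buildIndex(docs);
        for (int k = 1; k <= 3; k++) {
            System.out.println(k + " word(s): " + best(index, words, k));
        }
    }
}
```

In an actual Lucene program, each set intersection would presumably be replaced by a query against the index; only the enumeration and tie handling carry over from this sketch.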

You must submit not only the code, but the solution corresponding to the following list of words:

man

woman

house

home

school

dog

cat

land

country

white

children

example

paper

music

letter

river

book

town

room

friend

Tip. If the word "man" is present in 500 documents, then the words "man" and "woman" can appear together in at most 500 documents. Make sure your stats make sense!
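The tip describes a consistency property: a combination can never match more documents than any one of its words does. A minimal check might look like the sketch below; the class StatsCheck, the method isConsistent, and every count other than the 500 for "man" are hypothetical.

```java
import java.util.*;

// Hypothetical helper: verifies that a combination's document count is plausible.
class StatsCheck {

    // True when comboCount does not exceed the single-word count of any member word.
    static boolean isConsistent(Map<String, Integer> singleCounts, List<String> combo, int comboCount) {
        for (String word : combo) {
            if (comboCount > singleCounts.getOrDefault(word, 0)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("man", 500);    // figure taken from the tip
        counts.put("woman", 320);  // made-up figure for illustration
        List<String> pair = Arrays.asList("man", "woman");
        System.out.println(isConsistent(counts, pair, 300));
        System.out.println(isConsistent(counts, pair, 600));
    }
}
```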

- What is the complexity of the implementation (from the previous question) as a function of the number of words specified: is it O(n), O(n^2), or O(n^3)? What would happen if the user entered 1000 words instead of 20?

- We asked you to use word truncation (stemming). What would happen if you repeated the experiment without truncating the words? Explain your reasoning.
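To get a feel for the growth, one can count how many searches a brute-force approach would issue, assuming one search per single word, per unordered pair, and per unordered triplet, as the hint suggests. The class QueryCount and the method queries are hypothetical names used only for this illustration.

```java
// Hypothetical helper: counts the searches issued by a brute-force enumeration.
class QueryCount {

    // n singles + C(n,2) pairs + C(n,3) triplets.
    static long queries(long n) {
        long singles = n;
        long pairs = n * (n - 1) / 2;
        long triples = n * (n - 1) * (n - 2) / 6;
        return singles + pairs + triples;
    }

    public static void main(String[] args) {
        System.out.println(queries(20));    // 20 + 190 + 1140 = 1350
        System.out.println(queries(1000));  // 1000 + 499500 + 166167000 = 166667500
    }
}
```

Under that assumption the triplet term dominates and grows with the cube of n, so moving from 20 to 1000 words multiplies the number of searches by more than 100,000.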
