Question
Given the set of documents on the Gutenberg CD provided to you:
Answer the following questions:
Question 1
1.1 First index only 10 documents, then 20, then 30, then all the documents with the txt extension on the CD-ROM (in the etexxx directories). Measure the time required to execute each step, then measure the size (on disk) of each index as well as the sum of the sizes of the files it covers. Also, for each index, run 10 keyword searches 100 times in a loop, then divide the total elapsed time by 1000 to obtain the average search time. Make a five-column table with the following headings: number of files, total file size, indexing time, index size, and average search time.

1.2 From the table of the previous exercise, can you tell whether the performance of Lucene decreases with the size of the index? Suppose your index is 1000 times larger: approximately what would the average search time be?

1.3 From your table, estimate the size of the index and the time needed to build it if you have to index 8 billion documents averaging 20 KB each. Is this feasible? What would you do if you were given the mandate to do it?
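As a rough sizing aid for question 1.3, the raw collection size follows directly from the figures in the question (8 billion documents at 20 KB each). The sketch below computes it and then scales one table row linearly to that size. The linear scaling, the choice of 1 KB = 1000 bytes, the helper name `extrapolate`, and the sample measurements are all assumptions for illustration, not results; substitute the values from your own table.

```python
# Sizing sketch for question 1.3. Linear extrapolation is an assumption:
# real index size and indexing time need not scale exactly linearly.

def extrapolate(measured_bytes, measured_index_bytes, measured_seconds, target_bytes):
    """Scale one measured table row linearly to a target collection size."""
    factor = target_bytes / measured_bytes
    return measured_index_bytes * factor, measured_seconds * factor

NUM_DOCS = 8_000_000_000       # from the question
AVG_DOC_BYTES = 20 * 1000      # 20 KB, taking 1 KB = 1000 bytes (assumption)
total_bytes = NUM_DOCS * AVG_DOC_BYTES

# Hypothetical measurements, NOT real results:
# 10 MB of text produced a 3 MB index in 5 seconds.
index_bytes, seconds = extrapolate(10e6, 3e6, 5.0, total_bytes)

print(f"collection size: {total_bytes / 1e12:.0f} TB")
print(f"index size (linear guess): {index_bytes / 1e12:.0f} TB")
print(f"indexing time (linear guess): {seconds / 86400 / 365:.1f} years")
```

Whatever the measured row turns out to be, the collection alone is 160 TB, which is the figure to weigh when judging feasibility on a single machine.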
mohwk10.txt mpolo10.txt oroos10.txt poe1v10.txt poe2v10.txt poe3v11.txt poe4v10.txt poe5v10.txt rbddh10.txt remus10.txt rlchn10.txt sffrg10.txt shkdd10.txt sign410.txt truss10.txt utopi10.txt utrkj10.txt
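The measurement procedure in question 1.1 can be sketched as a small timing harness. The version below deliberately swaps Lucene for a toy in-memory inverted index so that it runs without any library; the function names, the two sample documents, and the keyword list are hypothetical. With a real Lucene index you would time the indexing and search calls in the same way, and obtain the index size by summing the sizes of the files in the index directory.

```python
import re
import time

def build_index(docs):
    """Map each lowercase word to the set of document names containing it."""
    index = {}
    for name, text in docs.items():
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            index.setdefault(word, set()).add(name)
    return index

def search(index, keyword):
    """Return the sorted names of the documents containing the keyword."""
    return sorted(index.get(keyword.lower(), ()))

def average_search_time(index, keywords, repeats=100):
    """Run every keyword search `repeats` times; return mean seconds per search."""
    start = time.perf_counter()
    for _ in range(repeats):
        for kw in keywords:
            search(index, kw)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(keywords))

# Tiny in-memory stand-ins for the .txt files (hypothetical content).
docs = {
    "a.txt": "The raven sat upon the bust of Pallas",
    "b.txt": "Utopia is an island; the raven is a bird",
}

# Indexing time: wall-clock time around the index build.
start = time.perf_counter()
index = build_index(docs)
indexing_time = time.perf_counter() - start

# Total file size: here the UTF-8 byte length of each stand-in document.
total_file_size = sum(len(text.encode("utf-8")) for text in docs.values())

# 10 keywords x 100 repetitions = 1000 searches, as in the question.
keywords = ["raven", "utopia", "island", "bird", "bust",
            "pallas", "the", "sat", "upon", "missing"]
avg_search_time = average_search_time(index, keywords)

print(len(docs), total_file_size, indexing_time, avg_search_time)
```

Running the same harness on 10, 20, 30, and then all files yields one table row per run. `time.perf_counter` is used because it is a monotonic, high-resolution clock suited to short intervals.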