Question

4 Sort Words (20%)
Implement a program in Java that receives as arguments an input directory and an output directory and that sorts the words read from each file by frequency in descending order and writes the sorted words and their frequencies in a corresponding file in the output directory.
The output files must follow the same folder structure as the input files. For example, if the program sorts the words found in the input file stored at CountedDataset1/folder6/document265.txt, it must store the sorted words in the file at SortedDataset1/folder6/document265.txt, where CountedDataset1 was the input directory and SortedDataset1 was the output directory. This program will use the output of the previous program as input.
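The folder-mirroring rule above can be sketched with java.nio.file.Path. This is a minimal illustration, not a required design: the class name OutputPathSketch and the helper mirror are hypothetical, and the directory names are the ones from the example in the assignment.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class OutputPathSketch {
    // Maps an input file to the file at the same relative location under outputDir.
    // Assumes inputFile really lies inside inputDir.
    static Path mirror(Path inputDir, Path outputDir, Path inputFile) {
        Path relative = inputDir.relativize(inputFile); // the part after the input directory
        return outputDir.resolve(relative);
    }

    public static void main(String[] args) {
        Path in = Paths.get("CountedDataset1");
        Path out = Paths.get("SortedDataset1");
        Path file = in.resolve("folder6").resolve("document265.txt");
        // Prints the mirrored path using the platform's separator.
        System.out.println(mirror(in, out, file));
    }
}
```

Before writing, a real program would also need to create the parent folders of the mirrored path, for example with Files.createDirectories.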
When the program has finished sorting the words from an input file, it must write each word and its number of occurrences on one line of the corresponding output file, separated by a space, as in the previous program.
For example, for the following input file:
filed 1
in 2
a 2
different 1
way 1
The 3
year 1
of 4
release 1
date 1
is 4
longer 1
part 1
the 6
directory 1
path 3
based 1
number 1
which 1
identical 1
to 3
filename 3
The program needs to create the corresponding output file that contains:
the 6
of 4
is 4
The 3
path 3
to 3
filename 3
in 2
a 2
filed 1
different 1
way 1
year 1
release 1
date 1
longer 1
part 1
directory 1
based 1
number 1
which 1
identical 1
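One possible way to produce the ordering shown above is a stable sort on the count, so that words with equal counts stay in their input order. The sketch below is only an illustration under that assumption; the class name SortWordsSketch and its helpers are hypothetical, and the assignment leaves the choice of data structure and algorithm open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortWordsSketch {
    // Extracts the count from a line of the form "word count".
    static int count(String line) {
        return Integer.parseInt(line.substring(line.lastIndexOf(' ') + 1));
    }

    // Returns the non-empty lines ordered by count, highest first.
    static List<String> sortByFrequency(List<String> lines) {
        List<String> sorted = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                sorted.add(trimmed);
            }
        }
        // List.sort is stable, so ties keep the order they had in the input.
        sorted.sort((a, b) -> Integer.compare(count(b), count(a)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("filed 1", "in 2", "a 2", "The 3", "of 4", "the 6", "is 4");
        for (String line : sortByFrequency(input)) {
            System.out.println(line);
        }
    }
}
```

In a full program the input lines would come from something like Files.readAllLines on each input file, and the sorted lines would be written to the mirrored output path.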
Evaluate your program on the 5 datasets and measure (inside the program) the number of words read from the input and the amount of (wall) time it took to sort the words in all files. Plot a diagram showing how the total number of words from the datasets influences the throughput of your program, measured in words/second (total number of words in the datasets divided by total amount of time to sort the dataset). Answer the following questions:
What data structure(s) did you use to implement the program and why?
What algorithm did you use to sort the data and why?
Is your program compute-intensive, memory-intensive or IO-intensive and why?
Why would the total number of words in a dataset influence the performance of your program on the virtual machine?
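The in-program measurement asked for above (words read, wall time, words/second) might be sketched as follows. The class name ThroughputSketch and the helper wordsPerSecond are hypothetical, and the loop is only a placeholder for the real read, sort and write work.

```java
class ThroughputSketch {
    // Throughput in words/second: total words read divided by elapsed wall time.
    static double wordsPerSecond(long totalWords, long elapsedNanos) {
        return totalWords / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        long totalWords = 0;
        long start = System.nanoTime(); // monotonic clock, suited to measuring elapsed time

        // Placeholder for the real work: read each file, sort it, write the result,
        // and add the number of words read from that file to totalWords.
        for (int i = 0; i < 1000; i++) {
            totalWords++;
        }

        long elapsed = System.nanoTime() - start;
        System.out.printf("words=%d seconds=%.6f throughput=%.1f words/s%n",
                totalWords, elapsed / 1_000_000_000.0, wordsPerSecond(totalWords, elapsed));
    }
}
```

Running this once per dataset gives one (total words, words/second) point per dataset for the requested plot.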
