Question
Implement a program in Java that receives as arguments an input directory and an output directory and that counts the number of unique words found in each file and writes them to the output directory.
The output files must follow the same folder structure as the input files. For example, if the program counts the words found in the input file stored at CleanedDataset/folder/document.txt, it must store the counted words in the file at CountedDataset/folder/document.txt, where CleanedDataset is the input directory and CountedDataset is the output directory.
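The folder-mirroring rule above can be sketched with java.nio.file. This is a minimal illustration, not a prescribed solution: the class name PathMirror and its method names are hypothetical, and it assumes every regular file under the input directory should be processed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: the class and method names are illustrative only.
final class PathMirror {

    // Maps an input file to the output path that has the same relative location,
    // e.g. CleanedDataset/folder/document.txt -> CountedDataset/folder/document.txt.
    static Path mirror(Path inputRoot, Path outputRoot, Path inputFile) {
        Path relative = inputRoot.relativize(inputFile);
        return outputRoot.resolve(relative);
    }

    // Recursively lists every regular file under the input directory, in sorted order.
    static List<Path> listFiles(Path inputRoot) throws IOException {
        try (Stream<Path> paths = Files.walk(inputRoot)) {
            return paths.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    // Computes the mirrored output path and creates its parent folders if needed.
    static Path prepareOutput(Path inputRoot, Path outputRoot, Path inputFile)
            throws IOException {
        Path out = mirror(inputRoot, outputRoot, inputFile);
        Path parent = out.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        return out;
    }
}
```

A driver would call listFiles on the input directory, then prepareOutput for each file before writing its counts.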
In this program, words are sequences of alphanumerical characters (a-z, A-Z) separated by a delimiter (whitespace such as \t, \n, or \r). This program will use the output of the previous program as input.
When the program finishes counting the words from an input file, it needs to write them to the corresponding output file, one word per line, each followed by its number of occurrences, separated by a space.
For example, for the following input file:
EBooks posted since November with etext numbers OVER are
filed in a different way The year of a release date is no longer part
of the directory path The path is based on the etext number which is
identical to the filename The path to the file is made up of single
digits corresponding to all but the last digit in the filename For
example an eBook of filename would be found at
The program needs to create the corresponding output file, which contains the following (this example includes only a subset of the output file):
filed
in
a
different
way
The
year
of
release
date
is
longer
part
the
directory
path
based
number
which
identical
to
filename
CSC Distributed Systems I Winter
Jarvis College of Computing and Digital Media
DePaul University
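The counting step and the "word, space, count" output format described above could look roughly like the sketch below. It makes some assumptions: the class name WordCounter is hypothetical, words are split on runs of whitespace, counting is case-sensitive (the sample lists "The" and "the" separately), and a LinkedHashMap is used so that words come out in first-appearance order, which is how the sample output appears to be ordered.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: names and the whitespace-splitting rule are assumptions.
final class WordCounter {

    // Counts how often each word occurs. LinkedHashMap keeps first-seen order.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.split("\\s+")) {
            // split can yield an empty first token when the text starts with whitespace
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Formats each entry as "word count", one entry per output line.
    static List<String> toLines(Map<String, Integer> counts) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            lines.add(entry.getKey() + " " + entry.getValue());
        }
        return lines;
    }
}
```

A hash map gives constant expected time per word, so a file is counted in a single pass. The resulting lines can be written with Files.write to the mirrored output path.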
Evaluate your program on the datasets and measure, inside the program, the amount of data read from the input and the wall time it took to count the words of all files. Make sure to clear the OS file system cache before you run an evaluation. Plot a diagram showing how the size of the datasets, measured in MiB, influences the throughput of your program, measured in MiB/second (dataset size divided by the total time taken to count the words of the dataset).
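The throughput figure defined above is simple arithmetic once the byte count and elapsed time are known. The sketch below shows one possible way to compute it. The class name Throughput and its methods are hypothetical, and it assumes 1 MiB = 1024 * 1024 bytes and that wall time is taken with System.nanoTime.

```java
// Hypothetical helper: names are illustrative, not part of the assignment.
final class Throughput {

    static final double BYTES_PER_MIB = 1024.0 * 1024.0;
    static final double NANOS_PER_SECOND = 1_000_000_000.0;

    // Runs the given work and returns the elapsed wall time in nanoseconds.
    static long timeNanos(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return System.nanoTime() - start;
    }

    // Throughput in MiB/second: data read (in MiB) divided by elapsed seconds.
    static double mibPerSecond(long bytesRead, long elapsedNanos) {
        double mib = bytesRead / BYTES_PER_MIB;
        double seconds = elapsedNanos / NANOS_PER_SECOND;
        return mib / seconds;
    }
}
```

For instance, reading 10 MiB in 2 seconds gives 5 MiB/second. The byte total can be accumulated while reading, for example by summing the length of each file's contents as it is loaded.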
Answer the following questions:
What data structures did you use to implement the program and why?
What is the difference between compute-intensive, memory-intensive, and I/O-intensive applications?
Is your program compute-intensive, memory-intensive, or I/O-intensive, and why?
Why would the dataset size influence the performance of your program on the virtual machine?