Question

Implement a program in C++ or in Java that receives as arguments an input directory and an output directory and that cleans the files from the input directory and writes the cleaned files to the output directory.
The cleaned files must follow the same folder structure as the input files. For example, if the program cleans the file stored at Dataset1/folder6/document265.txt, it must store the cleaned file in CleanedDataset1/folder6/document265.txt, where Dataset1 was the input directory and CleanedDataset1 was the output directory.
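The folder-structure requirement amounts to taking a file's path relative to the input directory and resolving it against the output directory. A minimal Java sketch, assuming `java.nio.file` is acceptable for the assignment; the names `OutputPathMapper` and `mapToOutput` are illustrative and not part of the assignment:

```java
import java.nio.file.Path;

final class OutputPathMapper {
    // Maps a file under inputRoot to the mirrored location under outputRoot.
    // Assumption: inputFile really is located under inputRoot.
    static Path mapToOutput(Path inputRoot, Path outputRoot, Path inputFile) {
        // e.g. Dataset1/folder6/document265.txt -> folder6/document265.txt
        Path relative = inputRoot.relativize(inputFile);
        // e.g. CleanedDataset1 + folder6/document265.txt
        return outputRoot.resolve(relative);
    }
}
```

With the paths from the example above, `mapToOutput` called with `Dataset1`, `CleanedDataset1` and `Dataset1/folder6/document265.txt` yields the path `CleanedDataset1/folder6/document265.txt`.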
The input files are TXT files that contain words separated by separators and by delimiters. In this program, words are defined as any sequence of alphanumerical characters (0-9a-zA-Z). Delimiters are defined as the space, tab and new line characters (' ', \t, \n, \r\n, \r) and any other character is considered a separator.
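The three character classes defined above can be expressed as a small classifier. This is only a sketch: the names `CharClass` and `CharClassifier` are made up for illustration, and it assumes that only the ASCII ranges 0-9, a-z and A-Z count as word characters, so accented letters fall into the separator class.

```java
enum CharClass { WORD, DELIMITER, SEPARATOR }

final class CharClassifier {
    // Classifies one character according to the definitions in the assignment text.
    static CharClass classify(char c) {
        boolean digit = c >= '0' && c <= '9';
        boolean lower = c >= 'a' && c <= 'z';
        boolean upper = c >= 'A' && c <= 'Z';
        if (digit || lower || upper) {
            return CharClass.WORD;
        }
        // Space, tab and the new line characters are delimiters.
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            return CharClass.DELIMITER;
        }
        // Anything else (punctuation, symbols, non-ASCII characters) is a separator.
        return CharClass.SEPARATOR;
    }
}
```

The explicit range checks are used instead of `Character.isLetterOrDigit`, because that method also accepts non-ASCII letters and digits, which the definition (0-9a-zA-Z) does not include.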
CSC435 Distributed Systems I - Winter 2024
Jarvis College of Computing and Digital Media, DePaul University
The cleaning process that your program needs to implement has to abide by the following rules:
any \r character has to be eliminated;
any repeating sequence of delimiters must be replaced with the last delimiter in the sequence. For example, if your program encounters \r\n\r\n, it must replace it with \n, because \n was the last character in the delimiter sequence;
any separator must be eliminated. For example, if your program encounters document-01.txt, it must replace it with document01txt, because - and . are separator characters (they are not word characters or delimiter characters);
When the program has finished cleaning a file, the output file should contain only words composed of alphanumerical characters, separated by exactly one delimiter, with no separator characters and no repeating delimiters.
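One possible way to apply the three rules in a single pass is to remember the most recent delimiter and emit it only when the next word character arrives. The sketch below is not an official solution; `TextCleaner` and `clean` are illustrative names, and it makes three assumptions the assignment text does not spell out: `\r` is dropped before the "last delimiter" is chosen, a separator between two delimiters does not split the delimiter run, and a delimiter run at the very start or end of the text is kept as a single delimiter.

```java
final class TextCleaner {
    // Word characters are exactly 0-9, a-z and A-Z.
    static boolean isWordChar(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static String clean(String input) {
        StringBuilder out = new StringBuilder(input.length());
        char pending = 0; // last delimiter seen since the previous word character; 0 = none
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isWordChar(c)) {
                if (pending != 0) {
                    out.append(pending); // emit only the last delimiter of the run
                    pending = 0;
                }
                out.append(c);
            } else if (c == ' ' || c == '\t' || c == '\n') {
                pending = c; // overwrite: a later delimiter replaces an earlier one
            }
            // '\r' and every separator character are simply dropped.
        }
        if (pending != 0) {
            out.append(pending); // assumption: keep one trailing delimiter
        }
        return out.toString();
    }
}
```

Under these assumptions, `TextCleaner.clean("document-01.txt")` returns `"document01txt"` and `TextCleaner.clean("a\r\n\r\nb")` returns `"a\nb"`, matching the two examples given in the rules.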
For example, if an input file has the following content:
EBooks posted since November 2003, with etext numbers OVER #10000, are
filed in a different way. The year of a release date is no longer part
of the directory path. The path is based on the etext number (which is
identical to the filename). The path to the file is made up of single
digits corresponding to all but the last digit in the filename. For
example an eBook of filename 10234 would be found at:
The output file of the corresponding example should be:
EBooks posted since November 2003 with etext numbers OVER 10000 are
filed in a different way The year of a release date is no longer part
of the directory path The path is based on the etext number which is
identical to the filename The path to the file is made up of single
digits corresponding to all but the last digit in the filename For
example an eBook of filename 10234 would be found at
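Putting the pieces together, the whole task can be sketched as a directory walk that cleans each file and writes it to the mirrored path. This is a hedged sketch rather than a reference solution: `DirectoryCleaner`, `cleanTree` and `regexClean` are hypothetical names, every regular file is assumed to be a text file, and bytes are decoded as ISO-8859-1 so that any non-ASCII byte becomes a non-word character and is dropped as a separator. The cleaning step here uses two regular-expression passes (remove `\r` and separators, then collapse each delimiter run to its last character), with the same assumptions about ambiguous cases as a character-by-character pass would need.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DirectoryCleaner {
    // Pass 1 removes '\r' and all separators; pass 2 keeps the last delimiter of each run.
    static String regexClean(String text) {
        return text.replaceAll("[^0-9a-zA-Z \\t\\n]", "")
                   .replaceAll("[ \\t\\n]*([ \\t\\n])", "$1");
    }

    // Cleans every regular file under inputRoot into the same relative path under outputRoot.
    // Returns the number of files written.
    static int cleanTree(Path inputRoot, Path outputRoot, UnaryOperator<String> cleaner)
            throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(inputRoot)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path in : files) {
            Path out = outputRoot.resolve(inputRoot.relativize(in));
            Path parent = out.getParent();
            if (parent != null) {
                Files.createDirectories(parent); // recreate the folder structure
            }
            String text = new String(Files.readAllBytes(in), StandardCharsets.ISO_8859_1);
            Files.write(out, cleaner.apply(text).getBytes(StandardCharsets.ISO_8859_1));
        }
        return files.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: DirectoryCleaner <input-dir> <output-dir>");
            System.exit(1);
        }
        int n = cleanTree(Paths.get(args[0]), Paths.get(args[1]), DirectoryCleaner::regexClean);
        System.out.println("Cleaned " + n + " file(s)");
    }
}
```

Passing the cleaner as a `UnaryOperator<String>` keeps the traversal independent of how the text is cleaned, so a different cleaning routine can be swapped in without touching the file handling.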
