Question

TextStatistics will be the class that reads a text file, parses it, and stores the information about the words and characters in the file. ProcessText is the driver class that gets a list of one or more filenames from the command line and collects statistics on each of the files using an instance of the TextStatistics object.

Classes that you will create: ProcessText.java, TextStatistics.java (TextStatistics will implement TextStatisticsInterface.java)

Existing interface and class that you will use: TextStatisticsInterface.java, TextStatisticsTest.java (Code given below)

ProcessText.java

The driver class with the main method which processes one or more files to determine some interesting statistics about them.

Command-line validation

The names of the files to process will be given as command line arguments. Make sure to validate the number of command line arguments. There should be at least one file name given.

If no files are given on the command line, your program must print a usage message and exit the program immediately. The message should read as follows.

Usage: java ProcessText file1 [file2 ...] 

This lets the user know how they should run the program without having to go look up the documentation.
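The argument check described above could be sketched as follows. The class name UsageCheckSketch, the helper hasFileArguments, the exit status of 1, and the choice to print the usage message to standard error are all assumptions made for illustration; the assignment only fixes the wording of the message.

```java
// A minimal sketch of command-line validation (names here are hypothetical).
class UsageCheckSketch {
    // Message text taken from the assignment.
    static final String USAGE = "Usage: java ProcessText file1 [file2 ...]";

    // Returns true when at least one file name was supplied.
    static boolean hasFileArguments(String[] args) {
        return args != null && args.length >= 1;
    }

    public static void main(String[] args) {
        if (!hasFileArguments(args)) {
            System.err.println(USAGE);
            System.exit(1); // assumed non-zero status for incorrect usage
        }
        System.out.println("Number of files to process: " + args.length);
    }
}
```

Keeping the check in a small helper makes it easy to exercise without launching the whole program.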

Processing command-line arguments

If valid filenames are given on the command line, your program will process each command line argument by creating a File object from it and checking to see that the file actually exists.

If a file does exist, your program will create a TextStatistics object for that file and print out the statistics for the file to the console.

If a file does not exist, a meaningful error message needs to be printed to the user. Continue processing the next file. An invalid file in the list should not result in the program crashing or exiting before all files have been processed.
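One way to keep going after a missing file is to report the problem and move on to the next argument. The sketch below assumes a hypothetical class FileLoopSketch and method processAll, and it only prints a placeholder line where the real program would build a TextStatistics object and print it.

```java
import java.io.File;

// A minimal sketch of the per-argument loop (names here are hypothetical).
class FileLoopSketch {
    // Visits every name in order; returns how many files were processed.
    static int processAll(String[] names) {
        int processed = 0;
        for (String name : names) {
            File file = new File(name);
            if (!file.exists()) {
                // Report the problem and keep going with the next argument.
                System.err.println("Error: " + name + " does not exist, skipping it.");
                continue;
            }
            // In the assignment, a TextStatistics object would be created
            // for this file here and its statistics printed.
            System.out.println("Processing " + file.getPath());
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        processAll(args);
    }
}
```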

TextStatistics.java

An instantiable class that reads a given text file, parses it, and stores the generated statistics.

Implement the Interface

Your TextStatistics class must implement the given TextStatisticsInterface. Don't modify the interface; it simply lists the methods that your class must include.

To implement an interface, modify your class header as follows:

public class TextStatistics implements TextStatisticsInterface
{
}

Adding "implements TextStatisticsInterface" will cause a compile error in Eclipse until the interface's methods are defined. Select the quick fix option "Add unimplemented methods" and Eclipse will stub out the required methods for you.

Instance variables

Include a reference to the processed File. Include variables for all of the statistics that are computed for the file. Look at the list of accessor methods in the TextStatisticsInterface to determine which statistics will be stored.

Constructor

Takes a File object as a parameter. The constructor should open the file and read the entire file line-by-line, processing each line as it reads it.

You should only have to read through each file once if you are doing this program properly. By the end of the constructor, the TextStatistics object should have collected all of its statistics and calls to its accessor methods will simply return the stored values.

Your constructor needs to handle the FileNotFoundException that can occur when the File is opened in a Scanner. Use a try-catch statement to do this. Don't just throw the exception.
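The constructor shape described above might look like the following sketch. The class name LineCounterSketch is hypothetical and the class tracks only a line count; a real TextStatistics would update all of its statistics inside the loop. The wording of the error message is also an assumption.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// A minimal sketch of a constructor that reads a file once, line by line.
class LineCounterSketch {
    private final File file;
    private int lineCount;

    LineCounterSketch(File file) {
        this.file = file;
        // try-with-resources closes the Scanner when the block ends.
        try (Scanner fileScan = new Scanner(file)) {
            while (fileScan.hasNextLine()) {
                String line = fileScan.nextLine();
                lineCount++; // per-line processing of "line" would go here
            }
        } catch (FileNotFoundException e) {
            // Handle the exception here instead of declaring "throws".
            System.err.println("Could not open file: " + file.getPath());
        }
    }

    int getLineCount() {
        return lineCount;
    }
}
```

Because all the work happens in the constructor, the accessor is a single return statement.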

As each line is read, collect the following statistics:

The number of characters and lines in the file. The number of characters should include all whitespace characters, punctuation, etc. The number of lines should include any blank lines in the file.
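Scanner.nextLine() returns each line without its line terminator, so a count based only on line.length() would miss the newline characters. The sketch below adds one character per line to compensate. This assumes every line ends with a single-character terminator; files with "\r\n" endings or without a final newline would need different handling. The class and method names are hypothetical.

```java
import java.util.Scanner;

// A minimal sketch of counting characters and lines in one pass.
class CharLineCountSketch {
    int charCount;
    int lineCount;

    // Consumes every remaining line from the given Scanner.
    void readAll(Scanner fileScan) {
        while (fileScan.hasNextLine()) {
            String line = fileScan.nextLine(); // terminator already stripped
            lineCount++;                       // blank lines are counted too
            charCount += line.length() + 1;    // +1 for the newline (assumption)
        }
    }
}
```

For example, the text "ab\n\ncd e\n" contains 9 characters on 3 lines, and the loop reaches the same totals: (2 + 1) + (0 + 1) + (4 + 1) = 9.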

The number of words in the file.

You must use a Scanner on each line to count the number of words in each line of the text file.

To ensure everyone's results are consistent, you must use the exact delimiter given below rather than making up your own.

private static final String DELIMITERS = "[\\W\\d_]+";

Use useDelimiter(DELIMITERS) on your line Scanner to set the delimiters that the Scanner will use for separating words in the file.
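A per-line word count with this delimiter could be sketched as below. The class name WordCountSketch and the method countWords are hypothetical; the DELIMITERS constant is the one given in the assignment. The pattern treats any run of non-word characters, digits, or underscores as a separator, so the remaining tokens consist of letters only.

```java
import java.util.Scanner;

// A minimal sketch of counting the words on one line.
class WordCountSketch {
    private static final String DELIMITERS = "[\\W\\d_]+";

    static int countWords(String line) {
        int words = 0;
        try (Scanner lineScan = new Scanner(line)) {
            lineScan.useDelimiter(DELIMITERS);
            while (lineScan.hasNext()) {
                lineScan.next(); // the token itself is a word made of letters
                words++;
            }
        }
        return words;
    }
}
```

Note that an apostrophe is a delimiter, so "It's" is split into the two words "It" and "s".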

Getter (accessor) methods

Implement the accessor methods for the number of characters, number of words, number of lines, average word length and for the arrays that contain the number of words of each length and the number of times each letter occurs in the file.

If you implemented the interface correctly, you should already have methods for these. Just make sure they are returning the correct values. If you are doing your program correctly, most (if not all) of your accessor methods will just contain a single return statement.

toString method

Write a toString() method that generates and returns a String that can be printed to summarize the statistics for the file as shown in the sample output.

TextStatisticsInterface.java:

/**
 * Interface to get statistics from a text file. Used in the testing program.
 * @author CS121 Instructors
 */
public interface TextStatisticsInterface
{
    /**
     * @return the number of characters in the text file
     */
    public int getCharCount();

    /**
     * @return the number of words in the text file
     */
    public int getWordCount();

    /**
     * @return the number of lines in the text file
     */
    public int getLineCount();

    /**
     * @return the letterCount array with locations [0]..[25] for 'a' through 'z'
     */
    public int[] getLetterCount();

    /**
     * @return the wordLengthCount array with locations [0]..[23] with location [i]
     * storing the number of words of length i in the text file. Location [0] is not used.
     * Location [23] holds the count of words of length 23 and higher.
     */
    public int[] getWordLengthCount();

    /**
     * @return the average word length in the text file
     */
    public double getAverageWordLength();
}
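The two arrays described in the interface comments could be filled one word at a time, as in the sketch below. The names TallySketch, tally, and MAX_WORD_LENGTH are hypothetical. Folding upper-case letters into the 'a' through 'z' slots is an assumption, since the interface only names the lower-case range.

```java
// A minimal sketch of updating the letter and word-length arrays per word.
class TallySketch {
    static final int MAX_WORD_LENGTH = 23;

    final int[] letterCount = new int[26];                      // [0]..[25] for 'a'..'z'
    final int[] wordLengthCount = new int[MAX_WORD_LENGTH + 1]; // [0] is unused

    void tally(String word) {
        // Words of length 23 or more all share the last slot.
        int bucket = Math.min(word.length(), MAX_WORD_LENGTH);
        wordLengthCount[bucket]++;

        for (int i = 0; i < word.length(); i++) {
            char c = Character.toLowerCase(word.charAt(i));
            if (c >= 'a' && c <= 'z') {
                letterCount[c - 'a']++;
            }
        }
    }
}
```

Subtracting 'a' from a lower-case letter gives its array index directly, which avoids a 26-branch switch.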

TextStatisticsTest.java:

import java.io.File;
import java.util.Arrays;

/**
 * Simple unit tester for the TextStatistics class.
 * @author amit
 *
 */
public class TextStatisticsTest
{
    private final static int PRECISION = 2; //number of digits after floating point to match

    /**
     * Compares two doubles to see if they are equal to within the given precision
     * @param x
     * @param y
     * @param precision number of digits after the decimal point to use in testing equality
     * @return
     */
    private static boolean approxEquals(double x, double y, int precision) {
        final double EPSILON = Math.pow(10, -precision);
        if (Math.abs(x - y) < EPSILON)
            return true;
        else
            return false;
    }

    /**
     * Test given TextStatistics object with given expected results.
     * @param stats The TextStatistics object to test
     * @param numChars number of characters
     * @param numWords number of words
     * @param numLines number of lines
     * @param avgWordLength average word length
     * @param wordFreq array of word frequencies [0..23]
     * @param letterFreq array of letter frequencies [0..25]
     */
    private static void test(TextStatisticsInterface stats,
                             int numChars,
                             int numWords,
                             int numLines,
                             double avgWordLength,
                             int[] wordFreq,
                             int[] letterFreq)
    {
        if (stats.getCharCount() == numChars){
            System.out.println("Passed! getCharCount()");
        } else {
            System.out.println("----> Failed ! getCharCount() correct: " + numChars + " generated: " + stats.getCharCount());
        }

        if (stats.getWordCount() == numWords) {
            System.out.println("Passed! getWordCount()");
        } else {
            System.out.println("----> Failed ! getWordCount() correct: " + numWords + " generated: " + stats.getWordCount());
        }

        if (stats.getLineCount() == numLines) {
            System.out.println("Passed! getLineCount()");
        } else {
            System.out.println("----> Failed ! getLineCount() correct: " + numLines + " generated: " + stats.getLineCount());
        }

        if (approxEquals(stats.getAverageWordLength(), avgWordLength, PRECISION)) {
            System.out.println("Passed! getAverageWordLength()");
        } else {
            System.out.println("----> Failed ! getAverageWordLength() correct: " + avgWordLength + " generated: " + stats.getAverageWordLength());
        }

        int [] testWordFreq = stats.getWordLengthCount();
        if (Arrays.equals(testWordFreq, wordFreq)) {
            System.out.println("Passed! Word length frequencies");
        } else {
            System.out.println(" ----> Failed ! Word length frequencies " +
                    " correct: " + Arrays.toString(wordFreq) + " " +
                    " generated: " + Arrays.toString(testWordFreq) + " ");
        }

        int[] testLetterFreq = stats.getLetterCount();
        if (Arrays.equals(testLetterFreq, letterFreq)) {
            System.out.println("Passed! Letter frequencies");
        } else {
            System.out.println(" ----> Failed ! Letter frequencies " +
                    " correct: " + Arrays.toString(letterFreq) + " " +
                    " generated: " + Arrays.toString(testLetterFreq) +" ");
        }
        System.out.println();
    }

    /**
     * Test over a list of predefined files.
     * @param args
     */
    public static void main(String[] args)
    {
        String etext = "etext";
        if(args.length == 1) {
            etext = args[0];
        }

        // expected results
        String [] textfile = {etext + File.separator + "testfile.txt",
                etext + File.separator + "Gettysburg-Address.txt",
                etext + File.separator + "Alice-in-Wonderland.txt"};

        int[][] wordFreq = {{0, 3, 13, 24, 13, 10, 2, 5, 3, 1, 3, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
                {0, 8, 50, 55, 61, 35, 27, 17, 7, 10, 6, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
                {0, 1705, 4412, 7062, 5782, 3340, 1951, 1569, 723, 448, 181, 108, 34, 11, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0}
        };

        int[][] letterFreq = {{27, 1, 11, 10, 33, 9, 7, 24, 25, 0, 2, 18, 5, 25, 26, 5, 0, 21, 30, 35, 7, 1, 10, 1, 2, 0},
                {107, 18, 32, 61, 175, 28, 33, 81, 74, 0, 3, 47, 14, 86, 96, 17, 1, 84, 53, 132, 25, 27, 28, 0, 13, 0},
                {8787, 1474, 2397, 4931, 13569, 2000, 2528, 7372, 7511, 146, 1158, 4713, 2107, 7013, 8141, 1522, 209, 5433,
                        6495, 10684, 3468, 845, 2674, 148, 2264, 78}
        };

        int[] numChars = {465, 1622, 148482};
        int[] numWords = {79, 281, 27331};
        int[] numLines = {11, 39, 3610};
        double[] avgWordLength = {4.24, 4.40, 3.94};

        for (int i = 0; i < textfile.length; i++) {
            File nextFile = new File(textfile[i]);
            if (nextFile.exists() && nextFile.canRead()) {
                System.out.println(" Testing on data file:" + textfile[i] + " ");
                TextStatisticsInterface stats = new TextStatistics(nextFile);
                test(stats, numChars[i], numWords[i], numLines[i],
                        avgWordLength[i], wordFreq[i], letterFreq[i]);
            } else {
                System.err.println("Cannot access test file: " + textfile[i]);
            }
        }
    }
}
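The expected values in the tester hint at how the average word length relates to the other statistics. For testfile.txt, the word-length counts sum to 79 words, the lengths weighted by those counts sum to 335 letters, and 335 / 79 is approximately 4.24, which matches the listed average. The sketch below reproduces that arithmetic. The class and method names are hypothetical, and computing the average as total letters divided by word count is an inference from these numbers, not something the assignment states. The weighted sum is exact only while slot [23] is empty, because that slot also holds longer words.

```java
// A minimal sketch that checks the tester's expected values for testfile.txt.
class ExpectedValuesCheckSketch {
    // Adds up every entry in the array.
    static int sum(int[] counts) {
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        return total;
    }

    // Total letters implied by a word-length histogram (index = word length).
    static int totalLetters(int[] wordLengthCount) {
        int total = 0;
        for (int length = 1; length < wordLengthCount.length; length++) {
            total += length * wordLengthCount[length];
        }
        return total;
    }

    // Guards against division by zero for an empty file.
    static double averageWordLength(int letters, int words) {
        return words == 0 ? 0.0 : (double) letters / words;
    }

    public static void main(String[] args) {
        int[] wordFreq = {0, 3, 13, 24, 13, 10, 2, 5, 3, 1, 3, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
        int words = sum(wordFreq);
        int letters = totalLetters(wordFreq);
        System.out.println("words=" + words + " letters=" + letters
                + " average=" + averageWordLength(letters, words));
    }
}
```

The letter-frequency row for the same file also sums to 335, so the two arrays agree with each other.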
