Question
In this assignment you are required to create a text parser in Java/C++. Given an input text file, you need to parse it and answer a set of frequency-related questions.
Technical Requirement of Solution:
You are required to do this ab initio (barebones, from scratch). This means your solution cannot use any library methods in Java except the ones listed below (or equivalent library functions in C/C++).
String.split and other String operations can be used wherever required.
You can use any regular expression facilities (java.util.regex) to match target words and phrases.
You are also allowed to use different variants of array-based and list-based built-in data structures such as Array, List, ArrayList, and Vector (not HashMap).
Standard file I/O facilities for reading/writing, such as BufferedReader.
Create as many files and intermediate array-based data structures as you wish, and allocate as much heap memory from the JVM as you need. BUT you are allowed to read the input file EXACTLY ONCE to answer ALL the questions. You can, however, use any internal array-based representation of the whole file to do multiple rounds of processing if needed, after having read it EXACTLY ONCE.
Suggested programming language: Java. However, considering this is the very first assignment, you can use C/C++, provided similar constructs and rules are followed and no library functions that leverage hashes and maps are used.
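One way to count word frequencies without a HashMap is to keep two parallel ArrayLists, one for the distinct words and one for their counts. The sketch below is only an illustration under assumptions: the class name WordCounts and its methods are hypothetical, and "k-th most frequent" is interpreted as the k-th highest distinct frequency, with all tied words reported.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: word counts kept in two parallel lists instead of a map. */
final class WordCounts {
    final List<String> words = new ArrayList<>();
    final List<Integer> counts = new ArrayList<>();

    /** Records one occurrence of a word, using a linear search to find its slot. */
    void add(String word) {
        for (int i = 0; i < words.size(); i++) {
            if (words.get(i).equals(word)) {
                counts.set(i, counts.get(i) + 1);
                return;
            }
        }
        words.add(word);
        counts.add(1);
    }

    /** Returns the rank-th highest distinct frequency (rank 1 = highest), or 0 if there is none. */
    int frequencyAtRank(int rank) {
        int previous = Integer.MAX_VALUE;
        for (int r = 0; r < rank; r++) {
            int best = 0;
            for (int c : counts) {
                if (c < previous && c > best) {
                    best = c;
                }
            }
            if (best == 0) {
                return 0; // fewer than 'rank' distinct frequencies exist
            }
            previous = best;
        }
        return previous;
    }

    /** Returns every word whose count equals the given frequency, so ties are all listed. */
    List<String> wordsWithFrequency(int frequency) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            if (counts.get(i) == frequency) {
                result.add(words.get(i));
            }
        }
        return result;
    }
}
```

The linear search makes each insertion proportional to the number of distinct words, which is slow on large files but stays within the list-only constraint. Sorting a copy of the word array and counting runs of equal words would be a faster alternative under the same rules.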
For the following questions, list all matching output if there are ties.
1. List the most frequent word(s) in the whole file and their frequency.
2. List the 3rd most frequent word(s) in the whole file and their frequency.
3. List the word(s) with the highest frequency in a sentence across all sentences in the whole file; also print the frequency and the corresponding sentence.
4. List sentences with the maximum number of occurrences of the word "the" in the entire file and also list the corresponding frequency.
5. List sentences with the maximum number of occurrences of the word "of" in the entire file and also list the corresponding frequency.
6. List sentences with the maximum number of occurrences of the word "was" in the entire file and also list the corresponding frequency.
7. List sentences with the maximum number of occurrences of the phrase "but the" in the entire file and also list the corresponding frequency.
8. List sentences with the maximum number of occurrences of the phrase "it was" in the entire file and also list the corresponding frequency.
9. List sentences with the maximum number of occurrences of the phrase "in my" in the entire file and also list the corresponding frequency.
The program has two arguments:
The first argument: path to the input text file.
The second argument: name prefix for the output files.
For example:
$ java Assignment input.txt "output"
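The two-argument invocation and the read-exactly-once rule could be combined along the following lines. This is a minimal sketch, not the expected solution: the class name ParserMain and the method readParagraphs are hypothetical, and it assumes FileReader counts as a standard file I/O facility alongside BufferedReader.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical entry point: reads the input once and keeps it in memory. */
final class ParserMain {
    /** Reads every line (one paragraph per line) in a single pass, lower-cased. */
    static List<String> readParagraphs(BufferedReader reader) throws IOException {
        List<String> paragraphs = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            paragraphs.add(line.toLowerCase(Locale.ROOT));
        }
        return paragraphs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ParserMain <input file> <output prefix>");
            return;
        }
        String inputPath = args[0];    // first argument: path to the input text file
        String outputPrefix = args[1]; // second argument: name prefix for the output files

        List<String> paragraphs;
        try (BufferedReader reader = new BufferedReader(new FileReader(inputPath))) {
            paragraphs = readParagraphs(reader); // the only pass over the file
        }
        // All questions would then be answered from 'paragraphs', never from the file again.
        System.out.println("read " + paragraphs.size() + " paragraphs; prefix = " + outputPrefix);
    }
}
```

Keeping the file contents in a list means later rounds of processing work on the in-memory copy, which is what the single-read rule permits.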
Input file: A text document. Assume each newline defines a paragraph. Each period defines the end of a sentence. If a sentence is the last in a paragraph and doesn't have an explicit period, its end marker is the newline. Each space character within a sentence is a word delimiter. The assignment is case-insensitive, so you must transform the text and work in lower case.
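The sentence and word rules above can be sketched with String.split and java.util.regex. The class and method names here (SentenceTools, splitSentences, countOccurrences) are hypothetical, and the sketch assumes that a word or phrase only counts when it is bounded by spaces or by the start or end of the sentence, so punctuation attached to a word (for example "the,") is not stripped. Whether the reference answers treat punctuation that way is not stated in the assignment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helpers for splitting a paragraph and counting a word or phrase. */
final class SentenceTools {
    /** Splits one paragraph (one input line) into lower-cased sentences at each period. */
    static List<String> splitSentences(String paragraph) {
        List<String> sentences = new ArrayList<>();
        for (String piece : paragraph.toLowerCase(Locale.ROOT).split("\\.")) {
            String sentence = piece.trim();
            if (!sentence.isEmpty()) { // skip empty pieces, e.g. between consecutive periods
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    /** Counts space-delimited occurrences of a word or phrase inside one sentence. */
    static int countOccurrences(String sentence, String phrase) {
        // Lookarounds: the match must not touch a non-space character on either side.
        Pattern pattern = Pattern.compile("(?<![^ ])" + Pattern.quote(phrase) + "(?![^ ])");
        Matcher matcher = pattern.matcher(sentence);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }
}
```

Because a sentence with no trailing period is simply the last piece returned by split, the "newline ends the sentence" rule needs no special handling when each paragraph is processed separately.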
Output: Download data.zip from Blackboard for sample input and answer files. Your output must conform to the following specifications. For each of the questions you must create one single output file, so your program should produce 9 output files each time you run it. If a question has multiple outputs (multiple sentences/words), you should print each one on a new line. Do not print them on the same line! The order of the sentences/words/phrases is not important. However, the order of the output file names must match the order of the questions. For example, given the prefix "output", the output file of the first question should be output1.txt and that of the second question output2.txt. The output format depends on the question type, and it must be:
For questions 1 and 2:
word:frequency, e.g.
the:
For question 3:
word:frequency:sentence, e.g.
the::you see watson he explained in the early hours of the morning...
For questions 4 to 9:
word:frequency:sentence, e.g.
was::then it was withdrawn as suddenly as it appeared...
was::the a week was a lure which must draw him and...
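The formats above could be produced along these lines. This is a hedged sketch: the class OutputFiles and its methods are hypothetical, and the file naming (prefix, then question number, then ".txt") is an assumption based on the example in the output description.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers for formatting and writing one answer file per question. */
final class OutputFiles {
    /** Assumed naming scheme: prefix + question number + ".txt". */
    static String fileName(String prefix, int questionNumber) {
        return prefix + questionNumber + ".txt";
    }

    /**
     * Builds "word:frequency:sentence" lines for every sentence that reaches the
     * maximum count; counts.get(i) is the number of occurrences in sentences.get(i).
     */
    static List<String> formatMaxSentences(String word, List<String> sentences, List<Integer> counts) {
        int max = 0;
        for (int c : counts) {
            if (c > max) {
                max = c;
            }
        }
        List<String> lines = new ArrayList<>();
        if (max == 0) {
            return lines; // the word never occurs, so there is nothing to list
        }
        for (int i = 0; i < sentences.size(); i++) {
            if (counts.get(i) == max) { // keep every tie, one per line
                lines.add(word + ":" + max + ":" + sentences.get(i));
            }
        }
        return lines;
    }

    /** Writes each entry on its own line, as the specification requires. */
    static void writeLines(String path, List<String> lines) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(path))) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }
}
```

For the word-only questions, the same writeLines method could be reused with lines built as word + ":" + frequency.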