Question
Abstract
Write a program in C++ and Visual Studio to scan one or more text files and count the number of occurrences of each word in those files, storing the words in linked nodes (a node list), NOT arrays. Use a binary tree to keep track of all words. Your tree should be self-balancing, using the AVL, splay, or red-black tree algorithm, to maximize efficiency. When all input files have been scanned, write the results to a separate output file.
Outcomes
After successfully completing this non-trivial program in C++, you should be able to:
Define a class of objects and operations on those objects.
Build a massive, recursive data structure comprising those objects.
Search for an item in that data structure and, if it is not found, add it to the structure.
Create a file for your output and write to it.
This Assignment
Your program must accept a variable number of command-line arguments: the first specifies the output file and the rest specify the input files. Thus, a user could invoke your program from a shell (for example, Windows PowerShell; in the classic Command Prompt, omit the leading ./) by typing
./WordCounter outputFile inputFile1 inputFile2 ...
Under this command, the program would open and read each of the input files in turn, building up a binary tree of words and counts as it progresses. Once all files have been read and closed, it must create the output file and write out the words in the tree in alphabetical order, one word per line, along with the number of occurrences of that word. Your program should ignore the case of the words, so that "This" and "this" are considered the same. However, words that are actually spelled differently, such as "car" and "cars", are considered different words.
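One way to make "This" and "this" count as the same word is to fold each token to lowercase before looking it up in the tree. The sketch below also trims non-letter characters (such as punctuation) from both ends of a token. That is an assumption, because the assignment does not define what counts as a word. The helper name normalizeWord is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: lowercase a raw token and trim non-letter characters
// from both ends, so "This," and "this" map to the same key "this".
// Assumption: characters inside the word (e.g. the apostrophe in "don't") are kept.
std::string normalizeWord(const std::string& raw) {
    std::size_t begin = 0;
    std::size_t end = raw.size();
    // Skip leading characters that are not letters.
    while (begin < end && !std::isalpha(static_cast<unsigned char>(raw[begin]))) {
        ++begin;
    }
    // Skip trailing characters that are not letters.
    while (end > begin && !std::isalpha(static_cast<unsigned char>(raw[end - 1]))) {
        --end;
    }
    std::string word;
    for (std::size_t i = begin; i < end; ++i) {
        word += static_cast<char>(std::tolower(static_cast<unsigned char>(raw[i])));
    }
    return word;  // empty if the token contained no letters at all
}
```

A caller would skip tokens for which this returns an empty string. Because only case and surrounding punctuation are changed, "car" and "cars" still produce different keys.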
A sample output might look like the following:
   166 a
    25 and
    11 as
     3 command
    15 each
     2 file
     4 files
   109 in
     4 input
    98 it
    99 of
     3 open
     6 program
    18 read
   152 the
    41 this
     3 under
-------------
    17 Total number of different words
To allow for very long input files, the field width of the number of occurrences of each word should be at least six decimal digits. You should also total and print the number of distinct words.
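A sketch of one way to produce lines in the sample format uses std::setw(6), which pads each count to a field at least six characters wide. The field is "at least" six wide because setw never truncates longer numbers. The helpers writeEntry and writeSummary are hypothetical names. They take any std::ostream&, so the same code works with an ofstream or with cout.

```cpp
#include <iomanip>
#include <ostream>
#include <string>

// Hypothetical helper: write one "count word" line, with the count
// right-aligned in a field at least six characters wide.
void writeEntry(std::ostream& out, int count, const std::string& word) {
    out << std::setw(6) << count << ' ' << word << '\n';
}

// Hypothetical helper: write the separator line and the number of distinct words,
// following the layout of the sample output.
void writeSummary(std::ostream& out, int distinctWords) {
    out << "-------------\n"
        << std::setw(6) << distinctWords << " Total number of different words\n";
}
```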
For fun, you may want to try one of Shakespeare's plays, which can be downloaded from the Internet.
Implementation in C++
You must implement this project in an object-oriented style. Think, for example, how you might have done it in Java, and then use that approach as a guideline for your C++ program.
At a minimum, you should define one or more Class Interface files (header files) and a corresponding Class Implementation file (.cpp file) for each Class Interface. In addition, you should have one or more .cpp files for your main() function, input and output, and anything else that is appropriate.
You should base your implementation on the Binary Tree classes we have been developing in class for the past several homework assignments. You will need to extend your Node struct to include the following:
An int containing the count of the number of occurrences of the word.
A pointer to the parent node.
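The node extension described above might look like the following sketch. All field names are hypothetical. The height field is an extra assumption that is needed only if you choose AVL balancing, and it is not required by the list above.

```cpp
#include <string>

// Sketch of the extended node: the existing word and child links,
// plus the occurrence count and parent pointer the assignment asks for.
struct Node {
    std::string word;        // the (lowercased) word stored at this node
    int count = 1;           // number of occurrences seen so far
    int height = 1;          // subtree height; assumption, used only for AVL balancing
    Node* left = nullptr;    // left child (alphabetically smaller words)
    Node* right = nullptr;   // right child (alphabetically larger words)
    Node* parent = nullptr;  // parent node (nullptr for the root)

    explicit Node(const std::string& w, Node* p = nullptr) : word(w), parent(p) {}
};
```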
The BinaryTree class will also need methods to increment and access the count, set and access the word string, and set or access the left child, right child, and parent nodes. You should also extend your BinaryTree class to provide search, insertion, construction, destruction, and traversal methods for your binary tree.
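As a rough illustration of search-or-insert, traversal, and destruction, here is a single-file sketch that picks AVL out of the three allowed balancing schemes. The class name WordTree and all member names are hypothetical. The assignment asks for a header/.cpp split, which is omitted here for brevity. Parent pointers are kept consistent through rotations by the setLeft/setRight helpers.

```cpp
#include <algorithm>
#include <string>

// Sketch of a self-balancing AVL word tree. Class and member names are
// hypothetical; each node keeps its word, count, height and parent pointer.
class WordTree {
public:
    WordTree() = default;
    ~WordTree() { destroy(root_); }                 // frees every node
    WordTree(const WordTree&) = delete;             // copying would double-free
    WordTree& operator=(const WordTree&) = delete;
    // Search for word: insert it with count 1 if absent, else increment its count.
    void add(const std::string& word) {
        root_ = insert(root_, word);
        root_->parent = nullptr;                    // the root has no parent
    }
    // Number of occurrences of word (0 if it was never added).
    int count(const std::string& word) const {
        const Node* n = root_;
        while (n != nullptr) {
            if (word < n->word) n = n->left;
            else if (n->word < word) n = n->right;
            else return n->count;
        }
        return 0;
    }
    int distinct() const { return distinct_; }      // number of different words
    int height() const { return h(root_); }
    // In-order traversal: calls visit(word, count) in alphabetical order.
    template <typename Visit>
    void inOrder(Visit visit) const { walk(root_, visit); }

private:
    struct Node {
        std::string word;
        int count = 1, height = 1;
        Node *left = nullptr, *right = nullptr, *parent = nullptr;
        explicit Node(const std::string& w) : word(w) {}
    };
    Node* root_ = nullptr;
    int distinct_ = 0;

    static int h(const Node* n) { return n ? n->height : 0; }
    static void update(Node* n) { n->height = 1 + std::max(h(n->left), h(n->right)); }
    // Link a child and keep its parent pointer consistent.
    static void setLeft(Node* p, Node* c) { p->left = c; if (c) c->parent = p; }
    static void setRight(Node* p, Node* c) { p->right = c; if (c) c->parent = p; }
    // Rotations return the new subtree root; the caller re-links it to its parent.
    static Node* rotateRight(Node* y) {
        Node* x = y->left;
        setLeft(y, x->right);
        setRight(x, y);
        update(y); update(x);
        return x;
    }
    static Node* rotateLeft(Node* x) {
        Node* y = x->right;
        setRight(x, y->left);
        setLeft(y, x);
        update(x); update(y);
        return y;
    }
    // Restore the AVL property (child heights differ by at most 1) at n.
    static Node* rebalance(Node* n) {
        update(n);
        int balance = h(n->left) - h(n->right);
        if (balance > 1) {                                      // left-heavy
            if (h(n->left->left) < h(n->left->right)) setLeft(n, rotateLeft(n->left));
            return rotateRight(n);
        }
        if (balance < -1) {                                     // right-heavy
            if (h(n->right->right) < h(n->right->left)) setRight(n, rotateRight(n->right));
            return rotateLeft(n);
        }
        return n;
    }
    Node* insert(Node* n, const std::string& word) {
        if (n == nullptr) { ++distinct_; return new Node(word); }
        if (word < n->word) setLeft(n, insert(n->left, word));
        else if (n->word < word) setRight(n, insert(n->right, word));
        else { ++n->count; return n; }                          // already present
        return rebalance(n);
    }
    template <typename Visit>
    static void walk(const Node* n, Visit& visit) {
        if (n == nullptr) return;
        walk(n->left, visit);
        visit(n->word, n->count);
        walk(n->right, visit);
    }
    static void destroy(Node* n) {                              // post-order: children first
        if (n == nullptr) return;
        destroy(n->left); destroy(n->right);
        delete n;
    }
};
```

In this design, a program would call add() once per normalized token. After the last file is closed, it would call inOrder() with a callback that writes each entry and then report distinct() as the total. The recursive insert keeps the rebalancing logic local to each subtree, and AVL bounds the tree height at about 1.44 log2(n).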
Note: You should NOT use the C++ Standard Template Library containers that are built on trees (such as std::map and std::set, which are typically implemented as balanced trees) for this project. They are likely to make it more difficult and time-consuming.
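Note: Don't forget to make sure your tree's destructor runs when your program's output is complete. It runs automatically for a tree declared as a local variable, but a tree created with new must be deleted. This is the C++ counterpart of freeing memory in C. Failure to do so can result in memory leaks and will affect your grade.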
Additional Notes
Command Line Arguments
Command line arguments are the same in C++ as in C. That is, the function prototype of main() is
int main(int argc, char *argv[]);
The elements of the argv array in this problem are file names. They can be used in the ifstream and ofstream constructors (see below).
Input and Output Streams
The following is from Stroustrup, The C++ Programming Language, 3rd edition, 21.5.1:
#include <fstream>
...
ofstream output(argv[1]);
if (!output) error("Cannot open output file", argv[1]);  // error() is not a standard library function; supply your own
This declares and creates an ofstream object called output connected to the file named by the first command-line argument (argv[1]). It can be used in the same way as cout.
ifstream input(argv[i]);
if (!input) error("Cannot open input file", argv[i]);
This declares and creates an ifstream object called input connected to the file named by the i-th command-line argument (argv[i]). It can be used in the same way as cin.