Abstract

Write a program in C++ and Visual Studio that scans one or more text files and counts the number of occurrences of each word in those files, using a node list, not arrays. Use a binary tree to keep track of all the words. The tree should be self-balancing, using the AVL, Splay, or Red-Black tree algorithm, so that searches and insertions stay efficient. When all input files have been scanned, write the results to a separate output file.
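The self-balancing requirement can be illustrated with AVL, one of the three options named above. This is a minimal, hedged sketch rather than a required design: the names AvlNode, avlInsert, rotateLeft, rotateRight, and avlDestroy are hypothetical, and the parent pointer the assignment asks for is left out so the rotations stay short. Each insertion either increments an existing word's count or adds a new leaf, then rotates on the way back up wherever a subtree's left and right heights differ by more than one.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical AVL node (no parent pointer, to keep the rotations short).
struct AvlNode {
    std::string word;
    int count = 1;                // occurrences of this word
    int height = 1;               // height of the subtree rooted here
    AvlNode* left = nullptr;
    AvlNode* right = nullptr;
    explicit AvlNode(const std::string& w) : word(w) {}
};

int height(const AvlNode* n) { return n ? n->height : 0; }

void updateHeight(AvlNode* n) {
    n->height = 1 + std::max(height(n->left), height(n->right));
}

// Right rotation around y; returns the new subtree root.
AvlNode* rotateRight(AvlNode* y) {
    AvlNode* x = y->left;
    y->left = x->right;
    x->right = y;
    updateHeight(y);
    updateHeight(x);
    return x;
}

// Left rotation around x; returns the new subtree root.
AvlNode* rotateLeft(AvlNode* x) {
    AvlNode* y = x->right;
    x->right = y->left;
    y->left = x;
    updateHeight(x);
    updateHeight(y);
    return y;
}

// Searches for word: increments its count if found, otherwise inserts it.
// Rebalances on the way back up and returns the (possibly new) subtree root.
AvlNode* avlInsert(AvlNode* node, const std::string& word) {
    if (!node) return new AvlNode(word);
    if (word < node->word) {
        node->left = avlInsert(node->left, word);
    } else if (word > node->word) {
        node->right = avlInsert(node->right, word);
    } else {
        ++node->count;            // already present: just count it
        return node;
    }
    updateHeight(node);
    int balance = height(node->left) - height(node->right);
    if (balance > 1) {                            // left-heavy
        if (word > node->left->word)              // left-right case
            node->left = rotateLeft(node->left);
        return rotateRight(node);
    }
    if (balance < -1) {                           // right-heavy
        if (word < node->right->word)             // right-left case
            node->right = rotateRight(node->right);
        return rotateLeft(node);
    }
    return node;
}

// Post-order deletion: children are freed before their parent.
void avlDestroy(AvlNode* node) {
    if (!node) return;
    avlDestroy(node->left);
    avlDestroy(node->right);
    delete node;
}
```

Inserting words in alphabetical order, the worst case for an unbalanced binary search tree, still keeps the height logarithmic here: seven words inserted alphabetically produce a tree of height 3.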

Outcomes

After successfully completing this non-trivial program in C++, you should be able to:

Define a class of objects and operations on those objects.

Build a massive, recursive data structure comprising those objects.

Search for an item in that data structure and, if it is not found, add it to the structure.

Create a file for your output and write to it.

This Assignment

Your program must accept a variable number of command-line arguments: the first specifies the output file, and the rest specify the input files. Thus, a user could invoke your program from a Windows Command Prompt by typing

.\WordCounter outputFile inputFile1 inputFile2 ...

Under this command, the program would open and read each of the input files in turn, building up a binary tree of words and counts as it progresses. Once all files have been read and closed, it must create the output file and write out the words in the tree in alphabetical order, one word per line, each with its number of occurrences. Your program should ignore case, so that "This" and "this" are considered the same word. However, words that are spelled differently, such as "car" and "cars", are considered different words.
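The case rule can be handled by lowercasing each word as it is read. The assignment does not define what counts as a word, so this sketch ASSUMES a word is a maximal run of ASCII letters, with digits, punctuation, and whitespace all acting as separators. The function name nextWord is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Reads the next word from in into word, lowercased.
// ASSUMPTION: a word is a maximal run of ASCII letters; any other
// character ends the current word. Returns false when no words remain.
bool nextWord(std::istream& in, std::string& word) {
    word.clear();
    char ch;
    while (in.get(ch)) {
        unsigned char c = static_cast<unsigned char>(ch);   // <cctype> needs a non-negative value
        if (std::isalpha(c)) {
            word += static_cast<char>(std::tolower(c));
        } else if (!word.empty()) {
            return true;          // a non-letter ends the current word
        }
    }
    return !word.empty();         // a word may end exactly at end of file
}
```

Because it takes any std::istream, the same function works on a std::ifstream opened on an input file. Under the stated assumption a contraction such as "don't" splits into "don" and "t"; allowing internal apostrophes would be one possible refinement.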

A sample output might look like the following:

   166 a
    25 and
    11 as
     3 command
    15 each
     2 file
     4 files
   109 in
     4 input
    98 it
    99 of
     3 open
     6 program
    18 read
   152 the
    41 this
     3 under
-------------
    16 Total number of different words

To allow for very long input files, the field for the number of occurrences of each word should be at least six decimal digits wide. You should also count the distinct words and print that total.
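One way to meet the six-digit field-width requirement is std::setw from <iomanip>, which right-aligns the count in a field of the given width (a number with more digits simply widens the field). The helper names writeCountLine and writeSummary are hypothetical; the separator line mirrors the sample output.

```cpp
#include <cassert>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

// Writes one result line: the count right-aligned in a field at least
// six characters wide, a space, then the word.
void writeCountLine(std::ostream& out, int count, const std::string& word) {
    out << std::setw(6) << count << ' ' << word << '\n';
}

// Writes the separator and the number of distinct words after the list.
void writeSummary(std::ostream& out, int distinctWords) {
    out << "-------------\n"
        << std::setw(6) << distinctWords << " Total number of different words\n";
}
```

std::setw applies only to the next value written, which is why it is repeated before each count.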

For fun, you may want to try one of Shakespeare's plays, which can be downloaded from the Internet.

Implementation in C++

You must implement this project in an object-oriented style. Think, for example, about how you might have done it in Java, and then use that approach as a guideline for your C++ program.

At a minimum, you should define one or more class interface files (header files) and, for each class interface, a corresponding class implementation file (.cpp file). In addition, you should have one or more .cpp files for your main() function, input and output, and anything else that is appropriate.

You should base your implementation on the Binary Tree classes we have been developing in class for the past several homework assignments. You will need to extend your Node struct to include the following:

An int containing the count of the number of occurrences of the word.

A pointer to the parent node.
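A minimal sketch of the extended Node, assuming the word is stored as a std::string; the field and constructor names are hypothetical.

```cpp
#include <cassert>
#include <string>

// One possible extended Node; field names are hypothetical.
struct Node {
    std::string word;         // the (lowercased) word stored here
    int count = 1;            // occurrences so far; a new node has been seen once
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;   // nullptr for the root

    explicit Node(const std::string& w, Node* p = nullptr) : word(w), parent(p) {}
};
```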

The BinaryTree class will also need methods to increment and access the count, set and access the string, and set or access the left and right children and the parent node. You should also extend your BinaryTree class to provide search, insertion, construction, destruction, and traversal methods for your binary tree.
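One possible shape for those methods is sketched below, with hypothetical names (addOccurrence, search, inOrder, distinctWords). It keeps a parent pointer in each node, frees all nodes in the destructor, and visits words in alphabetical order with an in-order traversal. Balancing is deliberately left out: this is a plain binary search tree, and an AVL, Splay, or Red-Black step would still have to follow each insertion.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Sketch of a BinaryTree for the word counter (no balancing yet).
class BinaryTree {
public:
    struct Node {
        std::string word;
        int count = 1;
        Node* left = nullptr;
        Node* right = nullptr;
        Node* parent = nullptr;
        Node(const std::string& w, Node* p) : word(w), parent(p) {}
    };

    BinaryTree() = default;
    BinaryTree(const BinaryTree&) = delete;             // the tree owns raw pointers,
    BinaryTree& operator=(const BinaryTree&) = delete;  // so forbid shallow copies
    ~BinaryTree() { destroy(root_); }

    // Returns the node holding word, or nullptr if it is absent.
    Node* search(const std::string& word) const {
        Node* cur = root_;
        while (cur && cur->word != word)
            cur = (word < cur->word) ? cur->left : cur->right;
        return cur;
    }

    // Increments the count of word, inserting it as a new leaf if needed.
    void addOccurrence(const std::string& word) {
        Node* parent = nullptr;
        Node** link = &root_;
        while (*link) {
            if (word == (*link)->word) { ++(*link)->count; return; }
            parent = *link;
            link = (word < parent->word) ? &parent->left : &parent->right;
        }
        *link = new Node(word, parent);
        ++distinct_;
    }

    int distinctWords() const { return distinct_; }

    // Visits every node in alphabetical (in-order) order.
    void inOrder(const std::function<void(const Node&)>& visit) const {
        inOrder(root_, visit);
    }

private:
    Node* root_ = nullptr;
    int distinct_ = 0;

    static void inOrder(const Node* n, const std::function<void(const Node&)>& visit) {
        if (!n) return;
        inOrder(n->left, visit);
        visit(*n);
        inOrder(n->right, visit);
    }

    static void destroy(Node* n) {   // post-order: free children first
        if (!n) return;
        destroy(n->left);
        destroy(n->right);
        delete n;
    }
};
```

Copying is disabled because a default (shallow) copy would let two trees delete the same nodes.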

Note: You should NOT use the C++ Standard Template Library's ordered containers, such as std::map and std::set, which are typically implemented as trees. Ignore them for this project; they are likely to make the work more difficult and time-consuming.

Note: Don't forget to destroy your tree, so that its destructor frees every node, when your program's output is complete. This is the equivalent of freeing memory in C. Failing to do so causes memory leaks and will affect your grade.
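In C++ a destructor runs automatically when a local object goes out of scope, or when delete is applied to an object created with new; what the destructor itself must do is release every node. This sketch (hypothetical CountedNode and Tree types) frees nodes in post-order, children before parent, and keeps a static count of live nodes so that a leak would show up as a nonzero count.

```cpp
#include <cassert>
#include <string>

// Hypothetical instrumented node: live counts nodes not yet deleted.
struct CountedNode {
    static int live;
    std::string word;
    CountedNode* left = nullptr;
    CountedNode* right = nullptr;
    explicit CountedNode(const std::string& w) : word(w) { ++live; }
    ~CountedNode() { --live; }
};
int CountedNode::live = 0;

class Tree {
public:
    Tree() = default;
    Tree(const Tree&) = delete;
    Tree& operator=(const Tree&) = delete;
    ~Tree() { destroy(root_); }      // runs automatically when a Tree goes out of scope

    void insert(const std::string& w) { insert(root_, w); }

private:
    CountedNode* root_ = nullptr;

    static void insert(CountedNode*& n, const std::string& w) {
        if (!n) { n = new CountedNode(w); return; }
        if (w < n->word) insert(n->left, w);
        else if (w > n->word) insert(n->right, w);   // duplicates are ignored here
    }

    static void destroy(CountedNode* n) {   // post-order: children before parent
        if (!n) return;
        destroy(n->left);
        destroy(n->right);
        delete n;
    }
};
```

If the tree is instead created with new, the program must call delete on it for the destructor to run.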

Additional Notes

Command Line Arguments

Command line arguments are the same in C++ as in C. That is, the function prototype of main() is

int main(int argc, char *argv[]);

In this program, argv[1] through argv[argc - 1] are file names (argv[0] is the name of the program itself). They can be passed directly to the ifstream and ofstream constructors (see below).
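A minimal argument check, assuming the layout described above; checkArguments is a hypothetical name and the usage text is an assumption. In main() it would typically be called with std::cerr and followed by a nonzero return on failure.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// argv[0] is the program name, argv[1] names the output file, and
// argv[2] .. argv[argc - 1] name the input files. Prints a usage message
// to err and returns false unless at least one input file was given.
bool checkArguments(int argc, char* argv[], std::ostream& err) {
    if (argc < 3) {
        err << "Usage: " << (argc > 0 ? argv[0] : "WordCounter")
            << " outputFile inputFile1 [inputFile2 ...]\n";
        return false;
    }
    return true;
}
```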

Input and Output Streams

The following is from Stroustrup, The C++ Programming Language, 3rd edition, 21.5.1:

#include <fstream>

...

ofstream output(argv[1]);
if (!output) error("Cannot open output file", argv[1]);

This declares and creates an ofstream object called output connected to the file named by the command line argument. It can be used in the same way as cout.

ifstream input(argv[i]);
if (!input) error("Cannot open input file", argv[i]);

This declares and creates an ifstream object called input connected to the file named by the i-th command line argument. It can be used in the same way as cin.
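Putting the ifstream constructor in a loop over argv, the input files can be read one after another. This sketch (scanInputFiles is a hypothetical name) splits on whitespace only, using operator>>, and leaves lowercasing and punctuation handling to the onWord callback. Instead of Stroustrup's error() helper, it reports a failed open and returns false.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <string>

// Opens argv[2] .. argv[argc - 1] in turn and calls onWord for every
// whitespace-separated token. Each ifstream is destroyed, and so closed,
// at the end of its loop iteration. Returns false if a file cannot be opened.
template <typename OnWord>
bool scanInputFiles(int argc, char* argv[], OnWord onWord) {
    for (int i = 2; i < argc; ++i) {
        std::ifstream input(argv[i]);
        if (!input) {
            std::cerr << "Cannot open input file " << argv[i] << '\n';
            return false;
        }
        std::string token;
        while (input >> token) onWord(token);
    }
    return true;
}
```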
