Question
I'm trying to create a C++ program that can generate the statistics of any large text, one that can calculate which words occur most frequently in the text. This is the introduction; it doesn't have to open the file Corpus_Cleaner because I will do that part, but the program should be able to open any file, so please disregard the Corpus_Cleaner name.
Statistics analyzer (plaintext):
Please create a program called sol_sap.ext, where ext denotes the file extension corresponding to your choice of programming language (.py, .c, .cpp, .c++, or .java).
This program should open and read in a file named corpus_clean.txt, which is the output from sol_cleaner.ext. As output it should produce a file called corpus_freq.txt, which contains the following. Each row is a pair letter, rel_freq,
where letter is a character occurring in the text and rel_freq is the relative frequency of the letter in the corpus. The relative frequency of a letter c is defined by:
relative frequency of c = (# occurrences of c in corpus) / (# letters in corpus)
Note that this will be a floating point number.
The letter/frequency pairs should be given in order of descending frequency. For example, the file should roughly look like this.
e, 0.082198
a, 0.050031 (etc)
z, 0.003000
I have made up the values in the above table for the sake of example.
It is possible to implement this algorithm in a way that makes only one pass over the corpus. However, if you prefer to read through the file 27 times, that is also acceptable.