
Question



I'm trying to create a C++ program that can generate statistics for any large text, that is, calculate which word occurs most frequently in the text. Below is the introduction to the assignment. The program doesn't have to open the output of Corpus_Cleaner specifically, because I will handle that part, but it should be able to open any file. So please disregard the Corpus_Cleaner name.

Statistics analyzer (plaintext):

Please create a program called sol_sap.ext, where ext denotes the file extension corresponding to your choice of programming language (.py, .c, .cpp, .c++, or .java).

This program should open and read in a file named corpus_clean.txt, which is the output from sol_cleaner.ext. As output it should produce a file called corpus_freq.txt, which contains the following. Each row is a pair <letter, rel_freq>, where letter is a character occurring in the text and rel_freq is the relative frequency of that letter in the corpus. The relative frequency of a letter c is defined by:

relative frequency of c = (#occurrences of c in corpus) / (#letters in corpus)

Note that this will be a floating point number.
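As a minimal sketch of the definition above, the following C++ function computes the relative frequency of each letter in a string. The function name relative_frequencies is hypothetical, and the code assumes that upper- and lower-case letters count as the same letter and that only a to z count as letters; the assignment text does not state either point.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: relative frequency of each letter 'a'..'z' in `text`.
// Assumptions (not stated in the assignment): counting is case-insensitive
// and every character outside 'a'..'z' is ignored.
std::array<double, 26> relative_frequencies(const std::string& text) {
    std::array<std::size_t, 26> counts{};  // all counters start at zero
    std::size_t total = 0;                 // #letters in corpus

    for (unsigned char ch : text) {
        int lower = std::tolower(ch);
        if (lower >= 'a' && lower <= 'z') {
            ++counts[lower - 'a'];
            ++total;
        }
    }

    std::array<double, 26> freq{};  // stays all zero if there are no letters
    if (total == 0) {
        return freq;
    }
    for (std::size_t i = 0; i < counts.size(); ++i) {
        // Cast before dividing so the result is a floating point number.
        freq[i] = static_cast<double>(counts[i]) / static_cast<double>(total);
    }
    return freq;
}
```

For the input "aab", index 0 (the letter a) holds 2/3 and index 1 (the letter b) holds 1/3. The cast to double matters: dividing two integer counts directly would truncate every frequency to 0.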

The letter/frequency pairs should be given in order of descending frequency. For example, the file should roughly look like this.

e, 0.082198
a, 0.050031
(etc)
z, 0.003000

I have made up the values in the above table for the sake of example.

It is possible to implement this algorithm in a way that only makes one pass over the corpus. However, if you prefer to read through the file 27 times, that is feasible.
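The one-pass approach could look roughly like the sketch below: read the input character by character, tally each letter and the total at the same time, then sort and write the rows. The names write_letter_frequencies and analyze_file are hypothetical, and the code again assumes case-insensitive counting of a to z only. Taking the file paths as parameters is one way to let the program open any file rather than a fixed name.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <cstddef>
#include <fstream>
#include <iomanip>
#include <istream>
#include <ostream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reads `in` once and writes "letter, rel_freq" rows
// to `out` in order of descending frequency.
// Assumptions: case-insensitive counting, only 'a'..'z' count as letters.
void write_letter_frequencies(std::istream& in, std::ostream& out) {
    std::array<std::size_t, 26> counts{};
    std::size_t total = 0;

    char ch;
    while (in.get(ch)) {  // the single pass over the corpus
        int lower = std::tolower(static_cast<unsigned char>(ch));
        if (lower >= 'a' && lower <= 'z') {
            ++counts[lower - 'a'];
            ++total;
        }
    }

    // Keep only letters that actually occur in the text.
    std::vector<std::pair<char, double>> rows;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        if (counts[i] > 0) {
            rows.emplace_back(static_cast<char>('a' + i),
                              static_cast<double>(counts[i]) / static_cast<double>(total));
        }
    }

    // Descending frequency; stable_sort keeps ties in alphabetical order.
    std::stable_sort(rows.begin(), rows.end(),
                     [](const std::pair<char, double>& a,
                        const std::pair<char, double>& b) {
                         return a.second > b.second;
                     });

    out << std::fixed << std::setprecision(6);
    for (const std::pair<char, double>& row : rows) {
        out << row.first << ", " << row.second << '\n';
    }
}

// Hypothetical wrapper: returns false if either file cannot be opened
// or the output could not be written.
bool analyze_file(const std::string& in_path, const std::string& out_path) {
    std::ifstream in(in_path);
    if (!in) return false;
    std::ofstream out(out_path);
    if (!out) return false;
    write_letter_frequencies(in, out);
    return static_cast<bool>(out);
}
```

A main function that calls analyze_file("corpus_clean.txt", "corpus_freq.txt"), or passes paths taken from the command line, would complete the program. Because the counts fit in a 26-element array, memory use stays constant regardless of corpus size.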
