Question
You will build your language model from a given set of example texts. As the model is based on trigram counts, you must count how many times triples of consecutive words appear in each example text. Words should be treated case-sensitively, meaning "she" and "She" should be considered two different words. And, although the example texts may contain punctuation, you should not treat it specially. That is, if the file contains the phrase "he, she, I", then you can consider the first word as "he,", the second as "she," and the third as "I". Said another way, split your example files on whitespace only, leaving any punctuation attached to its word, so that "she" and "she," are two different words.
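The counting rule above could be sketched roughly as follows. This is only one possible approach, not the assignment's reference solution: the names `Trigram`, `countTrigrams`, and `demoCountTrigrams` are made up for illustration, and the sketch counts only trigrams of ordinary words, leaving out the special begin and end words.

```cpp
#include <array>
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical alias: a trigram is three words in reading order.
using Trigram = std::array<std::string, 3>;

// Adds every triple of consecutive words read from `in` to `counts`.
// operator>> splits on whitespace only, so case is preserved and punctuation
// stays attached: "she", "She" and "she," are three different words.
void countTrigrams(std::istream& in, std::map<Trigram, int>& counts) {
    std::string w1, w2, w3;
    if (!(in >> w1 >> w2)) return;  // fewer than two words: no trigram possible
    while (in >> w3) {
        Trigram key = {w1, w2, w3};
        ++counts[key];
        w1 = w2;  // slide the three-word window forward by one word
        w2 = w3;
    }
}

// Small self-check on a made-up sentence (not one of the assignment files).
void demoCountTrigrams() {
    std::istringstream in("he, she, I he, she, I She");
    std::map<Trigram, int> counts;
    countTrigrams(in, counts);
    Trigram repeated = {"he,", "she,", "I"};
    Trigram capital = {"she,", "I", "She"};
    assert(counts[repeated] == 2);
    assert(counts[capital] == 1);
    assert(counts.size() == 4);
}
```

Calling `countTrigrams` once per example file with the same map accumulates counts across files. Keeping the map keyed by the three words also means the entries come out already sorted by first, then second, then third word.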
You must write a C++ program that, when built, creates an executable file named hw7a that takes two command-line arguments. The first argument is the name of a text file containing a list of input filenames.
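Reading the filename list might look something like the sketch below. The helper name `readFileList` is hypothetical, and the sketch assumes that the list is whitespace-separated and that no filename contains a space, neither of which the assignment states explicitly.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects the whitespace-separated filenames in `list`.
// Assumes no filename contains a space.
std::vector<std::string> readFileList(std::istream& list) {
    std::vector<std::string> names;
    std::string name;
    while (list >> name) names.push_back(name);
    return names;
}

// Small self-check using the two filenames from the example.
void demoReadFileList() {
    std::istringstream list("sl.txt\nge.txt\n");
    std::vector<std::string> names = readFileList(list);
    assert(names.size() == 2);
    assert(names[0] == "sl.txt" && names[1] == "ge.txt");
}
```

In the real program, main would presumably check that exactly two arguments were given, open the first one with std::ifstream, and pass that stream to the helper.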
In order to treat the beginning and end of your example files meaningfully during Part B, you will include in the model you create in Part A the special words "
d, "
And you will need to add four similar trigrams for each example text that you process.
Each time your program is run, it should build your trigram-based language model by processing each text file specified in the input filename list. What happens after that will depend on the second argument specified at the command line. The second argument is a single letter, and should be one of "a", "r", or "c". Your program should output to the C++ standard output stream (cout) the language model you created, ordering entries as specified by the argument letter as follows:
a - forward alphabetical order. This means that trigrams are output in alphabetical order by the first word in each trigram, using the alphabetical order of the second and then third word in each trigram to break ties.
r - reverse alphabetical order. This means that trigrams are output in descending alphabetical order by the first word in each trigram, using the descending alphabetical order of the second and then third word in each trigram to break ties.
c - count order. This means that trigrams are output in ascending order by frequency, using forward alphabetical ordering of first words and then second and then third words to break ties.
Your output will consist of one trigram with associated count per line. On a given line, the 4 outputs (trigramWord1, trigramWord2, trigramWord3, and count) should be separated by single spaces.
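The three orderings and the output format could be sketched as below. The names `Trigram`, `Entry`, `printModel`, and `demoPrintModel` are illustrative assumptions, and "alphabetical" is taken to mean the default std::string comparison, which puts uppercase letters before lowercase ones. That reading is consistent with the example output, where "I do not" precedes "a lot about".

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Trigram = std::array<std::string, 3>;  // hypothetical alias
using Entry = std::pair<Trigram, int>;       // trigram plus its count

// Writes one "word1 word2 word3 count" line per trigram to `out`.
// `mode` is 'a' (ascending), 'r' (descending) or 'c' (ascending count).
void printModel(const std::map<Trigram, int>& counts, char mode, std::ostream& out) {
    assert(mode == 'a' || mode == 'r' || mode == 'c');
    // std::map iterates in ascending key order, and std::array compares
    // lexicographically, so this copy is already in 'a' order.
    std::vector<Entry> entries(counts.begin(), counts.end());
    if (mode == 'r') {
        std::reverse(entries.begin(), entries.end());
    } else if (mode == 'c') {
        // A stable sort keeps the alphabetical order among equal counts.
        std::stable_sort(entries.begin(), entries.end(),
                         [](const Entry& x, const Entry& y) { return x.second < y.second; });
    }
    for (const Entry& e : entries) {
        out << e.first[0] << ' ' << e.first[1] << ' ' << e.first[2] << ' ' << e.second << '\n';
    }
}

// Small self-check using three lines taken from the example output.
void demoPrintModel() {
    std::map<Trigram, int> counts;
    Trigram t1 = {"I", "do", "not"};
    Trigram t2 = {"a", "lot", "about"};
    Trigram t3 = {"theyve", "talked", "about"};
    counts[t1] = 2;
    counts[t2] = 1;
    counts[t3] = 4;
    std::ostringstream fwd, rev, cnt;
    printModel(counts, 'a', fwd);
    printModel(counts, 'r', rev);
    printModel(counts, 'c', cnt);
    assert(fwd.str() == "I do not 2\na lot about 1\ntheyve talked about 4\n");
    assert(rev.str() == "theyve talked about 4\na lot about 1\nI do not 2\n");
    assert(cnt.str() == "a lot about 1\nI do not 2\ntheyve talked about 4\n");
}
```

Passing std::cout as the last argument would send the model to standard output as the assignment requires. Reversing the sorted list is enough for the "r" mode because descending order on all three words is the exact mirror of ascending order.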
Example
Suppose the list of training texts input for your program resides in a file named tiny_ex.txt, and the contents of the file are the names of two text files containing excerpts from Dr. Seuss books, as follows:
sl.txt
ge.txt
For the command ./hw7a tiny_ex.txt a, the expected output is:
Clause.
I do not 2
Santa Clause.
a lot about 1
about flaws. theyve 1
about gauze. theyve 1
about laws and 1
about old Santa 1
about paws and 1
and theyve talked 2
anywhere
do not like 2
flaws. theyve talked 1
gauze. theyve talked 1
here or there 1
laws and theyve 1
like them anywhere 1
like them here 1
lot about old 1
not like them 2
old Santa Clause. 1
or there I 1
paws and theyve 1
quite a lot 1
talked about flaws. 1
talked about gauze. 1
talked about laws 1
talked about paws 1
talked quite a 1
them anywhere
them here or 1
there I do 1
theyve talked about 4
theyve talked quite 1
For the command ./hw7a tiny_ex.txt c, the expected output is:
Clause.
Santa Clause.
a lot about 1
about flaws. theyve 1
about gauze. theyve 1
about laws and 1
about old Santa 1
about paws and 1
anywhere
flaws. theyve talked 1
gauze. theyve talked 1
here or there 1
laws and theyve 1
like them anywhere 1
like them here 1
lot about old 1
old Santa Clause. 1
or there I 1
paws and theyve 1
quite a lot 1
talked about flaws. 1
talked about gauze. 1
talked about laws 1
talked about paws 1
talked quite a 1
them anywhere
them here or 1
there I do 1
theyve talked quite 1
I do not 2
and theyve talked 2
do not like 2
not like them 2
theyve talked about 4