Question
Please implement the following four functions
1. sentenceSplitter.
bool sentenceSplitter(string& fname, vector<string>& sentences);
This function converts a text file with the name fname into a list of sentences. The sentences will be stored in the sentences vector in the same order as they appear in the input file. This function returns true if it is successful; false otherwise.
What will be considered as sentence delimiters? Given a paragraph of multiple sentences, the following punctuation marks will be used to split the paragraph into individual sentences:
period: .
question mark: ?
period + double quotation mark: ."
question mark + double quotation mark: ?"
Assume the input file contains the following three paragraphs:
The first story is about connecting the dots.
I dropped out of Reed College after the first 6 months, but then stayed around as a drop-in for another 18 months or so before I really quit. So why did I drop out?
It started before I was born. My biological mother was a young, unwed college graduate student, and she decided to put me up for adoption. She felt very strongly that I should be adopted by college graduates, so everything was all set for me to be adopted at birth by a lawyer and his wife. Except that when I popped out they decided at the last minute that they really wanted a girl. So my parents, who were on a waiting list, got a call in the middle of the night asking: "We have an unexpected baby boy; do you want him?" They said: "Of course." My biological mother later found out that my mother had never graduated from college and that my father had never graduated from high school. She refused to sign the final adoption papers. She only relented a few months later when my parents promised that I would someday go to college.
The above function will identify a total of 12 sentences as follows:
- The first story is about connecting the dots
- I dropped out of Reed College after the first 6 months, but then stayed around as a drop-in for another 18 months or so before I really quit
- So why did I drop out
- It started before I was born
- My biological mother was a young, unwed college graduate student, and she decided to put me up for adoption
- She felt very strongly that I should be adopted by college graduates, so everything was all set for me to be adopted at birth by a lawyer and his wife
- Except that when I popped out they decided at the last minute that they really wanted a girl
- So my parents, who were on a waiting list, got a call in the middle of the night asking: "We have an unexpected baby boy; do you want him
- They said: "Of course
- My biological mother later found out that my mother had never graduated from college and that my father had never graduated from high school
- She refused to sign the final adoption papers
- She only relented a few months later when my parents promised that I would someday go to college
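A minimal sketch of sentenceSplitter, assuming the signature above with vector<string> as the sentence container. It reads the file one character at a time, ends a sentence at . or ?, swallows a double quote that immediately follows either mark so that ." and ?" act as single delimiters, and trims surrounding whitespace; on the three sample paragraphs this yields the 12 sentences listed above. Two behaviors are assumptions not stated in the assignment: line breaks are treated as spaces, and trailing text with no delimiter is kept as a final sentence. The helper trim is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: strip leading and trailing whitespace (spaces, tabs, newlines).
static std::string trim(const std::string& s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Split the text file fname into sentences. The delimiters ., ?, ." and ?"
// end a sentence and are dropped from the stored sentence.
bool sentenceSplitter(std::string& fname, std::vector<std::string>& sentences) {
    std::ifstream in(fname);
    if (!in) return false;                    // the file could not be opened

    std::string current;                      // sentence being built
    char c;
    while (in.get(c)) {
        if (c == '.' || c == '?') {
            // A closing double quote right after the mark belongs to the delimiter.
            if (in.peek() == '"') in.get();
            std::string s = trim(current);
            if (!s.empty()) sentences.push_back(s);
            current.clear();
        } else if (c == '\n' || c == '\r') {
            current += ' ';                   // assumption: line breaks act as spaces
        } else {
            current += c;
        }
    }
    // Assumption: trailing text without a delimiter still counts as a sentence.
    std::string s = trim(current);
    if (!s.empty()) sentences.push_back(s);
    return true;
}
```

Because every period ends a sentence under these rules, an abbreviation such as "Mr." would also split a sentence; the sketch follows the delimiter list literally rather than guessing at abbreviations.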
2. identify unique word-pairs and calculate their frequencies.
bool wordpairMapping( vector
Given a list of sentences stored in the first argument sentences, this function identifies all the unique word-pairs and each word-pair's frequency. The identified (word-pair, frequency) entries will be stored in wordpairFreq_map, which is a map of (key, value) pairs. The key of this map is a word-pair and the value is the frequency of this word-pair. This function will return true if the mapping is successful; false otherwise.
Note that:
Tokens are case-insensitive. We will use lower case in this project.
The two words in a word-pair are different. For example, even though the first sentence above contains "the" twice, you are not going to construct the word-pair (the, the).
Order does not matter between two words in a word-pair. For example, the word-pair
Suggestions:
Use istringstream to tokenize a sentence.
Use a set to store all the unique tokens identified in a sentence.
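The word-pair step could look like the following sketch. The source truncates the signature, so the parameter types (a vector<string> of sentences and a map<pair<string, string>, int>) and the alias WordPair are assumptions. Following the suggestions, each sentence is tokenized with istringstream, lower-cased, and collected into a set; since a set is sorted and holds each token once, pairing every token with each later token yields every unordered pair of distinct words exactly once per sentence, with the alphabetically smaller word first. Under this reading a pair's frequency is the number of sentences containing both words, so the three identical sample sentences give 7 unique tokens, 7 choose 2 = 21 word-pairs, each with frequency 3.

```cpp
#include <cassert>
#include <cctype>
#include <iterator>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using WordPair = std::pair<std::string, std::string>;   // hypothetical alias

// Count, for every unordered pair of distinct words, how many sentences contain both.
bool wordpairMapping(std::vector<std::string>& sentences,
                     std::map<WordPair, int>& wordpairFreq_map) {
    for (const std::string& sentence : sentences) {
        std::istringstream iss(sentence);
        std::set<std::string> tokens;         // unique, sorted tokens of this sentence
        std::string tok;
        while (iss >> tok) {
            for (char& ch : tok)              // tokens are case-insensitive
                ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
            tokens.insert(tok);
        }
        // The set is sorted and duplicate-free, so *i < *j always holds: (a, b) and
        // (b, a) collapse to one key, and a word is never paired with itself.
        for (auto i = tokens.begin(); i != tokens.end(); ++i)
            for (auto j = std::next(i); j != tokens.end(); ++j)
                ++wordpairFreq_map[{*i, *j}];
    }
    // Assumption: "successful" means at least one word-pair was found.
    return !wordpairFreq_map.empty();
}
```

Punctuation attached to a word (for example "months,") stays part of the token here, since the assignment does not say to strip it.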
Assume sentences consists of the following 3 sentences:
The first story is about connecting the dots.
The first story is about connecting the dots.
The first story is about connecting the dots.
This function is going to identify a total of 21 word-pairs as follows:
3. flip the map of (word-pair, frequency) to a multimap of (frequency, word-pair) to order all the word-pairs in ascending order of frequency
bool freqWordpairMmap(map< pair
This function flips the wordpairFreq_map such that frequencies will be the keys and word-pairs will be the values. A multimap will be needed as two word-pairs can have the same frequency. This function will return true if the flipping is successful; false otherwise.
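A sketch of the flip, using the same assumed types; the output parameter name freqWordpair_mmap is hypothetical because the source truncates the signature. std::multimap<int, WordPair> accepts duplicate keys, so two word-pairs with the same frequency are both kept, and with the default std::less<int> comparator the keys are ordered by ascending frequency.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using WordPair = std::pair<std::string, std::string>;   // hypothetical alias

// Flip (word-pair -> frequency) into (frequency -> word-pair). A multimap keeps
// every entry even when several word-pairs share the same frequency.
bool freqWordpairMmap(std::map<WordPair, int>& wordpairFreq_map,
                      std::multimap<int, WordPair>& freqWordpair_mmap) {
    for (const auto& kv : wordpairFreq_map)
        freqWordpair_mmap.emplace(kv.second, kv.first);  // key: frequency, value: word-pair
    // Assumption: an empty input map is treated as a failed flip.
    return !wordpairFreq_map.empty();
}
```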
4. output the most frequent and least frequent word-pairs to a file.
void printWordpairs(multimap
This function writes the top topCnt most frequent word-pairs and the botCnt least frequent word-pairs to a file named outFname. Note that all the word-pairs are already ordered in descending order of frequency. You simply use the multimap's iterator and reverse_iterator to access the most frequent and least frequent word-pairs. The output will be one word-pair per line in the format of
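Finally, a sketch of printWordpairs. The parameter list and the line format `<word1, word2>: frequency` are assumptions, since the source cuts off both. It assumes the default ascending key order of std::multimap<int, WordPair> from step 3, so a reverse_iterator starting at rbegin() visits the most frequent pairs first and a forward iterator starting at begin() visits the least frequent ones; if the multimap were declared with std::greater<int> (descending order), the roles of the two loops would swap. The helper writePair is hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <string>
#include <utility>

using WordPair = std::pair<std::string, std::string>;   // hypothetical alias

// Hypothetical line format: <word1, word2>: frequency
static void writePair(std::ofstream& out, int freq, const WordPair& wp) {
    out << '<' << wp.first << ", " << wp.second << ">: " << freq << '\n';
}

// Write the topCnt most frequent and botCnt least frequent word-pairs to outFname.
// Assumes std::multimap's default ascending key order (smallest frequency first).
void printWordpairs(std::multimap<int, WordPair>& freqWordpair_mmap,
                    const std::string& outFname, int topCnt, int botCnt) {
    std::ofstream out(outFname);
    if (!out) return;                         // void return: a failed open is ignored

    // Most frequent: the reverse_iterator starts at the largest key.
    int n = 0;
    for (auto it = freqWordpair_mmap.rbegin();
         it != freqWordpair_mmap.rend() && n < topCnt; ++it, ++n)
        writePair(out, it->first, it->second);

    // Least frequent: the forward iterator starts at the smallest key.
    n = 0;
    for (auto it = freqWordpair_mmap.begin();
         it != freqWordpair_mmap.end() && n < botCnt; ++it, ++n)
        writePair(out, it->first, it->second);
}
```

If topCnt + botCnt exceeds the number of word-pairs, some pairs are written twice; the assignment does not say how that overlap should be handled.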