Question

Please implement the following four functions:

1. sentenceSplitter.

bool sentenceSplitter( string& fname, vector<string>& sentences);

This function converts a text file with the name fname into a list of sentences. The list of sentences will be stored in the sentences vector in the same order as it appears in the input file. This function returns true if it is successful; false otherwise.

What will be considered as sentence delimiters? Given a paragraph of multiple sentences, the following punctuation marks will be used to split the paragraph into individual sentences:

period: .

question mark: ?

period + double quotation mark: ."

question mark + double quotation mark: ?"

Assume the input file contains the following three paragraphs:

The first story is about connecting the dots.

I dropped out of Reed College after the first 6 months, but then stayed around as a drop-in for another 18 months or so before I really quit. So why did I drop out?

It started before I was born. My biological mother was a young, unwed college graduate student, and she decided to put me up for adoption. She felt very strongly that I should be adopted by college graduates, so everything was all set for me to be adopted at birth by a lawyer and his wife. Except that when I popped out they decided at the last minute that they really wanted a girl. So my parents, who were on a waiting list, got a call in the middle of the night asking: "We have an unexpected baby boy; do you want him?" They said: "Of course." My biological mother later found out that my mother had never graduated from college and that my father had never graduated from high school. She refused to sign the final adoption papers. She only relented a few months later when my parents promised that I would someday go to college.

The above function will identify a total of 12 sentences as follows:

- The first story is about connecting the dots

- I dropped out of Reed College after the first 6 months, but then stayed around as a drop-in for another 18 months or so before I really quit

- So why did I drop out

- It started before I was born

- My biological mother was a young, unwed college graduate student, and she decided to put me up for adoption

- She felt very strongly that I should be adopted by college graduates, so everything was all set for me to be adopted at birth by a lawyer and his wife

- Except that when I popped out they decided at the last minute that they really wanted a girl

- So my parents, who were on a waiting list, got a call in the middle of the night asking: "We have an unexpected baby boy; do you want him

- They said: "Of course

- My biological mother later found out that my mother had never graduated from college and that my father had never graduated from high school

- She refused to sign the final adoption papers

- She only relented a few months later when my parents promised that I would someday go to college
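A minimal sketch of how sentenceSplitter could be written under the rules above. It assumes the second parameter is vector<string>&, that surrounding whitespace is trimmed from each sentence, and that any trailing text without a closing delimiter is kept as a sentence. The trim helper is hypothetical and not part of the assignment.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

using namespace std;

// Hypothetical helper (not part of the assignment): strips leading and
// trailing whitespace, including the newlines between paragraphs.
static string trim(const string& s) {
    const char* ws = " \t\r\n";
    size_t b = s.find_first_not_of(ws);
    if (b == string::npos) return "";
    size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

bool sentenceSplitter(string& fname, vector<string>& sentences) {
    ifstream in(fname);
    if (!in) return false;  // the file could not be opened

    // Read the whole file into one string.
    string text((istreambuf_iterator<char>(in)), istreambuf_iterator<char>());

    string current;
    for (size_t i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (c == '.' || c == '?') {
            // Treat ." and ?" as a single two-character delimiter.
            if (i + 1 < text.size() && text[i + 1] == '"') ++i;
            string s = trim(current);
            if (!s.empty()) sentences.push_back(s);
            current.clear();
        } else {
            current += c;
        }
    }

    // Assumption: keep trailing text that has no closing delimiter.
    string s = trim(current);
    if (!s.empty()) sentences.push_back(s);
    return true;
}
```

Because every period is treated as a delimiter, this sketch would also split on abbreviations and decimal numbers. The assignment does not mention those cases, so they are left unhandled here.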

2. identify unique word-pairs and calculate their frequencies.

bool wordpairMapping( vector<string>& sentences, map< pair<string,string>, int> &wordpairFreq_map);

Given a list of sentences stored in the first argument sentences, this function identifies all the unique word-pairs and each word-pair's frequency. The identified (word-pair, frequency) entries will be stored in wordpairFreq_map, which is a map of (key, value) pairs. The key of this map is a word-pair and the value is the frequency of this word-pair. This function will return true if the mapping is successful; false otherwise.

Note that

Tokens are case insensitive. We will convert all tokens to lower case in this project.

The two words in a word-pair are different. For example, even though the first sentence above contains "the" twice, you are not going to construct a word-pair <the, the>.

Order does not matter between two words in a word-pair. For example, the word-pair <word1, word2> is the same as <word2, word1>. You are recommended to arrange the two words in lexicographical order before inserting the pair into the map.

Suggestions:

Use istringstream to tokenize a sentence.

Use set to store all the unique tokens identified in a sentence.

Assume sentences consists of the following 3 sentences:

The first story is about connecting the dots.

The first story is about connecting the dots.

The first story is about connecting the dots.

This function is going to identify a total of 21 word-pairs as follows:

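<about, connecting>: 3

<about, dots>: 3

<about, first>: 3

<about, is>: 3

<about, story>: 3

<about, the>: 3

<connecting, dots>: 3

<connecting, first>: 3

<connecting, is>: 3

<connecting, story>: 3

<connecting, the>: 3

<dots, first>: 3

<dots, is>: 3

<dots, story>: 3

<dots, the>: 3

<first, is>: 3

<first, story>: 3

<first, the>: 3

<is, story>: 3

<is, the>: 3

<story, the>: 3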
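A possible sketch of wordpairMapping that follows the suggestions above: istringstream for tokenizing and set for unique tokens. It assumes the parameter types are vector<string>& and map<pair<string,string>, int>&, that a word-pair is counted once per sentence in which both words appear, and that an empty sentence list counts as failure. Punctuation attached to a token (for example a trailing comma) is not stripped, because the assignment does not say how to handle it.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using namespace std;

bool wordpairMapping(vector<string>& sentences,
                     map<pair<string, string>, int>& wordpairFreq_map) {
    // Assumption: an empty input is reported as an unsuccessful mapping.
    if (sentences.empty()) return false;

    for (const string& sentence : sentences) {
        // Lower-case the sentence so that tokens are case insensitive.
        string lowered = sentence;
        transform(lowered.begin(), lowered.end(), lowered.begin(),
                  [](unsigned char ch) { return static_cast<char>(tolower(ch)); });

        // Collect the unique tokens; a set also keeps them sorted.
        istringstream iss(lowered);
        set<string> tokens;
        string word;
        while (iss >> word) tokens.insert(word);

        // Visit every pair (a, b) with a < b. Iterating the sorted set
        // guarantees lexicographical order and never pairs a word with itself.
        for (auto a = tokens.begin(); a != tokens.end(); ++a) {
            for (auto b = next(a); b != tokens.end(); ++b) {
                ++wordpairFreq_map[make_pair(*a, *b)];
            }
        }
    }
    return true;
}
```

A sentence with 7 unique tokens yields 7 * 6 / 2 = 21 pairs, which matches the count in the example above.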

3. flip the map of <word-pair, frequency> to a multimap of <frequency, word-pair> to order all the word-pairs in ascending order of frequency

bool freqWordpairMmap(map< pair<string,string>, int> &wordpairFreq_map, multimap<int, pair<string,string> > &freqWordpair_mmap );

This function flips the wordpairFreq_map such that frequencies will be the keys and word-pairs will be the values. A multimap will be needed as two word-pairs can have the same frequency. This function will return true if the flipping is successful; false otherwise.
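The flip described above can be sketched as follows. The parameter types map<pair<string,string>, int>& and multimap<int, pair<string,string> >& are assumptions reconstructed from the description, and treating an empty input map as failure is likewise an assumption.

```cpp
#include <map>
#include <string>
#include <utility>

using namespace std;

bool freqWordpairMmap(map<pair<string, string>, int>& wordpairFreq_map,
                      multimap<int, pair<string, string> >& freqWordpair_mmap) {
    // Assumption: an empty input map is reported as an unsuccessful flip.
    if (wordpairFreq_map.empty()) return false;

    // Swap key and value; the multimap keeps entries sorted by
    // frequency in ascending order and allows duplicate frequencies.
    for (const auto& entry : wordpairFreq_map) {
        freqWordpair_mmap.insert(make_pair(entry.second, entry.first));
    }
    return true;
}
```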

4. output the most frequent and least frequent word-pairs to a file.

void printWordpairs(multimap<int, pair<string,string> > &freqWordpair_multimap, string outFname, int topCnt, int botCnt);

This function writes the topCnt most frequent word-pairs and the botCnt least frequent word-pairs to a file named outFname. Note that all the word-pairs are already ordered in ascending order of frequency in the multimap. You can simply use the multimap's reverse_iterator and iterator to access the most frequent and least frequent word-pairs, respectively. The output will be one word-pair per line in the format <word1, word2>: frequency.
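One way printWordpairs might look, as a sketch. The parameter type multimap<int, pair<string,string> >& and the exact line format "<word1, word2>: frequency" are assumptions. The most frequent pairs are written first, followed by the least frequent ones.

```cpp
#include <fstream>
#include <map>
#include <string>
#include <utility>

using namespace std;

void printWordpairs(multimap<int, pair<string, string> >& freqWordpair_multimap,
                    string outFname, int topCnt, int botCnt) {
    ofstream out(outFname);
    if (!out) return;  // the output file could not be opened

    // Most frequent pairs: walk backwards from the largest frequency.
    int printed = 0;
    for (auto rit = freqWordpair_multimap.rbegin();
         rit != freqWordpair_multimap.rend() && printed < topCnt;
         ++rit, ++printed) {
        out << "<" << rit->second.first << ", " << rit->second.second
            << ">: " << rit->first << "\n";
    }

    // Least frequent pairs: walk forwards from the smallest frequency.
    printed = 0;
    for (auto it = freqWordpair_multimap.begin();
         it != freqWordpair_multimap.end() && printed < botCnt;
         ++it, ++printed) {
        out << "<" << it->second.first << ", " << it->second.second
            << ">: " << it->first << "\n";
    }
}
```

If topCnt or botCnt exceeds the number of stored word-pairs, the loops simply stop at the end of the multimap.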
