Coding lab: File I/Os and STLs: identifying unique word-pairs and their frequencies in a textfile

Note 1. This project is similar to the first problem in Assignment 11.

Note 2. The instructor is currently setting up all the unit tests. This lab will be worth a total of 60 points.

Project description:

The attached file, SteveJobsSpeech2005.txt, contains the commencement speech delivered by Steve Jobs in 2005 at Stanford University. (A copy of this file is also uploaded to iLearn. You can download it to test your program locally before submitting to zyBook.) In this project, you are going to process this file to identify all the co-occurring word-pairs and their frequencies. Two words are said to co-occur if they appear in the same sentence. We will refer to a pair of co-occurring words as a word-pair. The frequency of a word-pair is defined as the number of sentences that contain this word-pair. You are required to include three files in this project:

fileIOs_wordPairs.h: see its content below.

fileIOs_wordPairs.cpp: implement the functions declared in the above header file

fileIOswordPairsmain.cpp: test all the functions implemented

Header file: fileIOs_wordPairs.h

You are going to include the following function prototypes in this header file (but please feel free to introduce other helper routines if you find them necessary):

1. sentenceSplitter.

bool sentenceSplitter( string& fname, vector<string>& sentences);

This function converts a text file with the name fname into a list of sentences. The sentences will be stored in the sentences vector in the same order as they appear in the input file. This function returns true if it is successful; false otherwise.

What will be considered as sentence delimiters? Given a paragraph of multiple sentences, the following punctuation marks will be used to split the paragraph into individual sentences:

period: .

question mark: ?

period + double quotation mark: ."

question mark + double quotation mark: ?"

Assume the input file contains the following three paragraphs:

The first story is about connecting the dots. I dropped out of Reed College after the first 6 months, but then stayed around as a drop-in for another 18 months or so before I really quit. So why did I drop out? It started before I was born. My biological mother was a young, unwed college graduate student, and she decided to put me up for adoption. She felt very strongly that I should be adopted by college graduates, so everything was all set for me to be adopted at birth by a lawyer and his wife. Except that when I popped out they decided at the last minute that they really wanted a girl. So my parents, who were on a waiting list, got a call in the middle of the night asking: "We have an unexpected baby boy; do you want him?" They said: "Of course." My biological mother later found out that my mother had never graduated from college and that my father had never graduated from high school. She refused to sign the final adoption papers. She only relented a few months later when my parents promised that I would someday go to college. 

The above function will identify a total of 12 sentences as follows:

- The first story is about connecting the dots
- I dropped out of Reed College after the first 6 months, but then stayed around as a drop-in for another 18 months or so before I really quit
- So why did I drop out
- It started before I was born
- My biological mother was a young, unwed college graduate student, and she decided to put me up for adoption
- She felt very strongly that I should be adopted by college graduates, so everything was all set for me to be adopted at birth by a lawyer and his wife
- Except that when I popped out they decided at the last minute that they really wanted a girl
- So my parents, who were on a waiting list, got a call in the middle of the night asking: "We have an unexpected baby boy; do you want him
- They said: "Of course
- My biological mother later found out that my mother had never graduated from college and that my father had never graduated from high school
- She refused to sign the final adoption papers
- She only relented a few months later when my parents promised that I would someday go to college
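The splitting rules above can be sketched as follows. This is a minimal sketch, not the instructor's reference solution. It assumes that sentences is a vector<string>, that the quotation marks in the file are plain ASCII double quotes, and that any text left after the last delimiter is kept as a final sentence. trimCopy and splitText are hypothetical helpers that the assignment does not name.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>
using namespace std;

// Hypothetical helper (not part of the assignment): strips leading and
// trailing whitespace from a copy of s.
static string trimCopy(const string& s) {
    const char* ws = " \t\r\n";
    size_t b = s.find_first_not_of(ws);
    if (b == string::npos) return "";
    size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Hypothetical helper: splits raw text on '.', '?', '."' and '?"'.
void splitText(const string& text, vector<string>& sentences) {
    string current;
    for (size_t i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (c == '.' || c == '?') {
            // Swallow a double quote that directly follows the delimiter.
            if (i + 1 < text.size() && text[i + 1] == '"') ++i;
            string s = trimCopy(current);
            if (!s.empty()) sentences.push_back(s);
            current.clear();
        } else {
            // Treat line breaks as ordinary spaces.
            current += (c == '\n' || c == '\r') ? ' ' : c;
        }
    }
    // Assumption: text after the last delimiter still counts as a sentence.
    string tail = trimCopy(current);
    if (!tail.empty()) sentences.push_back(tail);
}

bool sentenceSplitter(string& fname, vector<string>& sentences) {
    ifstream in(fname);
    if (!in.is_open()) return false;  // file missing or unreadable
    string text((istreambuf_iterator<char>(in)), istreambuf_iterator<char>());
    splitText(text, sentences);
    return true;
}
```

Because every period is treated as a delimiter, abbreviations or decimal numbers would also be split. The sample paragraph contains neither, so the sketch does not handle them.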

2. identify unique word-pairs and calculate their frequencies.

bool wordpairMapping( vector<string>& sentences, map< pair<string,string>, int> &wordpairFreq_map);

Given a list of sentences stored in the first argument sentences, this function identifies all the unique word-pairs and each word-pair's frequency. The identified (word-pair, frequency) entries will be stored into wordpairFreq_map, which is a map of (key, value) pairs. The key of this map is a word-pair and the value is the frequency of this word-pair. This function will return true if the mapping is successful; false otherwise.

Note that

Tokens are case insensitive. We will consider lower case in this project

The two words in a word-pair are different. For example, even though the first sentence above contains two occurrences of "the", you are not going to construct the word-pair <the, the>.

Order does not matter between the two words in a word-pair. That is, a word-pair is the same regardless of which of its two words is listed first. You are recommended to arrange the two words in lexicographical order before inserting the pair into the map.

Suggestions:

Use istringstream to tokenize a sentence.

Use set to store all the unique tokens identified in a sentence.

Assume sentences consists of the following 3 sentences:

The first story is about connecting the dots. The first story is about connecting the dots. The first story is about connecting the dots. 

This function is going to identify a total of 21 word-pairs as follows:

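<about, connecting>: 3
<about, dots>: 3
<about, first>: 3
<about, is>: 3
<about, story>: 3
<about, the>: 3
<connecting, dots>: 3
<connecting, first>: 3
<connecting, is>: 3
<connecting, story>: 3
<connecting, the>: 3
<dots, first>: 3
<dots, is>: 3
<dots, story>: 3
<dots, the>: 3
<first, is>: 3
<first, story>: 3
<first, the>: 3
<is, story>: 3
<is, the>: 3
<story, the>: 3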
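The suggestions above (istringstream for tokenizing, set for unique tokens) might be combined as in the sketch below. It assumes sentences is a vector<string> and the map key is a pair<string, string>. Returning false for an empty input is an assumption, since the assignment does not say what counts as failure. Tokens are split on whitespace only; whether attached punctuation such as commas should also be stripped is not specified, so the sketch leaves it in place.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>
using namespace std;

bool wordpairMapping(vector<string>& sentences,
                     map<pair<string, string>, int>& wordpairFreq_map) {
    if (sentences.empty()) return false;  // assumption: nothing to map is a failure

    for (const string& sentence : sentences) {
        // Collect the unique lower-case tokens of this sentence.
        set<string> tokens;
        istringstream iss(sentence);
        string word;
        while (iss >> word) {
            transform(word.begin(), word.end(), word.begin(),
                      [](unsigned char ch) { return static_cast<char>(tolower(ch)); });
            tokens.insert(word);
        }

        // A set iterates in lexicographical order, so pairing each token with
        // every later token yields each unordered pair exactly once, already
        // arranged as (smaller, larger). Identical words are never paired.
        for (auto first = tokens.begin(); first != tokens.end(); ++first) {
            for (auto second = next(first); second != tokens.end(); ++second) {
                ++wordpairFreq_map[make_pair(*first, *second)];
            }
        }
    }
    return true;
}
```

Since each sentence contributes a pair at most once, the stored count is the number of sentences containing that pair, which matches the definition of frequency given in the project description.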

3. flip the map of (word-pair, frequency) to a multimap of (frequency, word-pair) to order all the word-pairs in ascending order of frequency

bool freqWordpairMmap(map< pair<string,string>, int> &wordpairFreq_map, multimap<int, pair<string,string> > &freqWordpair_mmap );

This function flips the wordpairFreq_map such that frequencies will be the keys and word-pairs will be the values. A multimap will be needed as two word-pairs can have the same frequency. This function will return true if the flipping is successful; false otherwise.
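The flip amounts to re-inserting every entry with key and value swapped, as in this minimal sketch. It assumes the map key is a pair<string, string> and the multimap is keyed by int. Treating an empty input map as a failure is an assumption, not something the assignment states.

```cpp
#include <map>
#include <string>
#include <utility>
using namespace std;

bool freqWordpairMmap(map<pair<string, string>, int>& wordpairFreq_map,
                      multimap<int, pair<string, string> >& freqWordpair_mmap) {
    if (wordpairFreq_map.empty()) return false;  // assumption: nothing to flip

    for (const auto& entry : wordpairFreq_map) {
        // entry.first is the word-pair, entry.second is its frequency.
        freqWordpair_mmap.insert(make_pair(entry.second, entry.first));
    }
    return true;
}
```

A multimap keeps its keys sorted in ascending order by default, so after the flip begin() points at a least frequent word-pair and rbegin() at a most frequent one.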

4. output the most frequent and least frequent word-pairs to a file.

void printWordpairs(multimap > &freqWordpair_multimap, string outFname, int topCnt, int botCnt); 

This function writes the topCnt most frequent word-pairs and the botCnt least frequent word-pairs to a file of the name outFname. Note that all the word-pairs are already ordered in ascending order of frequency in the multimap. You are going to simply use multimap's reverse_iterator and iterator to access the most frequent and least frequent word-pairs, respectively. The output will be one word-pair per line in the format of word-pair: frequency. For example:

: 3 : 3 : 3 : 3 : 3 : 3 : 3 : 3 : 3 : 3 
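A possible shape for this function is sketched below. The exact line format is an assumption: the sketch writes each pair as <word1, word2>: frequency, so adjust it if the unit tests expect something else. It also assumes the most frequent pairs are written first, followed by the least frequent ones, and that the function silently returns if the output file cannot be opened.

```cpp
#include <fstream>
#include <map>
#include <string>
#include <utility>
using namespace std;

void printWordpairs(multimap<int, pair<string, string> >& freqWordpair_multimap,
                    string outFname, int topCnt, int botCnt) {
    ofstream out(outFname);
    if (!out.is_open()) return;  // assumption: nothing is written on failure

    // Most frequent pairs: walk backwards from the largest key.
    int written = 0;
    for (auto rit = freqWordpair_multimap.rbegin();
         rit != freqWordpair_multimap.rend() && written < topCnt;
         ++rit, ++written) {
        out << "<" << rit->second.first << ", " << rit->second.second
            << ">: " << rit->first << "\n";
    }

    // Least frequent pairs: walk forwards from the smallest key.
    written = 0;
    for (auto it = freqWordpair_multimap.begin();
         it != freqWordpair_multimap.end() && written < botCnt;
         ++it, ++written) {
        out << "<" << it->second.first << ", " << it->second.second
            << ">: " << it->first << "\n";
    }
}
```

If the multimap holds fewer entries than topCnt or botCnt, each loop simply stops at the end of the container.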

Implementation file: fileIOs_wordPairs.cpp

In this program file, you are going to implement all the functions declared in the above header file.

Test driver: fileIOswordPairsmain.cpp

You are going to include a main() function to test all the above four functions.
