Question

Input Description: You will get an input file containing commands, just like in Project 1. The following characters represent the commands:

I - Stands for Insert. You will be required to insert words into the language model. These words are stored as Strings in the program and are terminated with a space or carriage return (the newline character); more details in the next section.
M - Stands for Modify. You must modify your language model by updating the probability values; more details in the next section.
R - Stands for Remove. You will be required to remove words from your language model.
S - Stands for Sentence. You will be required to create sentences from the given input.

Input explanation: The input file will have the following layout. For the purpose of explanation, words (a sequence of letters followed by a space) have been replaced by single characters in the example below.

I 1 1 a b 0.1
I 1 2 b c d 0.2
I 2 1 e f g 0.5
I 1 1 a f 0.2
M 1 1 a b 0.4
R 1 1 a f
I 2 3 a b e f g 0.5
S 2 a b

The first line, as indicated by the command character 'I', requires you to insert words into your language model. The line is interpreted as: there is one word that you have to read ('a'), followed by another word ('b') with a probability of 0.1 (10%). Insert this data into your language model. It says that, in a sentence containing the word 'a', the word 'b' will follow with a probability of 0.1.

Similarly, the next line, which also has the command character 'I', is interpreted as: there is one word ('b') followed by two words ('c' and 'd') with a probability of 0.2 (20%). Insert this data into your language model. It says that a sentence with the word 'b' will be followed by the words 'c' and 'd' with a probability of 0.2.

The third line with the command character 'I' is interpreted as: there are two words ('e' and 'f') followed by one word ('g') with a probability of 0.5. Insert this data into your language model. The remaining lines with the command character 'I' are interpreted in the same way.

The line with the command character 'M' requires you to modify the language model. The line is interpreted as: there is one word ('a') followed by another word ('b') with a modified probability of 0.4. Notice that an earlier line had set the probability of 'b' following 'a' to 0.1. You will be required to make this modification in your language model.

The line with the command character 'R' requires removing data from the language model. The line is interpreted as: there is a word ('a') followed by another word ('f'). Remove this data from the language model.

The line with the command character 'S' requires you to generate a sentence based on the words provided in the input. It is interpreted as: there are two words ('a' and 'b'). Generate a sentence based on the language model created. This command requires you to build the sentence iteratively. First, check whether any word follows the word 'b' in the language model and note its probability; if no word follows 'b' in the language model, set the probability to 0. If there is a word, say 'x', note its probability as Pb. Next, check whether the language model has an entry for 'a' and 'b' together. If a word follows that pair, note the word as 'y' and its probability as Pab. For your sentence, you will append 'x' to "a b" to form "a b x" if Pb > Pab; otherwise, your sentence will be "a b y". Continue extending the sentence in this way until you can no longer do so (meaning the last words are not present in the language model), or the number of words reaches 20.
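For clarity, here is a minimal, self-contained sketch of that sentence-building loop. The Entry struct and the findEntry()/buildSentence() helpers are stand-ins invented for illustration (the assignment's TokenProbability and LanguageModel classes are defined in the next section), and the behaviour when Pb equals Pab is an assumption, since the prompt only spells out the Pb > Pab case.

#include <sstream>
#include <string>
#include <vector>
using namespace std;

struct Entry { string fromWords, toWords; float probability; };   //stand-in for TokenProbability

//Return the index of the first entry whose fromWords equals 'from', or -1 if there is none.
int findEntry(const vector<Entry>& model, const string& from) {
    for (size_t i = 0; i < model.size(); ++i)
        if (model[i].fromWords == from) return (int)i;
    return -1;
}

//Grow the starting input (e.g. "a b") by repeatedly comparing the one-word and two-word contexts.
string buildSentence(const vector<Entry>& model, string sentence) {
    istringstream ss(sentence);
    vector<string> words;
    for (string w; ss >> w; ) words.push_back(w);
    if (words.empty()) return sentence;                              //nothing to extend

    while (words.size() < 20) {                                      //stop once 20 words are reached
        string lastOne = words.back();
        string lastTwo = words.size() >= 2 ? words[words.size() - 2] + " " + lastOne : string();

        int iOne = findEntry(model, lastOne);                        //probability Pb (0 if absent)
        int iTwo = lastTwo.empty() ? -1 : findEntry(model, lastTwo); //probability Pab (0 if absent)
        if (iOne < 0 && iTwo < 0) break;                             //last words not in the model

        int chosen;
        if (iTwo < 0)      chosen = iOne;                            //only the one-word context matched
        else if (iOne < 0) chosen = iTwo;                            //only the two-word context matched
        else chosen = (model[iOne].probability > model[iTwo].probability) ? iOne : iTwo;

        //toWords may itself hold two or more words, so split it before appending
        istringstream next(model[chosen].toWords);
        for (string w; next >> w; ) words.push_back(w);
        sentence += " " + model[chosen].toWords;
    }
    return sentence;
}

Under the example commands above, a starting input of "a b" would compare Pb = 0.2 (for 'b' followed by 'c d') with Pab = 0.5 (for 'a b' followed by 'e f g'); since Pb > Pab does not hold, the sentence becomes "a b e f g", and it stops there because neither 'g' nor 'f g' appears in the model.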
Class definition: The class TokenProbability stores the data provided by the command character 'I'.

#include <string>

class TokenProbability {
    //fields
    string fromWords;    //use the string class
    string toWords;
    float probability;
    //constructors and other methods as necessary
};

It has three fields: two strings and a float to store the probability. The string fromWords will store the first word(s) in the input, and the string toWords will store the second word(s). If the input specifies one word followed by two words, the fromWords string will store the first word, and the toWords string will store the concatenation of the following two words with a space in between. You will use the built-in overloaded operator function '+' to do this. Similarly, fromWords may have to store a concatenation of words, depending on the input. The class will also have constructors and getter/setter functions.

The second class, LanguageModel, stores an array of TokenProbability objects.

class LanguageModel {
    //fields
    TokenProbability* LLM;    //pointer to an array
    //constructors and methods
    LanguageModel();          //constructor
    insert();
    modify();
    remove();
    createSentence();
    display();
};

The class will have as a field a pointer of type TokenProbability that points to an array of TokenProbability objects. Initially the size of this array is 0; you need to learn to resize it as you insert new TokenProbability objects. This array acts as the language model for the project. The functions insert(), remove(), modify(), and createSentence() behave as described in the input description (for the command characters 'I', 'R', 'M', and 'S'). The functions should print suitable error messages if wrong input from the file is encountered. For example, if the input file contains the line

M 1 1 a c 0.6

and there is no data for the word 'a' followed by the word 'c' in the model, then the function should print the error message: "Error! Data for 'a' followed by 'c' not found!". The method display() will print the language model on the screen.
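Because the prompt stresses that the array starts at size 0 and must be resized on every insert, here is a hedged sketch of that resize step for the raw-pointer version of LanguageModel. The size field, the destructor, and the constructor signatures are assumptions added for the sketch (the boilerplate further down uses a vector instead, which handles resizing for you).

#include <string>
using namespace std;

class TokenProbability {
    string fromWords, toWords;
    float probability;
public:
    TokenProbability() : probability(0.0f) {}
    TokenProbability(string f, string t, float p)
        : fromWords(f), toWords(t), probability(p) {}
    //getters and setters omitted in this sketch
};

class LanguageModel {
    TokenProbability* LLM;   //pointer to the dynamically sized array
    int size;                //number of entries currently stored (assumed field)
public:
    LanguageModel() : LLM(nullptr), size(0) {}        //the model starts empty
    ~LanguageModel() { delete[] LLM; }

    void insert(string fromWords, string toWords, float prob) {
        //allocate an array one element larger, copy the old entries across,
        //place the new entry at the end, then release the old array
        TokenProbability* bigger = new TokenProbability[size + 1];
        for (int i = 0; i < size; ++i) bigger[i] = LLM[i];
        bigger[size] = TokenProbability(fromWords, toWords, prob);
        delete[] LLM;
        LLM = bigger;
        ++size;
    }
};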

boilerplate source:

#include <iostream>
#include <string>
#include <vector>

using namespace std;

class TokenProbability {
//fields
protected:
    string fromWord;
    string toWord;
    float probability;

//constructors and methods not shown here.
public:
    TokenProbability(string a, string b, float prob);
    TokenProbability();
    string getFromWord();
    string getToWord();
    float getProbability();
    void setFromWord(string a);
    void setToWord(string b);
    void setProbability(float p);
};

class LanguageModel {
    vector<TokenProbability> tp;    //Dynamic array of TokenProbability objects.
public:
    //constructors and methods
    LanguageModel();
    void insert(string a, string b, float prob);
    void modify(string a, string b, float prob);
    void remove(string a, string b);
    void createSentence(string &result, string currentInput);
    void display();
};

int main() {
    char commandChar;
    string input;
    int modelLength;
    string sentence = "";    //Empty string to append partial results onto.
    //cin >> modelLength;

    LanguageModel *lModel = new LanguageModel();

    while (cin >> commandChar) {
        switch (commandChar) {
            case 'I': {
                //Code for reading data into the model.
                break;
            }
            case 'M': {
                //Code for modifying data in the model.
                break;
            }
            case 'R': {
                //Code for removing entries from the model.
                break;
            }
            case 'S': {
                //Code for creating a sentence from the given input.
                break;
            }
        }
    }
    lModel->display();
    return 0;
}
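As a starting point, the case 'I' block might be filled in along the lines sketched below. This assumes each command carries the two word counts first, as the sample input suggests; readWords() is an invented helper, not part of the boilerplate.

#include <iostream>
#include <string>
using namespace std;

//Read 'count' space-separated words from 'in' and join them with single spaces,
//using the string class's overloaded '+' operator as the prompt suggests.
string readWords(istream& in, int count) {
    string joined, word;
    for (int i = 0; i < count; ++i) {
        in >> word;
        joined += (i == 0 ? "" : " ") + word;
    }
    return joined;
}

int main() {
    char commandChar;
    while (cin >> commandChar) {
        if (commandChar == 'I') {        //only the insert command is handled in this sketch
            int fromCount, toCount;
            float prob;
            cin >> fromCount >> toCount;
            string fromWords = readWords(cin, fromCount);
            string toWords   = readWords(cin, toCount);
            cin >> prob;
            cout << "Inserting: " << fromWords << " : " << toWords << " " << prob << "\n";
        }
    }
    return 0;
}

On the first sample line this would print "Inserting: The : weather 0.8". The 'M' case would read the same fields, while 'R' omits the trailing probability and 'S' carries a single count followed by that many words.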

input:

I 1 1 The weather 0.8
I 1 1 live in 0.7
I 1 1 Norman Oklahoma 0.9
I 1 1 Boomer Sooner 0.95
I 1 2 in Norman Oklahoma 0.8
I 2 1 Boomer Sooner OU 0.6
I 2 1 live in Norman 0.75
I 1 1 cheer for 0.8
I 2 1 cheer for OU 0.9
I 1 1 I cheer 0.8
I 1 1 I live 0.75
I 1 1 weather in 0.5
I 1 1 in USA 0.7
I 1 2 USA is good 0.55

S 2 I live
S 1 I
S 2 The weather

M 1 1 in USA 0.95
M 1 1 Sooner OU 0.8
M 1 1 USA is 0.45

I 1 2 Oklahoma is good 0.7
I 2 1 cheer for Sooners 0.95

R 1 2 USA is good
R 1 1 Oklahoma USA
R 1 1 The weather
R 1 1 USA in 0.5

S 2 I live
S 1 I
S 2 The weather

Output:

Inserting: The : weather 0.8
Inserting: live : in 0.7
Inserting: Norman : Oklahoma 0.9
Inserting: Boomer : Sooner 0.95
Inserting: in : Norman Oklahoma 0.8
Inserting: Boomer Sooner : OU 0.6
Inserting: live in : Norman 0.75
Inserting: cheer : for 0.8
Inserting: cheer for : OU 0.9
Inserting: I : cheer 0.8
Inserting: I : live 0.75
Inserting: weather : in 0.5
Inserting: in : USA 0.7
Inserting: USA : is good 0.55
Input: I live
Output: I live in Norman Oklahoma
Input: I
Output: I cheer for OU
Input: The weather
Output: The weather in Norman Oklahoma

Error: Sooner : OU not present to modify.

Error: USA : is not present to modify.
Inserting: Oklahoma : is good 0.7
Inserting: cheer for : Sooners 0.95

Error: Oklahoma : USA not present to remove.

Error: USA : in not present to remove.
Input: I live
Output: I live in USA
Input: I
Output: I cheer for Sooners
Input: The weather
Output: The weather in USA
Printing the Language Model:
live : in : 0.7
Norman : Oklahoma : 0.9
Boomer : Sooner : 0.95
in : Norman Oklahoma : 0.8
Boomer Sooner : OU : 0.6
live in : Norman : 0.75
cheer : for : 0.8
cheer for : OU : 0.9
I : cheer : 0.8
I : live : 0.75
weather : in : 0.5
in : USA : 0.95
Oklahoma : is good : 0.7
cheer for : Sooners : 0.95
