In this programming assignment, you will build a lexical analyzer for a small programming language, called Simple Perl-Like (SPL), and a program to test it. This assignment will be followed by two other assignments to build a parser and an interpreter for the SPL language. Although we are not concerned with the syntax definitions of the language in this assignment, we introduce them ahead of Programming Assignment 2 in order to determine the language terminals: reserved words, constants, identifiers, and operators. The syntax definitions of the SPL language are given below in EBNF notation. However, the details of the meanings (i.e., semantics) of the language constructs will be given later on.

1. Prog ::= StmtList
2. StmtList ::= Stmt ; { Stmt ; }
3. Stmt ::= AssignStmt | WriteLnStmt | IfStmt
4. WriteLnStmt ::= WRITELN (ExprList)
5. IfStmt ::= IF (Expr) { StmtList } [ ELSE { StmtList } ]
6. AssignStmt ::= Var = Expr
7. Var ::= NIDENT | SIDENT
8. ExprList ::= Expr { , Expr }
9. Expr ::= RelExpr [ ( -eq | == ) RelExpr ]
10. RelExpr ::= AddExpr [ ( -lt | -gt | < | > ) AddExpr ]
11. AddExpr ::= MultExpr { ( + | - | . ) MultExpr }
12. MultExpr ::= ExponExpr { ( * | / | ** ) ExponExpr }
13. ExponExpr ::= UnaryExpr { ^ UnaryExpr }
14. UnaryExpr ::= [ ( - | + ) ] PrimaryExpr
15. PrimaryExpr ::= IDENT | SIDENT | NIDENT | ICONST | RCONST | SCONST | ( Expr )

Lexical Analyzer Requirements:

A header file, lex.h, is provided for you. It contains the definition of the LexItem class, an enumerated type of token symbols called Token, and the declarations of three functions to be implemented:

extern ostream& operator<<(ostream& out, const LexItem& tok);
extern LexItem id_or_kw(const string& lexeme, int linenum);
extern LexItem getNextToken(istream& in, int& linenum);

You MUST use the header file that is provided. You may NOT change it.

I. You will write the lexical analyzer function, called getNextToken, in the file lex.cpp. The getNextToken function must have the following signature:

LexItem getNextToken(istream& in, int& linenumber);

The first argument to getNextToken is a reference to an istream object that the function should read from. The second argument is a reference to an integer that contains the current line number; getNextToken should update this integer every time it reads a newline from the input stream. getNextToken returns a LexItem object. A LexItem is a class that contains a token, a string for the lexeme, and the line number as data members. Note that the getNextToken function performs the following:

1. Any error detected by the lexical analyzer should result in a LexItem object being returned with the ERR token, and the lexeme value equal to the string recognized when the error was detected.

2. Note also that both ERR and DONE are unrecoverable. Once the getNextToken function returns a LexItem object for either of these tokens, you shouldn't call getNextToken again.

3. Tokens may be separated by spaces, but in most cases are not required to be. For example, the input characters 3+7 and the input characters 3 + 7 will both result in the token sequence ICONST PLUS ICONST. Similarly, the input characters 'Hello' 'World' and the input characters 'Hello''World' will both result in the token sequence SCONST SCONST.

II. You will implement the id_or_kw() function. The id_or_kw function accepts a reference to a string of a general identifier lexeme (i.e., a keyword, IDENT, SIDENT, or NIDENT) and a line number, and returns a LexItem object. It searches for the lexeme in a dictionary that maps the string value of a keyword to its corresponding Token value, and returns a LexItem object containing the keyword Token if it is found. Otherwise, it returns a LexItem object containing the token for one of the possible types of identifiers (i.e., IDENT, SIDENT, or NIDENT).

III. You will implement the overloaded operator<< function. The operator<< function accepts a reference to an ostream object and a reference to a LexItem object, and returns a reference to the ostream object. The operator<< function should print out the string value of the Token in the tok object. If the Token is an IDENT, NIDENT, SIDENT, ICONST, RCONST, or SCONST, it will print out the token followed by its lexeme between parentheses. See the example in the slides.

Testing Program Requirements:

It is recommended to implement the lexical analyzer in one source file, and the main test program in another source file. The testing program is a main() function that takes several command line flags.
The notations for input flags are as follows:

-v (optional): if present, every token is printed out when it is seen, followed by its lexeme between parentheses.
-nconst (optional): if present, prints out all the unique numeric constants (i.e., integer or real) in numeric order.
-sconst (optional): if present, prints out all the unique string constants in alphabetical order.

-ident (optional): if present, prints out all of the unique identifiers in alphabetical order.

A filename argument must be passed to the main function. Your program should open the file and read from that filename. Note, your testing program should apply the following rules:

1. The flag arguments (arguments that begin with a dash) may appear in any order, and may appear multiple times. Only the last appearance of the same flag is considered.

2. There can be at most one file name specified on the command line. If more than one filename is provided, the program should print on a new line the message ONLY ONE FILE NAME IS ALLOWED. and it should stop running. If no file name is provided, the program should print on a new line the message NO SPECIFIED INPUT FILE. Then the program should stop running.

3. If an unrecognized flag is present, the program should print on a new line the message UNRECOGNIZED FLAG {arg}, where {arg} is whatever flag was given. Then the program should stop running.

4. If the program cannot open a filename that is given, the program should print on a new line the message CANNOT OPEN THE FILE arg, where arg is the filename given. Then the program should stop running.

5. If the getNextToken function returns ERR, the program should print Error in line N ({lexeme}), where N is the line number of the token in the input file and lexeme is its corresponding lexeme, and then it should stop running. For example, for a file that contains an invalid real constant, such as .15, in line 1 of the file, the program should print the message: Error in line 1 (.15)

6. The program should repeatedly call getNextToken until it returns DONE or ERR. If it returns DONE, the program prints the list of all tokens if -v is specified, followed by the summary information, then handles the flags -ident, -nconst, and -sconst in this order.
The summary information is as follows:

Lines: L
Total Tokens: M
Identifiers: N
Numbers: O
Strings: P

Where L is the number of input lines, M is the number of tokens (not counting DONE), N is the number of identifier tokens (i.e., IDENT, NIDENT, and SIDENT), O is the number of numeric constants, and P is the number of string literals. If the file is empty, the value of L is zero, and the following output message is displayed:

Lines: 0
Empty File.

7. If the -v option is present, the program should print each token as it is read and recognized, one token per line. The output format for the token is the token name in all capital letters (for example, the token LPAREN should be printed out as the string LPAREN). In the case of the identifier and numeric constant tokens, the token name should be followed by a space and the lexeme in parentheses. In the case of the token SCONST, the token name should be followed by a space and the lexeme between single quotes (''). For example, if an identifier $circle and a string literal 'The center of the circle through these points is' are recognized, the -v output for them would be:

NIDENT ($circle)
SCONST 'The center of the circle through these points is'

8. The -sconst option should cause the program to print the label STRINGS: on a line by itself, followed by every unique string constant found, one string per line with single quotes, in alphabetical order. If there are no SCONSTs in the input, then nothing is printed.

9. The -nconst option should cause the program to print the label NUMBERS: on a line by itself, followed by every unique numeric constant found, one number per line, in numeric order. If there are no numeric constants in the input, then nothing is printed.

10. The -ident option should cause the program to print the label IDENTIFIERS: followed by a comma-separated list of every identifier found, in alphabetical order. If there are no identifiers in the input, then nothing is printed.

/*
 * lex.h
 */

#ifndef LEX_H_
#define LEX_H_

#include <string>
#include <iostream>
#include <map>
using namespace std;

// Definition of all the possible token types
enum Token {
    // keywords
    WRITELN, IF, ELSE,

    // identifiers
    IDENT, NIDENT, SIDENT,

    // an integer, real, and string constant
    ICONST, RCONST, SCONST,

    // the numeric operators, assignment, parens, braces,
    // numeric and string comparison operators
    PLUS, MINUS, MULT, DIV, EXPONENT, ASSOP,
    LPAREN, RPAREN, LBRACES, RBRACES,
    NEQ, NGTHAN, NLTHAN, CAT, SREPEAT, SEQ, SLTHAN, SGTHAN,
    COMMA, SEMICOL,

    // any error returns this token
    ERR,

    // when completed (EOF), return this token
    DONE,
};

// Class definition of LexItem
class LexItem {
    Token token;
    string lexeme;
    int lnum;

public:
    LexItem() {
        token = ERR;
        lnum = -1;
    }
    LexItem(Token token, string lexeme, int line) {
        this->token = token;
        this->lexeme = lexeme;
        this->lnum = line;
    }

    bool operator==(const Token token) const { return this->token == token; }
    bool operator!=(const Token token) const { return this->token != token; }

    Token GetToken() const { return token; }
    string GetLexeme() const { return lexeme; }
    int GetLinenum() const { return lnum; }
};

extern ostream& operator<<(ostream& out, const LexItem& tok);
extern LexItem id_or_kw(const string& lexeme, int linenum);
extern LexItem getNextToken(istream& in, int& linenum);

#endif /* LEX_H_ */
