Question

Posted on Sep 12, 2024

The following assignment requires programming in C++.

For the remainder of the semester we will be building a small program that interprets a small language. The language will have integers and strings, a small number of keywords, and some operators.

The remainder of the semester will be broken into three pieces:

Program 2 - Lexical analyzer

Program 3 - Parser

Program 4 - Interpreter

For the lexical analyzer, you will be provided with a description of the lexical syntax of the language. You will produce a lexical analysis function and a program to test it.

The lexical analyzer function will have the following calling signature:

Token getNextToken(istream *in, int *linenumber);

The first argument to getNextToken is a pointer to an istream that the function should read from. The second argument to getNextToken is a pointer to an integer that contains the current line number. getNextToken will update this integer every time it reads a new line. getNextToken returns a Token. A Token is a class that contains a TType, a string for the lexeme, and the line number that the token was found on.

A header file, projlex.h, will be provided for you. You MUST use the provided header file. You may NOT change it.

The lexical rules of the language are as follows:

i. The language has identifiers, which are defined to be a letter followed by zero or more letters or digits. This will be the token IDENT.

ii. The language has integer constants, which are defined to be an optional leading dash (for a negative number), followed by one or more digits. This will be the token ICONST.

iii. The language has string constants, which are a double-quoted sequence of characters, all on the same line. This will be the token SCONST.

iv. The language has reserved the keywords var, print, set, and repeat. They will be the tokens VAR, PRINT, SET, and REPEAT.

v. The language has several single-character tokens. They are + - * : [ ] ( ) ; which will be the tokens PLUS MINUS STAR COLON LSQ RSQ LPAREN RPAREN SC.

vi. A comment is all characters from a # to the end of the line; it is ignored and is not returned as a token. NOTE that a # in the middle of an SCONST is NOT a comment!

vii. Whitespace between tokens can be used for readability. It serves to delimit tokens.

viii. The newline should be recognized so that lines can be counted.

ix. An error will be denoted by the ERR token.

x. End of file will be denoted by the DONE token.

Note that any error detected by the lexical analyzer will result in the ERR token, with the lexeme value equal to the string recognized when the error was detected.

CS280 Programming Assignment 2

Spring 2018

Note also that both ERR and DONE are unrecoverable. Once the getNextToken function returns a Token for either of these token types, you should not call getNextToken again.

The assignment is to write the lexical analyzer function and some test code around it.

It is a good idea to implement the lexical analyzer in one source file, and the main test program in another source file.

The test code is a main() program that takes several command line arguments:

-v (optional) if present, every token is printed when it is seen

-mci (optional) if present, the identifier that appears the most often is printed

-sum (optional) if present, summary information is printed

filename (optional) if present, read from the filename; otherwise read from standard input

Note that no other flags (arguments that begin with a dash) are permitted. If an unrecognized flag is present, the program should print INVALID FLAG {arg}, where {arg} is whatever flag was given, and it should stop running.

At most one filename can be provided, and it must be the last command line argument. If more than one filename is provided, the program should print TOO MANY FILE NAMES and it should stop running.

If the program cannot open a filename that is given, the program should print UNABLE TO OPEN {arg}, where {arg} is the filename given, and it should stop running.
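One way to organize the flag handling is sketched below. The Options struct and the parseArgs function are hypothetical names introduced for illustration; nothing in the assignment requires them. The sketch only classifies arguments and records the first error message, so it does not open the file and does not check that the filename comes last.

```cpp
#include <string>
#include <vector>

// Hypothetical container for the parsed command line.
struct Options {
    bool verbose = false;   // -v
    bool mci = false;       // -mci
    bool sum = false;       // -sum
    std::string filename;   // empty means "read standard input"
    std::string error;      // non-empty means "print this and stop"
};

// Classifies each argument (argv[0] excluded) and stops at the first problem.
Options parseArgs(const std::vector<std::string>& args) {
    Options opt;
    for (const std::string& arg : args) {
        if (arg == "-v") {
            opt.verbose = true;
        } else if (arg == "-mci") {
            opt.mci = true;
        } else if (arg == "-sum") {
            opt.sum = true;
        } else if (!arg.empty() && arg[0] == '-') {
            opt.error = "INVALID FLAG " + arg;
            return opt;
        } else if (!opt.filename.empty()) {
            opt.error = "TOO MANY FILE NAMES";
            return opt;
        } else {
            opt.filename = arg;
        }
    }
    return opt;
}
```

A main() could build the vector with std::vector<std::string>(argv + 1, argv + argc), print opt.error and return if it is non-empty, and then try to open opt.filename with std::ifstream, printing UNABLE TO OPEN followed by the name when the stream fails to open.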

The program should repeatedly call the lexical analyzer function until it returns DONE or ERR. If it returns DONE, the program proceeds to handling the -mci and -sum options, if any, and then exits. If it returns ERR, the program should print Error on line N ({lexeme}), where N is the line number in the token and lexeme is the lexeme from the token, and it should stop running.

If the -v option is present, the program should print each token as it is read and recognized, one token per line. The output format for the token is the token name in all capital letters (for example, the token LPAREN should be printed out as the string LPAREN). In the case of tokens IDENT, ICONST, and SCONST, the token name should be followed by a space and the lexeme in parens. For example, if the identifier hello is recognized, the -v output for it would be ID (hello)

If the -mci option is present, the program should, after seeing the DONE token, print out the following report:

Most Common Identifier: X

Where X is the IDENT lexeme that appears most often in the input. If several different lexemes appear the same number of times, then X is a comma-separated list of the lexemes, in alphabetical order. If there are no IDENT tokens, then this line is not printed.
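The tie-breaking rule can be sketched with a std::map, whose keys are already iterated in sorted order. The function name mostCommonReport is hypothetical, and the ", " separator is an assumption, since the text only says "comma separated". Note that std::map sorts by character code, so uppercase letters come before lowercase ones.

```cpp
#include <algorithm>
#include <map>
#include <string>

// Builds the -mci report line from a map of IDENT lexeme -> occurrence count.
// Returns an empty string when there are no identifiers, so nothing is printed.
std::string mostCommonReport(const std::map<std::string, int>& counts) {
    int best = 0;
    for (const auto& kv : counts) best = std::max(best, kv.second);
    if (best == 0) return "";

    std::string list;
    for (const auto& kv : counts) {   // keys come out in sorted order
        if (kv.second != best) continue;
        if (!list.empty()) list += ", ";
        list += kv.first;
    }
    return "Most Common Identifier: " + list;
}
```

The map would be filled in the main loop with ++counts[tok.GetLexeme()] for every IDENT token.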

If the -sum option is present, the program should, after seeing the DONE token and processing the -mci option, print out the following report:

Total lines: L

Total tokens: N

Total strings: X

Length of longest string: Y

Where L is the number of input lines, N is the number of tokens (not counting DONE), X is a count of the number of SCONST tokens, and Y is the length of the longest SCONST token.

If N is zero, no further lines are printed. If X is 0, the last line is not printed.

DUE DATES

PART 1: Due Sun March 4

Compiles

Recognizes invalid command line flags, a file that cannot be opened, and case with more than one file name

Recognizes string with a newline in it as an error

Recognizes string with a # in it as a string, not a comment

Recognizes all valid token types

Recognizes various erroneous tokens

Supports -v mode

//lexer header file

/*
 * projlex.h
 */

#ifndef PROJLEX_H_
#define PROJLEX_H_

#include <string>
#include <iostream>
using std::string;
using std::istream;
using std::ostream;

enum TType {
    // keywords
    SET,
    PRINT,
    VAR,
    REPEAT,

    // an identifier
    IDENT,

    // an integer and string constant
    ICONST,
    SCONST,

    // the operators, parens and semicolon
    PLUS,
    MINUS,
    STAR,
    COLON,
    LSQ,
    RSQ,
    LPAREN,
    RPAREN,
    SC,

    // any error returns this token
    ERR,

    // when completed (EOF), return this token
    DONE
};

class Token {
    TType tt;
    string lexeme;
    int lnum;

public:
    Token() {
        tt = ERR;
        lnum = -1;
    }
    Token(TType tt, string lexeme, int line) {
        this->tt = tt;
        this->lexeme = lexeme;
        this->lnum = line;
    }

    bool operator==(const TType tt) const { return this->tt == tt; }
    bool operator!=(const TType tt) const { return this->tt != tt; }

    TType GetTokenType() const { return tt; }
    string GetLexeme() const { return lexeme; }
    int GetLinenum() const { return lnum; }
};

extern ostream& operator<<(ostream& out, const Token& tok);

extern Token getNextToken(istream *in, int *linenum);

#endif /* PROJLEX_H_ */