Question: For the lexical analyzer, you will be provided with a description of the lexical syntax of the language. You will produce a lexical analysis function and a program to test it.

The lexical analyzer function must have the following calling signature:

Token getNextToken(istream& in, int& linenumber);

The first argument to getNextToken is a reference to an istream that the function should read from. The second argument to getNextToken is a reference to an integer that contains the current line number. getNextToken will update this integer every time it reads a newline. getNextToken returns a Token. A Token is a class that contains a TokenType, a string for the lexeme, and the line number that the token was found on.

A header file, tokens.h, will be provided for you. It contains a declaration for the Token class, and a declaration for all of the TokenType values. You MUST use the header file that is provided. You may NOT change it.

The lexical rules of the language are as follows:

1. The language has identifiers, which are defined to be a letter followed by zero or more letters or digits. This will be the TokenType ID.

2. The language has integer constants, which are defined to be one or more digits. This will be the TokenType ICONST.

3. The language has string constants, which are a double-quoted sequence of characters, all on the same line. This will be the TokenType SCONST.

4. A string constant can include escape sequences: a backslash followed by a character. The sequence \n should be interpreted as a newline. The sequence \\ should be interpreted as a backslash. All other escapes should simply be interpreted as the character after the backslash.

5. The language has reserved the keywords print, set, if, loop, begin, end. They will be TokenTypes PRINT SET IF LOOP BEGIN END.

6. The language has several operators. They are + - * / ( ), which will be TokenTypes PLUS MINUS STAR SLASH LPAREN RPAREN.

7. The language recognizes a semicolon as the token SC.

8. The language recognizes a newline as the token NL.

9. A comment is all characters from a # to the end of the line; it is ignored and is not returned as a token. NOTE that a # in the middle of an SCONST is NOT a comment!

10. Whitespace between tokens can be used for readability. It serves to delimit tokens.

11. An error will be denoted by the ERR token.

12. End of file will be denoted by the DONE token.

Note that any error detected by the lexical analyzer should result in the ERR token, with the lexeme value equal to the string recognized when the error was detected.

Note also that both ERR and DONE are unrecoverable. Once the getNextToken function returns a Token for either of these token types, you should not call getNextToken again.

The assignment is to write the lexical analyzer function and some test code around it.

It is a good idea to implement the lexical analyzer in one source file, and the main test program in another source file.

The test code is a main() program that takes several command line arguments:

-v (optional) if present, every token is printed when it is seen

-strings (optional) if present, print out all the string constants in alphabetical order

-ids (optional) if present, print out all of the identifiers in alphabetical order

filename (optional) if present, read from the filename; otherwise read from standard in

Note that no other flags (arguments that begin with a dash) are permitted. If an unrecognized flag is present, the program should print UNRECOGNIZED FLAG {arg}, where {arg} is whatever flag was given, and it should stop running.

At most one filename can be provided, and it must be the last command line argument. If more than one filename is provided, the program should print ONLY ONE FILE NAME ALLOWED and it should stop running.

If the program cannot open a filename that is given, the program should print CANNOT OPEN {arg}, where {arg} is the filename given, and it should stop running.
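The argument rules above could be handled by a small parsing function that main() calls before opening any file. This is only a sketch under stated assumptions: the `Options` struct and `parseArgs` function are hypothetical names, the arguments are assumed to have been copied from argv (excluding the program name) into a vector, and the sketch does not reject a flag that appears after the filename, since the assignment text does not say what should be printed in that case.

```cpp
#include <string>
#include <vector>

// Hypothetical container for the parsed command line.
struct Options {
    bool verbose = false;   // -v
    bool strings = false;   // -strings
    bool ids = false;       // -ids
    bool haveFile = false;  // false means read from standard input
    std::string filename;
    std::string error;      // non-empty: print this message and stop
};

Options parseArgs(const std::vector<std::string>& args) {
    Options opt;
    for (const std::string& arg : args) {
        if (!arg.empty() && arg[0] == '-') {
            if (arg == "-v") {
                opt.verbose = true;
            } else if (arg == "-strings") {
                opt.strings = true;
            } else if (arg == "-ids") {
                opt.ids = true;
            } else {
                opt.error = "UNRECOGNIZED FLAG " + arg;
                return opt;
            }
        } else if (opt.haveFile) {
            opt.error = "ONLY ONE FILE NAME ALLOWED";
            return opt;
        } else {
            opt.haveFile = true;
            opt.filename = arg;
        }
    }
    return opt;
}
```

In main(), the vector can be built with `std::vector<std::string> args(argv + 1, argv + argc);`. The CANNOT OPEN check is separate: it would happen after parsing, when an `std::ifstream` constructed from `opt.filename` fails to open.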

The program should repeatedly call getNextToken until it returns DONE or ERR. If it returns DONE, the program proceeds to handling the -strings and -ids options, in that order. It should then print summary information and exit.

If getNextToken returns ERR, the program should print Error on line N ({lexeme}), where N is the line number for the token and lexeme is the lexeme from the token, and it should stop running.

If the -v option is present, the program should print each token as it is read and recognized, one token per line. The output format for the token is the token name in all capital letters (for example, the token LPAREN should be printed out as the string LPAREN). In the case of tokens ID, ICONST, and SCONST, the token name should be followed by a space and the lexeme in parens. For example, if the identifier hello is recognized, the -v output for it would be ID (hello).

The -strings option should cause the program to print STRINGS: on a line by itself, followed by every string constant found, one string per line, in alphabetical order. If there are no SCONSTs in the input, then nothing is printed.

The -ids option should cause the program to print IDENTIFIERS: followed by a comma-separated list of every identifier found, in alphabetical order. If there are no IDs in the input, then nothing is printed.

The summary information is as follows:

Total lines: L

Total tokens: N

Where L is the number of input lines and N is the number of tokens (not counting DONE). If L is zero, no further lines are printed.

The program will be checked on the following:

Compiles

Argument error cases

Files that cannot be opened

Too many filenames

Properly handles a zero length file

Recognizes keywords and identifiers

Summary information

-v mode

tokens.h

#ifndef TOKENS_H_
#define TOKENS_H_

#include <string>
#include <iostream>
using std::string;
using std::istream;
using std::ostream;

enum TokenType {
    // keywords
    PRINT,
    SET,
    IF,
    LOOP,
    BEGIN,
    END,

    // an identifier
    ID,

    // an integer and string constant
    ICONST,
    SCONST,

    // the operators, parens, semicolon, newline
    PLUS,    // a +
    MINUS,   // a -
    STAR,    // a *
    SLASH,   // a /
    LPAREN,  // a (
    RPAREN,  // a )
    SC,      // a semicolon
    NL,      // a newline

    // any error returns this token
    ERR,

    // when completed (EOF), return this token
    DONE
};

class Token {
    TokenType tt;
    string lexeme;
    int lnum;

public:
    Token() {
        tt = ERR;
        lnum = -1;
    }
    Token(TokenType tt, string lexeme, int line) {
        this->tt = tt;
        this->lexeme = lexeme;
        this->lnum = line;
    }
