Question

1 Approved Answer

Posted on Aug 02, 2024


Lexer.java

/**
 * The Lexer class is responsible for scanning the source file which is a stream
 * of characters and returning a stream of tokens; each token object will
 * contain the string (or access to the string) that describes the token along
 * with an indication of its location in the source program to be used for error
 * reporting; we are tracking line numbers; white spaces are space, tab,
 * newlines
 */
public class Lexer {

  private boolean atEOF = false;
  // next character to process
  private char ch;
  private SourceReader source;

  // positions in line of current token
  private int startPosition, endPosition;

  public Lexer(String sourceFile) throws Exception {
    // init token table
    new TokenType();
    source = new SourceReader(sourceFile);
    ch = source.read();
  }

  public Token newIdToken(String id, int startPosition, int endPosition) {
    return new Token(
      startPosition,
      endPosition,
      Symbol.symbol(id, Tokens.Identifier)
    );
  }

  public Token newNumberToken(String number, int startPosition, int endPosition) {
    return new Token(
      startPosition,
      endPosition,
      Symbol.symbol(number, Tokens.INTeger)
    );
  }

  public Token makeToken(String s, int startPosition, int endPosition) {
    // filter comments
    if (s.equals("//")) {
      try {
        int oldLine = source.getLineno();

        do {
          ch = source.read();
        } while (oldLine == source.getLineno());
      } catch (Exception e) {
        atEOF = true;
      }

      return nextToken();
    }

    // ensure it's a valid token
    Symbol sym = Symbol.symbol(s, Tokens.BogusToken);

    if (sym == null) {
      System.out.println("******** illegal character: " + s);
      atEOF = true;
      return nextToken();
    }

    return new Token(startPosition, endPosition, sym);
  }

  /**
   * @return the next Token found in the source file
   */
  public Token nextToken() {
    // ch is always the next char to process
    if (atEOF) {
      if (source != null) {
        source.close();
        source = null;
      }

      return null;
    }

    try {
      // scan past whitespace
      while (Character.isWhitespace(ch)) {
        ch = source.read();
      }
    } catch (Exception e) {
      atEOF = true;
      return nextToken();
    }

    startPosition = source.getPosition();
    endPosition = startPosition - 1;

    if (Character.isJavaIdentifierStart(ch)) {
      // return tokens for ids and reserved words
      String id = "";

      try {
        do {
          endPosition++;
          id += ch;
          ch = source.read();
        } while (Character.isJavaIdentifierPart(ch));
      } catch (Exception e) {
        atEOF = true;
      }

      return newIdToken(id, startPosition, endPosition);
    }

    if (Character.isDigit(ch)) {
      // return number tokens
      String number = "";

      try {
        do {
          endPosition++;
          number += ch;
          ch = source.read();
        } while (Character.isDigit(ch));
      } catch (Exception e) {
        atEOF = true;
      }

      return newNumberToken(number, startPosition, endPosition);
    }

    // At this point the only tokens to check for are one or two
    // characters; we must also check for comments that begin with
    // 2 slashes
    String charOld = "" + ch;
    String op = charOld;
    Symbol sym;
    try {
      endPosition++;
      ch = source.read();
      op += ch;

      // check if valid 2 char operator; if it's not in the symbol
      // table then don't insert it since we really have a one char
      // token
      sym = Symbol.symbol(op, Tokens.BogusToken);
      if (sym == null) {
        // it must be a one char token
        return makeToken(charOld, startPosition, endPosition);
      }

      endPosition++;
      ch = source.read();

      return makeToken(op, startPosition, endPosition);
    } catch (Exception e) { /* no-op */ }

    atEOF = true;
    if (startPosition == endPosition) {
      op = charOld;
    }

    return makeToken(op, startPosition, endPosition);
  }

  /*
  public static void main(String args[]) {
    Token token;

    try {
      Lexer lex = new Lexer( "simple.x" );

      while( true ) {
        token = lex.nextToken();

        String p = "L: " + token.getLeftPosition() +
          " R: " + token.getRightPosition() + " " +
          TokenType.tokens.get(token.getKind()) + " ";

        if ((token.getKind() == Tokens.Identifier) ||
            (token.getKind() == Tokens.INTeger)) {
          p += token.toString();
        }

        System.out.println( p + ": " + lex.source.getLineno() );
      }
    } catch (Exception e) {}
  }
  */
}
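The commented-out main above opens the hardcoded file "simple.x" and loops forever, relying on the empty catch to end the run once nextToken() returns null. A minimal sketch of how the filename could instead be taken from the command line is shown below. LexerArgs and sourceFileFrom are hypothetical names, the usage wording is an assumption, and the token loop is left as a comment because it depends on the Lexer, Token and TokenType classes.

```java
final class LexerArgs {
    // Usage text; the exact wording is an assumption, not confirmed by the listing.
    static final String USAGE = "usage: java lexer.Lexer filename.x";

    /** Returns the source filename, or null when no argument was supplied. */
    static String sourceFileFrom(String[] args) {
        if (args == null || args.length == 0) {
            return null;
        }
        return args[0];
    }

    public static void main(String[] args) {
        String sourceFile = sourceFileFrom(args);
        if (sourceFile == null) {
            System.out.println(USAGE);
            return;
        }
        // In the real Lexer.main this is where the token loop would go, e.g.:
        //   Lexer lex = new Lexer(sourceFile);
        //   Token token;
        //   while ((token = lex.nextToken()) != null) { ...print token... }
        // Stopping on null avoids dereferencing a null token at end of file.
        System.out.println("source file: " + sourceFile);
    }
}
```

Keeping the argument check in its own small method makes the "no filename" case easy to verify without touching the file system.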

You will be extending the Lexer in order to be able to process three new tokens, as well as to improve its output.

1. The current implementation of Lexer reads a hardcoded file. Lexer must be updated to allow input via a filename provided as a command line argument:

   java lexer.Lexer sample_files/simple.x

   Note that the main method is currently commented out; you should uncomment and update this method. In the event that no filename is supplied, a usage instruction should be displayed:

   java lexer.Lexer
   usage: java lexer.Lexer filename.x

2. Our compiler must be updated to accommodate three additional tokens. The tokens file must be updated, and TokenSetup run in order to re-generate the Tokens and TokenType classes.

   2. GreaterEqual: >=
   3. Void: void (this is a type)
   5. CHARacter: any character literal, which is any single character surrounded by single quotes
   6. Scientific: scientific (this is the type)
   7. ScientificLit: a number expressed in normalized (one digit before the decimal point) scientific notation, as expressed by d.dd?[Ee][+-]d+

3. The Token class must be updated to include the line number on which a token was found (for subsequent error reporting).

4. Lexer output must be updated for readability, and to include the line number from the Token, as well as the type of the token created. (Note that the initial debug text that shows the file information has been removed.) The format for each of the token lines is:

   1. 11 columns, left aligned, for the token description, then a space
   2. left:, then a space
   3. 8 columns, left aligned, for the left position, then a space
   4. right:, then a space
   5. 8 columns, left aligned, for the right position, then a space
   6. line:, then a space
   7. 8 columns, left aligned, for the line number, then a space

   java lexer.Lexer sample_files/simple.x
   READLINE: program ( int i int j
   program left: 0 right: 18 right: 20
   /* Remainder of output omitted for brevity, see Appendix A */

5. Lexer output must be updated to include a printout, with line number, of each of the lines read in from the source file. Note that when an error is encountered, the error should be reported as usual, and the lines of the source file should be output, with line numbers, up to and including the error line:

   1:program int i int j

Project layout (from the screenshot): under Source Packages, package lexer contains Lexer.java, SourceReader.java, Symbol.java, Token.java, TokenType.java and Tokens.java; package lexer.setup contains TokenSetup.java and the tokens file.
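The column layout required for each token line maps naturally onto a single String.format pattern. The sketch below is one possible reading of that layout, assuming the sixth field label is "line:" (the transcription shows "linei"). TokenLineFormat and its format method are hypothetical names, and the values passed in main are illustrative rather than taken from the sample output.

```java
final class TokenLineFormat {
    // %-11s : 11 columns, left aligned, for the token description
    // %-8d  : 8 columns, left aligned, for each numeric field
    // Every field and every label is followed by a single space.
    private static final String PATTERN =
        "%-11s left: %-8d right: %-8d line: %-8d ";

    static String format(String description, int left, int right, int line) {
        return String.format(PATTERN, description, left, right, line);
    }

    public static void main(String[] args) {
        // Brackets make the trailing padding visible when printed.
        System.out.println("[" + format("program", 0, 6, 1) + "]");
    }
}
```

In the Lexer itself the arguments would presumably come from the Token (description, left and right positions, and the new line number field) once requirement 3 is in place.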
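The ScientificLit rule can be checked in isolation before it is wired into nextToken(). The sketch below assumes the garbled pattern in the assignment reads d.dd?[Ee][+-]d+, that is: one digit, a decimal point, one or two digits, E or e, a mandatory sign, then one or more digits. ScientificLitCheck and isScientificLit are hypothetical names, and the real Lexer reads one character at a time rather than matching a whole string, so this is only a reference for which lexemes should be accepted.

```java
final class ScientificLitCheck {
    // Assumed reading of the assignment's pattern: d.dd?[Ee][+-]d+
    private static final java.util.regex.Pattern SCIENTIFIC_LIT =
        java.util.regex.Pattern.compile("\\d\\.\\d\\d?[Ee][+-]\\d+");

    /** True when the whole lexeme is a normalized scientific literal. */
    static boolean isScientificLit(String lexeme) {
        return SCIENTIFIC_LIT.matcher(lexeme).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"1.2e+10", "1.23E-5", "12.3e+1", "1.2e10", "1.234e+1"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isScientificLit(sample));
        }
    }
}
```

Under this reading, "12.3e+1" is rejected because it has two digits before the decimal point, and "1.2e10" is rejected because the sign is missing. If the original pattern made the sign optional, the character class would need a trailing ?.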