Question

1 Approved Answer

Posted on Sep 22, 2024

Building a Lexical Analyser with Ruby The grammar rules for the language TINY are listed below. In this assignment we will identify the tokens in

image text in transcribed

Building a Lexical Analyser with Ruby The grammar rules for the language TINY are listed below. In this assignment we will identify the tokens in this language and build a lexical analyser (lexer) for recognizing and outputting TINY tokens. https://www.cs.rochester.edu/-brow/173/readings/05_grasmars.txt # "TINY" Grammar --> --> # PEM # STMT # ASSIGN # EXP # ETAIL # TERM # ITAIL # FACTOR --> --> STMT+ 23SIGN "print" EXP ID =" EXP TERM ETAIL "+" TERM ETAIL | "-TERM ETAIL | EPSILON FACTOR TTAIL "*" FACTOR TTAIL | "/" FACTOR TTAIL | EPSILON "(" EXP ")" INT | ID ALPHA+ ab 1-1. or A B 1 - 2 DIGIT+ 01 9 Ruby Whitespace # ID --> # ALPHA # # INT # DIGIT --> # WHITESPACE --> Whenever a token is identified, we encapsulate it using the Token class below. Each token has a type and text. For example, if DOG was identified as a variable in this language, it could have type 'id' and text "DOG". The add operator might have type 'addOp and text or type and text +. These values are somewhat arbitrary - choose values that make sense to you. Tokens in your grammar yet. In addition to identifying and returning a token, this method should print what it finds, each time it identifies a token (just like the example of the lexer in the PowerPoint for chapter 4). You must complete this method. 5) Contiguous whitespace should be combined and emitted as a single token. I already did this for you. 8) An end of file (EOF) token should be emitted when the file has been completely processed. I already did this for you. 7) There are several helper methods defined at the bottom of 'Tiny Scanner.rb" that you can use to help you to identify different types of characters (like numbers, letters, and whitespace) Display and Print Tokens I've included a file called "TestTinylexer.rb that you can use to test your lexer (previous section). For the time being, it simply scans a file and stores all of the tokens in a text file called "tokens". Feel free to modify this file to test different things about your lexer, or to open the tokens file to see what was actually saved. I've also included a sample input file (input.but) that adheres to our TINY programming language. You should experiment with different inputs that both adhere to and don't adhere to the grammar to verify the correctness of your lexer. Sample Program Output Below is a screenshot of sample output that would result from using the included sample input file. rsardinas>88-Laptop: $ ruby ScannerTesti.rb Next token is: id Next lexeme is: x Next token is: whitespace Next lexeme is: Next token is: Next lexeme is: - Next token is: whitespace Next lexeme is: Next token is: int Next lexeme is: 3 Next token is: whitespace Next lexeme is: Next token is: id Next lexeme is: y Next token is: whitespace Next lexeme is: Next token is: = Next lexeme is: = Next token is: whitespace Next lexeme is: Next token is: int Next lexeme is: 5 Next token is: whitespace Next lexeme is: Next token is: id Next lexeme is: z Next token is: whitespace Next lexeme is: Next token is: = Next lexeme is: Next token is: whitespace Next lexeme is: Next token is: ( Next lexeme is: ( Next token is: id Next lexeme is: x Next token is: whitespace Next lexeme is: Next token is: + Next lexeme is: + Next token is: whitespace Next lexeme is: Next token is: id Next lexeme is: y Next token is: ) Next lexeme is:) Next token is: whitespace Next lexeme is: Next token is: / Next lexeme is: / Next token is: whitespace Next lexeme is: Next token is: int Next lexeme is: 3 Next token is: whitespace Next lexeme is: Next token is: print Next lexeme is: print Next token is: whitespace Next lexeme is: Next token is: id Next lexeme is: z Token Class I have given you a partially complete Token class that you should modify for this assignment called TinyToken.rb. It has a few constants already defined. The VALUES of these variables represent the name of the token. You will need to build more constants to store all of the tokens that exist in this language. For this assignment, you get to decide what categories of lexemes there are and what you want to name those categories It is important to test all the code you write. At the very least use "puts" to test the class methods. You can build the tests in separate files and load them as needed load "nytest.rb'. The code below tests the Token class methods. # Test the Toten class tok = Token.new("atype","atext") puts "Token type: #{tok.type}" puts "Token text: #{tok.text)" tok.type = "btype" tok.text = "btext put: "Token type: #{tot.type}" puts "Token text: #{tok.text)" puts "Token: #{tok) The "Lexer" The first goal is to build a Scanner (or Lexer) for TINY. I have sketched the basic structure in file named "Tiny Scanner.rb". Here are a few points to consider. 1) The constructor is passed a file name which contains the source code for a TINY program. The constructor opens the file and reads the first character, storing it in class variable @c (which acts as a one-character look ahead). I already did this. 2) Currently, the Scanner abends if it is passed a file that doesn't exist. Modify the code so that it fails gracefully in this circumstance, 3) Method nextCh) updates @c with the next character and returns it unless it has reached the end of the file, in which case it will return "eof. I already did this. 4) Method nextToken() returns the next token identified by the scanner. It is not complete, as it does not identify all Building a Lexical Analyser with Ruby The grammar rules for the language TINY are listed below. In this assignment we will identify the tokens in this language and build a lexical analyser (lexer) for recognizing and outputting TINY tokens. https://www.cs.rochester.edu/-brow/173/readings/05_grasmars.txt # "TINY" Grammar --> --> # PEM # STMT # ASSIGN # EXP # ETAIL # TERM # ITAIL # FACTOR --> --> STMT+ 23SIGN "print" EXP ID =" EXP TERM ETAIL "+" TERM ETAIL | "-TERM ETAIL | EPSILON FACTOR TTAIL "*" FACTOR TTAIL | "/" FACTOR TTAIL | EPSILON "(" EXP ")" INT | ID ALPHA+ ab 1-1. or A B 1 - 2 DIGIT+ 01 9 Ruby Whitespace # ID --> # ALPHA # # INT # DIGIT --> # WHITESPACE --> Whenever a token is identified, we encapsulate it using the Token class below. Each token has a type and text. For example, if DOG was identified as a variable in this language, it could have type 'id' and text "DOG". The add operator might have type 'addOp and text or type and text +. These values are somewhat arbitrary - choose values that make sense to you. Tokens in your grammar yet. In addition to identifying and returning a token, this method should print what it finds, each time it identifies a token (just like the example of the lexer in the PowerPoint for chapter 4). You must complete this method. 5) Contiguous whitespace should be combined and emitted as a single token. I already did this for you. 8) An end of file (EOF) token should be emitted when the file has been completely processed. I already did this for you. 7) There are several helper methods defined at the bottom of 'Tiny Scanner.rb" that you can use to help you to identify different types of characters (like numbers, letters, and whitespace) Display and Print Tokens I've included a file called "TestTinylexer.rb that you can use to test your lexer (previous section). For the time being, it simply scans a file and stores all of the tokens in a text file called "tokens". Feel free to modify this file to test different things about your lexer, or to open the tokens file to see what was actually saved. I've also included a sample input file (input.but) that adheres to our TINY programming language. You should experiment with different inputs that both adhere to and don't adhere to the grammar to verify the correctness of your lexer. Sample Program Output Below is a screenshot of sample output that would result from using the included sample input file. rsardinas>88-Laptop: $ ruby ScannerTesti.rb Next token is: id Next lexeme is: x Next token is: whitespace Next lexeme is: Next token is: Next lexeme is: - Next token is: whitespace Next lexeme is: Next token is: int Next lexeme is: 3 Next token is: whitespace Next lexeme is: Next token is: id Next lexeme is: y Next token is: whitespace Next lexeme is: Next token is: = Next lexeme is: = Next token is: whitespace Next lexeme is: Next token is: int Next lexeme is: 5 Next token is: whitespace Next lexeme is: Next token is: id Next lexeme is: z Next token is: whitespace Next lexeme is: Next token is: = Next lexeme is: Next token is: whitespace Next lexeme is: Next token is: ( Next lexeme is: ( Next token is: id Next lexeme is: x Next token is: whitespace Next lexeme is: Next token is: + Next lexeme is: + Next token is: whitespace Next lexeme is: Next token is: id Next lexeme is: y Next token is: ) Next lexeme is:) Next token is: whitespace Next lexeme is: Next token is: / Next lexeme is: / Next token is: whitespace Next lexeme is: Next token is: int Next lexeme is: 3 Next token is: whitespace Next lexeme is: Next token is: print Next lexeme is: print Next token is: whitespace Next lexeme is: Next token is: id Next lexeme is: z Token Class I have given you a partially complete Token class that you should modify for this assignment called TinyToken.rb. It has a few constants already defined. The VALUES of these variables represent the name of the token. You will need to build more constants to store all of the tokens that exist in this language. For this assignment, you get to decide what categories of lexemes there are and what you want to name those categories It is important to test all the code you write. At the very least use "puts" to test the class methods. You can build the tests in separate files and load them as needed load "nytest.rb'. The code below tests the Token class methods. # Test the Toten class tok = Token.new("atype","atext") puts "Token type: #{tok.type}" puts "Token text: #{tok.text)" tok.type = "btype" tok.text = "btext put: "Token type: #{tot.type}" puts "Token text: #{tok.text)" puts "Token: #{tok) The "Lexer" The first goal is to build a Scanner (or Lexer) for TINY. I have sketched the basic structure in file named "Tiny Scanner.rb". Here are a few points to consider. 1) The constructor is passed a file name which contains the source code for a TINY program. The constructor opens the file and reads the first character, storing it in class variable @c (which acts as a one-character look ahead). I already did this. 2) Currently, the Scanner abends if it is passed a file that doesn't exist. Modify the code so that it fails gracefully in this circumstance, 3) Method nextCh) updates @c with the next character and returns it unless it has reached the end of the file, in which case it will return "eof. I already did this. 4) Method nextToken() returns the next token identified by the scanner. It is not complete, as it does not identify all