Question


Posted on Sep 10, 2024


Implement a scanner for the language over this grammar using the nested cases approach. Start by determining what the tokens of this grammar should be. Then, using pencil and paper, draw an FSM for the scanner. Verify that the FSM indeed accepts the tokens of the language over this grammar. When you have convinced yourself that the FSM is correct, start coding. Do not use any library-based scanning facilities such as strtok() or strsep() in your implementation.

Test your scanner on the following program:

firstvar := 1;
secondvar := 2;
repeat (10) thirdvar := 2 * (firstvar + secondvar) / (firstvar + 2);
repeat (firstvar + 2 * secondvar) repeat (thirdvar) print firstvar;

Print the tokens as they are recognized by the scanner in the form {<token type>, <value>}, omitting the value for tokens that carry none. For example: {IDENT, firstvar}, {ASSIGNMENT}, {NUMBER, 1}, {SEMICOLON}, {IDENT, secondvar}, {ASSIGNMENT}, {NUMBER, 2}, {SEMICOLON}, {REPEAT, repeat}, {LPAREN}, ...

Your scanner should be implemented in a separate file (scanner.c) and called once for each token from the driver (which you should implement in main.c). The driver should take its input from a file, or from stdin if no file is specified as a command-line argument (recall the argc and argv parameters of main()). The scanner should return a pointer to a structure TOKEN that includes the token type and a union for any extra information. For example, numbers need their values, identifiers need their names, and so do keywords, etc. Token types are already declared through an enum in scanner.h. main() should print each token using the returned pointer, one token at a time.

You may use C ... extensions for the switch statement that allow you to specify ranges of values for cases; for example:

switch (character) {
case 'a' ... 'z':
    ...
    break;
case '0' ... '9':
    ...
    break;
default:
    ...
}

You can also use functions such as isdigit(). You will also find it useful to get familiar with ungetc(), because it lets you put a character back into the stream, so that the next time the scanner starts, it starts with that same character.
That approach leads to simpler code, because you do not have to store any character for future tokens. You may also want to explore freopen(), so that your input can come from either a file or stdin.

#ifndef __SCANNER_H
#define __SCANNER_H

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum {
    INVALID_TOKEN = 0,
    NUMBER_TOKEN, IDENT_TOKEN,
    ASSIGNMENT_TOKEN, SEMICOLON_TOKEN, LPAREN_TOKEN, RPAREN_TOKEN,
    PLUS_TOKEN, MINUS_TOKEN, MULT_TOKEN, DIV_TOKEN, MOD_TOKEN,
    REPEAT_TOKEN, PRINT_TOKEN,
    END_OF_INPUT_TOKEN
} TOKEN_TYPE;

typedef struct token {
    TOKEN_TYPE type;
    char *strVal;   /* lexeme text for numbers, identifiers and keywords; NULL otherwise */
} TOKEN;

TOKEN *scannerAdHoc(void);

void ungetToken(TOKEN **);

void freeToken(TOKEN **);

#define BUF_SIZE 128
#define MAX_LINE_LENGTH 256

#endif
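The question asks for the pencil-and-paper FSM to be checked against the tokens before any scanning code is written. One way to make that check mechanical is to encode the diagram as nested cases, with an outer switch on the current state and an inner switch on the next character, and run it over whole lexemes. This is a minimal sketch under assumptions: the grammar itself is not reproduced here, so it assumes numbers are digit strings, identifiers are a letter followed by letters or digits, and the remaining tokens are := and the single characters ; ( ) + - * / % implied by scanner.h. The STATE and LEXEME_CLASS names and classifyLexeme() are hypothetical, and the function only decides whether a complete string is accepted; it does not read a stream.

```c
#include <ctype.h>

/* Hypothetical FSM states; the ACCEPT_* states have no outgoing transitions. */
typedef enum {
    START, IN_NUMBER, IN_IDENT, SAW_COLON, ACCEPT_ASSIGN, ACCEPT_OP, REJECT
} STATE;

/* Hypothetical result classes; keywords are separated from identifiers later. */
typedef enum {
    NOT_A_TOKEN, NUMBER_LEXEME, WORD_LEXEME, ASSIGN_LEXEME, OPERATOR_LEXEME
} LEXEME_CLASS;

/* Run the FSM over a whole string; it is a token only if the final state accepts. */
LEXEME_CLASS classifyLexeme(const char *s)
{
    STATE state = START;

    for (; *s != '\0' && state != REJECT; s++) {
        unsigned char c = (unsigned char) *s;

        switch (state) {                   /* outer case: current state */
        case START:
            switch (c) {                   /* inner case: next character */
            case ':':
                state = SAW_COLON;
                break;
            case ';': case '(': case ')': case '+':
            case '-': case '*': case '/': case '%':
                state = ACCEPT_OP;
                break;
            default:
                if (isdigit(c))
                    state = IN_NUMBER;
                else if (isalpha(c))
                    state = IN_IDENT;
                else
                    state = REJECT;
            }
            break;
        case IN_NUMBER:
            state = isdigit(c) ? IN_NUMBER : REJECT;
            break;
        case IN_IDENT:
            state = isalnum(c) ? IN_IDENT : REJECT;
            break;
        case SAW_COLON:
            state = (c == '=') ? ACCEPT_ASSIGN : REJECT;
            break;
        default:                           /* ACCEPT_ASSIGN, ACCEPT_OP: more input rejects */
            state = REJECT;
        }
    }

    switch (state) {                       /* accepting states map to token classes */
    case IN_NUMBER:     return NUMBER_LEXEME;
    case IN_IDENT:      return WORD_LEXEME;
    case ACCEPT_ASSIGN: return ASSIGN_LEXEME;
    case ACCEPT_OP:     return OPERATOR_LEXEME;
    default:            return NOT_A_TOKEN;   /* START (empty input), SAW_COLON, REJECT */
    }
}
```

Feeding it every lexeme of the test program (firstvar, :=, 1, ;, repeat, (, 10, ...) checks each path through the drawn diagram. Keywords such as repeat and print come out as WORD_LEXEME; telling them apart from identifiers is a separate string comparison once the word has been accepted.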

#include "scanner.h"

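int main(int argc, char **argv)
{
    /* Read from the file named on the command line, or from stdin if none is given. */
    if (argc > 1 && freopen(argv[1], "r", stdin) == NULL) {
        perror(argv[1]);
        exit(EXIT_FAILURE);
    }

    TOKEN *token = NULL;
    /* Printable names, indexed by TOKEN_TYPE (same order as the enum). */
    const char *token2str[] = {"INVALID", "NUMBER", "IDENT", "ASSIGNMENT", "SEMICOLON",
                               "LPAREN", "RPAREN", "PLUS", "MINUS", "MULT", "DIV", "MOD",
                               "REPEAT", "PRINT", "END_OF_INPUT"};

    printf("\n");
    while ((token = scannerAdHoc()) != NULL) {
        int done = (token->type == END_OF_INPUT_TOKEN);

        if (token->strVal == NULL)
            printf("{%s} ", token2str[token->type]);
        else
            printf("{%s, %s} ", token2str[token->type], token->strVal);
        freeToken(&token);
        fflush(stdout);
        if (done)   /* stop after the end-of-input token instead of looping forever */
            break;
    }
    printf("\n");
    exit(EXIT_SUCCESS);
}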
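The driver above calls scannerAdHoc() once per token. Below is a minimal sketch of the scanner side, not a definitive solution; it assumes that numbers are digit strings and identifiers are a letter followed by letters or digits, since the grammar is not reproduced here. An outer switch dispatches on the first non-blank character, a nested switch handles the two-character :=, and ungetc() returns the lookahead character that ends a number or identifier, so nothing has to be stored between calls. scanToken(FILE *) and newToken() are hypothetical names; scannerAdHoc() could simply return scanToken(stdin). The scanner.h declarations are copied in so the block compiles on its own, and ungetToken() is not sketched.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copied from scanner.h so that this sketch compiles on its own. */
typedef enum {
    INVALID_TOKEN = 0, NUMBER_TOKEN, IDENT_TOKEN, ASSIGNMENT_TOKEN,
    SEMICOLON_TOKEN, LPAREN_TOKEN, RPAREN_TOKEN, PLUS_TOKEN, MINUS_TOKEN,
    MULT_TOKEN, DIV_TOKEN, MOD_TOKEN, REPEAT_TOKEN, PRINT_TOKEN,
    END_OF_INPUT_TOKEN
} TOKEN_TYPE;

typedef struct token { TOKEN_TYPE type; char *strVal; } TOKEN;

#define BUF_SIZE 128

/* Hypothetical helper: allocate a token; text (may be NULL) is copied into strVal. */
static TOKEN *newToken(TOKEN_TYPE type, const char *text)
{
    TOKEN *t = malloc(sizeof *t);
    if (t == NULL) { perror("malloc"); exit(EXIT_FAILURE); }
    t->type = type;
    t->strVal = NULL;
    if (text != NULL) {
        t->strVal = malloc(strlen(text) + 1);
        if (t->strVal == NULL) { perror("malloc"); exit(EXIT_FAILURE); }
        strcpy(t->strVal, text);
    }
    return t;
}

void freeToken(TOKEN **token)
{
    if (token == NULL || *token == NULL) return;
    free((*token)->strVal);
    free(*token);
    *token = NULL;                         /* no dangling pointer left in the caller */
}

/* Hypothetical: scannerAdHoc() could be written as return scanToken(stdin); */
TOKEN *scanToken(FILE *in)
{
    char buf[BUF_SIZE];
    size_t len = 0;
    int c;

    do {                                   /* skip white space between tokens */
        c = getc(in);
    } while (c != EOF && isspace(c));

    switch (c) {                           /* outer case: first character */
    case EOF: return newToken(END_OF_INPUT_TOKEN, NULL);
    case ';': return newToken(SEMICOLON_TOKEN, NULL);
    case '(': return newToken(LPAREN_TOKEN, NULL);
    case ')': return newToken(RPAREN_TOKEN, NULL);
    case '+': return newToken(PLUS_TOKEN, NULL);
    case '-': return newToken(MINUS_TOKEN, NULL);
    case '*': return newToken(MULT_TOKEN, NULL);
    case '/': return newToken(DIV_TOKEN, NULL);
    case '%': return newToken(MOD_TOKEN, NULL);
    case ':':
        switch (c = getc(in)) {            /* nested case: ':' must be followed by '=' */
        case '=': return newToken(ASSIGNMENT_TOKEN, NULL);
        default:
            if (c != EOF) ungetc(c, in);   /* not part of this token: push it back */
            return newToken(INVALID_TOKEN, ":");
        }
    default:
        if (!isalnum(c)) {                 /* character that starts no token */
            buf[0] = (char) c;
            buf[1] = '\0';
            return newToken(INVALID_TOKEN, buf);
        }
        {
            int isNumber = isdigit(c) != 0;
            /* Numbers take digits only; identifiers take letters and digits.
               Characters beyond BUF_SIZE - 1 are dropped (lexeme truncated). */
            while (c != EOF && (isNumber ? isdigit(c) : isalnum(c))) {
                if (len < BUF_SIZE - 1) buf[len++] = (char) c;
                c = getc(in);
            }
            if (c != EOF) ungetc(c, in);   /* first character of the next token */
            buf[len] = '\0';
            if (isNumber) return newToken(NUMBER_TOKEN, buf);
            if (strcmp(buf, "repeat") == 0) return newToken(REPEAT_TOKEN, buf);
            if (strcmp(buf, "print") == 0) return newToken(PRINT_TOKEN, buf);
            return newToken(IDENT_TOKEN, buf);
        }
    }
}
```

At end of input this sketch returns an END_OF_INPUT_TOKEN rather than NULL, so a driver using it has to stop after printing that token instead of waiting for NULL. Keywords keep their text in strVal, so main() prints them as {REPEAT, repeat} and {PRINT, print}. The case ranges from the question (case 'a' ... 'z':) could replace the isdigit()/isalnum() tests in the default branch; the sketch uses <ctype.h> to stay within standard C.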