Question

1 Approved Answer

Posted on Sep 04, 2024

In this project, you need to implement a compiler for a language defined here. The programming language you need to use is C or C++

In this project, you need to implement a compiler for a language defined here. The programming language you need to use is C or C++ (and the language defined by the corresponding tools). The project includes two phases, lexical analysis, and syntax analysis. In the following, we first define the language syntax and tokens. The definitions are given in BNF form. 1 Language Definitions 1.1 Syntax Definitions program ::= { Program program-name function-definitions statements } program-name ::= identifier function-definitions ::= (function-definition)* function-definition ::= { Function function-name arguments statements return return-arg } function-name ::= identifier arguments ::= (argument)* argument ::= identifier return-arg ::= identifier | statements ::= (statement)+ statement ::= assignment-stmt | function-call | if-stmt | while-stmt assignment-stmt ::= { = identifier parameter } function-call ::= { function-name parameters } | { predefined-function parameters } predefined-function ::= + | | * | / | % | print parameters ::= (parameter)* parameter ::= function-call | identifier | number | character-string | Boolean number ::= integer | float if-stmt ::= { if expression then statements else statements } while-stmt ::= { while expression do statements } expression ::= {comparison-operator parameter parameter } | {Boolean-operator expression expression } | Boolean comparison-operator ::= == | > | < | >= | <= | != Boolean-operator ::= or | and To make the later references easier, lets call this language the FP language (similar to functional language syntax, but has the procedural language characteristics). 1.2 Definitions for the Remaining Tokens An identifier has the form [a-z, A-Z][a-z, A-Z, 0-9]*, but its length has to be between 1 to 6 characters. Confining the number of characters in an identifier is not a good language design feature, but this is for you to practice a different definition. You have to use regular expression to achieve the definition of the length constraint (you are not allowed to use the length feature in lex to confine the length). An integer can have a negative sign () but no positive sign. It can be with or without blank(s) between the sign and the digits. If the value is 0, only a single digit 0 is allowed. Otherwise, it is the same as common integer definitions (no leading 0s). A float has to have the decimal point and at least one digit on each side. The left side of the decimal point follows the rule for integers. The right side of the decimal point can be any number of digits but should have at least one digit. A character string is enclosed within (). It can be [a-z, A-Z, 0-9, \]+. For example, (I like lex and yacc) is a character string. We use \ to represent a new line, i.e., in C. The character strings are mainly used in the print function. We may also define functions that uses character strings as parameters. A Boolean can only be T or F, where T represents true and F represents false. 1.3 Definitions of the Predefined Functions Each of the predefined comparison operations and Boolean operators takes two parameters and the corresponding functionality follows the conventional definition. The predefined functions +, *, , /, and % also have the conventional meanings. +, *, , and / can take two or more parameters. For example, {* A B C} implies A * B * C. Similarly, and / can also take two or more parameters. For example, {/ A B C} implies A / B / C and { A B C} implies A B C. The % function can only take two parameters and {% A B} means A % B (A mod B). The predefine function print takes one or more parameters. It simply prints the value of each parameter. Whenever a character string (\) is encountered, a new line should be printed. Although there are functions that take a fixed number of parameters, it is not your job to check the number of parameters (not in the lexical and syntax analysis phases). Also, it is not necessary to check whether the number of arguments and number of parameters do match (not in the lexical and syntax analysis phases). You only need to follow the general rules defined in the syntax of the language. 1.4 White Space To ensure the correctness in processing the input FP-program, you need to consider white space as well. White space characters include space, tab, and new-line. Treat the white space the same way as in other high level languages such as C++ and Java (skip white space). Note that space in character string should not be skipped. 1.5 Some Remarks Note that we do not consider the language typing rules. Thus, there is no need to predefine variables and your program does not need to handle mismatched types. The FP language is case sensitive. Here is an example program using the FP language. {Program Sample {Function facto VAL {if {< VAL 0 } then {= retVal -1} else {= retVal 1} {while {> VAL 0} do {= retVal {* retVal VAL}} {= VAL {- VAL 1}} } } return retVal } {print {facto 999}} } Project Phase 1 -- Lexical Analysis Using lex Define the tokens in the FP language in lex definitions and feed it to lex to generate a scanner (lexer). Then, use the sample program as well as other testing programs written in FP to test your scanner. Note that it is your responsibility to convert the BNF (or verbal) definitions to lex definitions. The tokens in FP include: The set of keywords: Program, Function, return, if, then, else, while, do, or, and, print. The set of special symbols: {, }, (, ), =, +, , *, /, %, ==, >, <, >=, <=, !=. The set of other tokens: identifier, integer, float, character-string, Boolean. It is also your responsibility to insert appropriate statements (actions) in your lex definition file so that the lexer (created by lex) will generate a symbol table and print the output token names appropriately. You should design the symbol table such that it contains the necessary fields for this as well as the future projects. The token names should be clearly printed, one on each line, in the correct order. The names of the tokens should conform to those given above. The original symbols for the predefined functions and the comparison-operators should be used as the token names (such as >=). The keywords itself should be used as token names (such as while). For the other tokens, the specific names: identifier, integer, float, character-string, and Boolean should be used. Note that for the other tokens, you should output the token names as well as the texts you obtained (in the same line). If you do not keep track of the original texts, such as the identifiers name, the integers value, the actual character string, etc., then, the information will be lost. The symbol table should be printed as well. You need to print out the content of the symbol table, one entry on a line. (In this phase, you only have the identifier names). You need to electronically submit your program before midnight of the due date. Follow the instruction below for submission. Please include the following files in your submission:

The lex definition file for FP language. The file name should be FP.l.

A file containing a sample program written in FP language that you have tested with the scanner generated from your lex definition file. The file name should be sample.fp. If you have multiple sample files, you can name them sample1.fp, sample2.fp, etc.

A readme file that explains how to interpret your output from lex (the list of tokens). Also, if you have some information that you would like the grader know, you can also put it in this file.

The optional Design.doc file that contains the description of some special features of your project that is not specified in the project specification, including all problems your program may have and/or all additional features you implemented. You are responsible for generating your own testing programs and test your project thoroughly. We may use a different set of FP programs for testing. If your program does not fully function, please try to have partial results printed clearly to obtain partial credits. If your program does not work or it does not provide any output, then there will be no partial credits given. If your program does work fully or partially but we cannot make it run properly, then it is your responsibility to make an appointment with the grader and come to demonstrate your program. According to the problems, appropriate point deductions will apply in these situations.