Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Phase 1. Lexical analysis In this project, you need to implement a compiler for a language defined here. The programming language you need to use

Phase 1. Lexical analysis

In this project, you need to implement a compiler for a language defined here. The programming

language you need to use is C or C++ (and the language defined by the corresponding tools).

The project includes two phases, lexical analysis, and syntax analysis. In the following, we first

define the language syntax and tokens. The definitions are given in BNF form.

1 Language Definitions

1.1 Syntax Definitions

program ::= { Program program-name function-definitions statements }

program-name ::= identifier

function-definitions ::= (function-definition)*

function-definition ::=

{ Function function-name arguments statements return return-arg }

function-name ::= identifier

arguments ::= (argument)*

argument ::= identifier

return-arg ::= identifier | ?

statements ::= (statement)+

statement ::= assignment-stmt | function-call | if-stmt | while-stmt

assignment-stmt ::= { = identifier parameter }

function-call ::= { function-name parameters } | { predefined-function parameters }

predefined-function ::= + | | * | / | % | print

parameters ::= (parameter)*

parameter ::= function-call | identifier | number | character-string | Boolean

number ::= integer | float

if-stmt ::= { if expression then statements else statements }

while-stmt ::= { while expression do statements }

expression ::= {comparison-operator parameter parameter } |

{Boolean-operator expression expression } | Boolean

comparison-operator ::= == | > | < | >= | <= | !=

Boolean-operator ::= or | and

To make the later references easier, lets call this language the FP language (similar to functional

language syntax, but has the procedural language characteristics).

1.2 Definitions for the Remaining Tokens

An identifier has the form [a-z, A-Z][a-z, A-Z, 0-9]*, but its length has to be between 1 to 6

characters. Confining the number of characters in an identifier is not a good language design

feature, but this is for you to practice a different definition. You have to use regular expression to

achieve the definition of the length constraint (you are not allowed to use the length feature in

lex to confine the length).

An integer can have a negative sign () but no positive sign. It can be with or without blank(s)

between the sign and the digits. If the value is 0, only a single digit 0 is allowed. Otherwise, it is

the same as common integer definitions (no leading 0s).

A float has to have the decimal point and at least one digit on each side. The left side of the

decimal point follows the rule for integers. The right side of the decimal point can be any number

of digits but should have at least one digit.

A character string is enclosed within (). It can be [a-z, A-Z, 0-9, \]+. For example, (I like lex and

yacc) is a character string. We use \ to represent a new line, i.e., in C. The character strings are

mainly used in the print function. We may also define functions that uses character strings as

parameters.

A Boolean can only be T or F, where T represents true and F represents false.

1.3 Definitions of the Predefined Functions

Each of the predefined comparison operations and Boolean operators takes two parameters and

the corresponding functionality follows the conventional definition.

The predefined functions +, *, , /, and % also have the conventional meanings. +, *, , and / can

take two or more parameters. For example, {* A B C} implies A * B * C. Similarly, and / can

also take two or more parameters. For example, {/ A B C} implies A / B / C and { A B C}

implies A B C. The % function can only take two parameters and {% A B} means A % B (A

mod B).

The predefine function print takes one or more parameters. It simply prints the value of each

parameter. Whenever a character string (\) is encountered, a new line should be printed.

Although there are functions that take a fixed number of parameters, it is not your job to check

the number of parameters (not in the lexical and syntax analysis phases). Also, it is not necessary

to check whether the number of arguments and number of parameters do match (not in the

lexical and syntax analysis phases). You only need to follow the general rules defined in the

syntax of the language.

1.4 White Space

To ensure the correctness in processing the input FP-program, you need to consider white space

as well. White space characters include space, tab, and new-line. Treat the white space the same

way as in other high level languages such as C++ and Java (skip white space). Note that space in

character string should not be skipped.

1.5 Some Remarks

Note that we do not consider the language typing rules. Thus, there is no need to predefine

variables and your program does not need to handle mismatched types.

The FP language is case sensitive.

Here is an example program using the FP language.

{Program Sample

{Function facto VAL

{if {< VAL 0 }

then {= retVal -1}

else {= retVal 1}

{while {> VAL 0} do

{= retVal {* retVal VAL}}

{= VAL {- VAL 1}}

}

}

return retVal

}

{print {facto 999}}

}

Project Phase 1 -- Lexical Analysis Using lex

Define the tokens in the FP language in lex definitions and feed it to lex to generate a scanner

(lexer). Then, use the sample program as well as other testing programs written in FP to test your

scanner. Note that it is your responsibility to convert the BNF (or verbal) definitions to lex

definitions. The tokens in FP include:

The set of keywords: Program, Function, return, if, then, else, while, do, or, and, print.

The set of special symbols: {, }, (, ), =, +, , *, /, %, ==, >, <, >=, <=, !=.

The set of other tokens: identifier, integer, float, character-string, Boolean.

It is also your responsibility to insert appropriate statements (actions) in your lex definition file

so that the lexer (created by lex) will generate a symbol table and print the output token names

appropriately. You should design the symbol table such that it contains the necessary fields for

this as well as the future projects.

The token names should be clearly printed, one on each line, in the correct order. The names of

the tokens should conform to those given above. The original symbols for the predefined

functions and the comparison-operators should be used as the token names (such as >=). The

keywords itself should be used as token names (such as while). For the other tokens, the

specific names: identifier, integer, float, character-string, and Boolean should be used. Note that

for the other tokens, you should output the token names as well as the texts you obtained (in

the same line). If you do not keep track of the original texts, such as the identifiers name, the

integers value, the actual character string, etc., then, the information will be lost.

The symbol table should be printed as well. You need to print out the content of the symbol table,

one entry on a line. (In this phase, you only have the identifier names).

You need to electronically submit your program before midnight of the due date. Follow the

instruction below for submission.

Please include the following files in your submission:

?? The lex definition file for FP language. The file name should be FP.l.

?? A file containing a sample program written in FP language that you have tested with the

scanner generated from your lex definition file. The file name should be sample.fp. If you have

multiple sample files, you can name them sample1.fp, sample2.fp, etc.

?? A readme file that explains how to interpret your output from lex (the list of tokens).

Also, if you have some information that you would like the grader know, you can also put it in

this file.

?? The optional Design.doc file that contains the description of some special features of your

project that is not specified in the project specification, including all problems your program may

have and/or all additional features you implemented.

You are responsible for generating your own testing programs and test your project thoroughly.

We may use a different set of FP programs for testing.

If your program does not fully function, please try to have partial results printed clearly to obtain

partial credits. If your program does not work or it does not provide any output, then there will

be no partial credits given. If your program does work fully or partially but we cannot make it

run properly, then it is your responsibility to make an appointment with the grader and come to

demonstrate your program. According to the problems, appropriate point deductions will apply

in these situations.

Code Given for it (Please edit if it is missing something):

#include #include #include #include #include #include #include #include using namespace std;

int words(char buffer[]) { char keywords[35][10] = { "auto","break","case","char","const","continue","default", "do","double","else","enum","extern","float","for","goto", "if","int","long","register","return","short","signed", "sizeof","static","struct","switch","typedef","union", "unsigned","void","volatile","while" };

int map = 0; int x;

for (x = 0; x < 35; ++x) { if (strcmp(keywords[x], buffer) == 0) { map = 1; break; } } return map; }

int main() { char shy; char buffer[20]; char symbol[] = "+ - * / = ?";

ifstream myfile("Lexer.txt");

int x = 0; int y = 0;

vector> v { { "[0-9]+" , "Numbers" } , { "[a-z]+" , "Letters" }, { "\ \ * | \ \ +", "Symbol" } };

while (!myfile.eof()) { shy = myfile.get(); for (int x = 0; x < 10; ++x) { if (shy == symbol[x]) cout << shy << " is an operator"; }

if (isalnum(shy)) { buffer[y++] = shy; }

else if ((shy == ' ' || shy == ' ') && (y != 0)) { buffer[y] = '\0'; y = 0;

if (words(buffer) == 1) cout << buffer << " is an keyword"; else cout << buffer << " is an indentifer"; } } myfile.close(); return 0; }

Output is suppose to do this:

Void is keyword

main is indentifier

int is keyword

+ is operator

etc

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Database And Expert Systems Applications Dexa 2023 Workshops 34th International Conference Dexa 2023 Penang Malaysia August 28 30 2023 Proceedings

Authors: Gabriele Kotsis ,A Min Tjoa ,Ismail Khalil ,Bernhard Moser ,Atif Mashkoor ,Johannes Sametinger ,Maqbool Khan

1st Edition

303139688X, 978-3031396885

More Books

Students also viewed these Databases questions