Question


Posted on Sep 09, 2024

1. Problem Description

At EMCC, we are building our own computer language called Emcee. As we have learned in class, there are four different levels of analysis:

Lexical: identifiers, keywords, punctuation, comments, etc.
Syntactic: the structures within the language (e.g. definitions, if-statements)
Contextual: static checking of variables, type, name, initialization, etc.
Semantic: meaning and behavior of a program

When building a compiler for this language, we must account for each level. The first step in compilation is the Lexical Analyzer. It is responsible for reading the input source code file and breaking it up into the various tokens defined by the language. This is your assignment.

First, review the description of the Emcee language in the file Emcee Language Overview.doc. Create a list of all the different tokens you will need for the language. Here is a hint: there are four patterns that need to be recognized (for example, identifiers), seven keywords and 22 punctuation marks. Multiple characters that go together to form a single symbol are considered a single token, for example, := for assignment. Don't forget to recognize comments, newlines and any other characters and ignore them. If the source code happens to have an unrecognized character (e.g. $), it will simply be ignored.

Next, write a program that reads from Standard Input (a UNIX concept) and produces a list of token numbers. If the token recognized is one of the four patterns, then also print the text that was matched. You can look at the example output to get a better idea of what is required here.

Such a program is called a Lexical Analyzer, or simply a Scanner. Writing a Scanner is not a simple task, but programmers have been building them since the dawn of high-level languages. Because of that, tools have been developed that will generate the Scanner for you.
All you need to do is describe the tokens that the language expects to find and the tool will create the analyzer code. One such Scanner-generator tool is called Lex. Lex was originally written in 1975 and was used on many Unix systems. Lex reads a file that specifies the lexical analyzer and outputs source code implementing the Scanner in the C programming language. Since then, a new open-source tool called Flex ("fast lexical analyzer generator") has been written, and it should have been downloaded when you set up your Pi. A good introduction to the Flex tool can be found in the document Lex Overview.doc.

To assist in this job, I have provided the starting file for your description of the Emcee tokens. This can be found in emcee.l (that's an L, not a 1; Lex/Flex source files are typically stored in .l files). Likewise, the Scanner must produce token numbers that match what the next stage, the Parser, will expect. These are defined in a file I've created for you called tokens.h. This is a C header file that we will learn more about as we learn that language. Do not make any changes to this file. It will be part of your solution. Finally, I have provided the main program (emcee.c) that will read a source file written in Emcee and repeatedly call your Scanner.

Your job is to take the partially implemented emcee.l and complete it. Put all your files into a single directory (like ~/projects/emcee maybe?). From the command line, change into this directory and execute the following steps to test your program:

1. flex emcee.l
This will invoke the flex tool to process your token definitions and produce a file called lex.yy.c. You can take a look at this file, though it may not make much sense. It is written in C and was not meant for humans to look at. Once you have removed all the errors, you can go on to the next step.

2. gcc lex.yy.c emcee.c -o emcee
This will compile your new Scanner and the driver program. If all goes well, it will produce an executable file called emcee.
Once you have removed all the errors, you can go on to the next step.

3. ./emcee < sqrt.mc > tokens.out
This will run your compiler using the input file sqrt.mc and direct the output to the file tokens.out. The file sqrt.mc is provided on Canvas. The contents of tokens.out should look like the sample output below. Test your program with other source files as well. You should make sure each of your tokens is picked up correctly. At this point, syntax doesn't matter.

4. cat tokens.out
This will print out the file your computer created, tokens.out.

2. Notes

You must use flex. Don't write your own Scanner. Turn in only your completed emcee.l file.

3. Required Input

Lines of Emcee code.

4. Required Output

Your output should look something like the following example.

Emcee Compiler ver. 0.1
1 9 (guess) 13 3 12 1 9 (n) 13 3 12 9 (guess) 28 8 (1.0) 12 9 (n) 28 9 (inputReal) 14 15 12 5 29 9 (n) 27 8 (0.0) 30 6 31 7 29 9 (abs) 14 9 (guess) 20 10 (2) 17 9 (n) 15 25 8 (0.01) 30 31 9 (guess) 28 14 9 (n) 16 14 9 (n) 19 9 (guess) 15 15 19 10 (2) 12 32 12 9 (print) 14 11 ("Root is ") 33 9 (guess) 15 12 32 12