Question

1 Approved Answer

Posted on Sep 12, 2024

Please code in C for this, this is everything that was given When executed with a mode of 1 , your compiler should read the

Please code in C for this, this is everything that was given

image text in transcribed

When executed with a mode of 1 , your compiler should read the specified input file and divide it into tokens; essentially this is the lexical analysis phase of the compiler (referred to as the "lexer"). The output stream should contain a line for each token, showing the file name, line number, and text corresponding to the token. More details (including the precise output and error formats, and the definitions for the tokens) are given below. The input file may be a C header file, a C source file, or an arbitrary text file. There are opportunities for extra credit. All implementation must be C,C++, Java, or Python. Instructor permission is required to use anything beyond the standard libraries for these languages. Submissions will be graded on pyrite.cs.iastate.edu, and therefore must build and run correctly there. Students are strongly encouraged to encapsulate the functionality of this phase of the compiler, so that later parts of the project can easily examine and consume tokens from an input stream. 2 Tokens in our subset of C Your lexer must recognize the tokens given in Table 1. A useful list of integer constants for tokens may be found in tokens.h, distributed with the materials for this part of the project. Note that single-character symbols use their ASCII codes; all other tokens have integer values above 300 . Whitespace (spaces, tabs, and carriage returns) serves only to separate tokens, and should otherwise be discarded. Any characters that are not part of a lexeme, such as $, should generate an error message. You may notice that some of the C keywords and operators are missing. Students are welcome to implement additional language features if desired, but any extra keywords or operators must be part ot the C standard. 2.1 C style comments C-style comments, beginning with / and ending with /, should be discarded and treated the same as a space. A comment with no matching */ is closed at the end of file, and an error message (indicating the starting point of the comment) should be displayed for this case. The formal rule for C-style comments is: When not currently inside a comment or string literal, the text /* indicates the start of a comment, and all text is ignored until the next */ or the end of file. 2.2C++ style comments C++-style comments, beginning with // and ending with a newline or end of file, should be discarded and treated the same as a newline character. No error message is necessary if the comment is terminated by the end of file. The formal rule for C++-style comments is: When not currently inside a comment or string literal, the text // indicates the start of a comment, and all text is ignored until the next newline character or the end of file. Single character symbols Operators with two characters Keywords \begin{aligned} Value & Lexeme \\ \hline 406 & if \\ 407 & else \\ 408 & break \\ 409 & continue \end{aligned} Lexemes with attributes 2.3 Literals 'The token definitions for integers, reals, characters, and strings are simpler than the C standard. Some extensions to these may be implemented for extra credit; see below. 3 Formatting 3.1 Compiler output For mode 1, the output stream should contain a line for each token, exactly of the form File filename Line line number Token token number Text lexeme where 2 - filename is the name of the input file that contains the token, - line number is the line number of the input file that contains the token, - token number is the integer corresponding to the token, given in tokens. h and 'lable 1 , and - lexeme is the matched text that produced the token. Nothing else should be written to the output stream. The output stream should be written to a file, based on the input file name given on the command line, with extension replaced by lexer. For example, an input file named infile.c or infile.h should produce an output file named infile.lexer in the current working directory. 3.2 Error messages Error messages generated by the lexer must be written to standard error, and have exactly the form Lexer error in file filename line line number at text lexeme Description The Description should appear on the next line(s) and give a helpful description of the error. The error stream must be empty if no errors are generated. Students are strongly encouraged to define their own switches to display any additional information, to help with debugging. Example inputs, outputs, and errors are given below. Additionally, students are encouraged to check the output and error messages given for the example test files, for all parts of the project. For test files with errors, code is considered correct if the first error message occurs at the expected input line and text, and the error messages describe the same error (as determined by a human). 4 Extra features 4.1 Character literals In addition to ordinary character literals, e.g., ' ', your lexer should allow the escape sequences ' \a, ' \b,, ' , ', ' \t ', ' \\ ', and ' \ '. 4.2 Real literals In addition to numbers with fractional portions, e.g., 3.14159, your lexer should allow exponents, of the form e or E followed by an integer with optional sign. If an exponent is present, then the fractional part is optional. For example, the following should be recognized as real literals: 31.415926.02214e231.60217653E191e+100 4.3 String literals Allow escape sequences \\ and \", in addition to the others, inside string literals. For example, the following should be recognized as string literals: "Hello, world! n "I said, \"Hello, world!\"" "This is evil \\" 4.4 Size limits for lexemes Your lexer should generate an appropriate error message (and terminate, if you wish) if the following size limits are exceeded. - Integer literals: 48 characters - Real literals: 48 characters - Identifiers: 48 characters - String literals: 1024 characters Note that there is no limit for the length of a line, the length of a comment, or the length of an input file. 4.5 Output file on error For efficiency, your lexer should make one pass through the input file, generating output for each token as it is encountered. If any error message is generated, then your lexer should remove the output file (it will be ignored anyway during grading). 4.6 \#include directives The include directive has the form \# include filename where filename is a string literal. We will only test for include files in double quotes. In other words, we will test for directives of the form # include "stack.h" \# include "../mylib/mylib.h" \#include"/usr/include/gcc/includes/stdio.h" where the files are either absolute pathnames or pathnames relative to the current working directory. We will not test for "system" includes of the form \#include stdio.h> although you are free to implement that if you wish. Each include directive switches the token input stream to the indicated file, with an appropriate error message generated if the file cannot be opened. When the token stream from that file is exhausted, it reverts back to the original file. Of course, the included file may itself have include directives, which should be followed recursively. You may set a reasonable limit to the include depth (say, 256) with an appropriate error message if this depth is exceeded. 5 Basic example 5.1 Input: hello. c 1234567//Theusualhelloworldprogram,butwithout#ncludebecausetheincludedirectiveisoptional. 4 891011intmain(){returnprintf("Hello,world! );}