Question

1 Approved Answer

Posted on Sep 22, 2024

Modern translation systems often use lexical analysis to divide an input into meaningful units. Oncethesemeaningfulunits(orlexemesortokens)havebeenderived, othercomponentswithin the translation system are used to determine the relationships

Modern translation systems often use lexical analysis to divide an input into meaningful units. Oncethesemeaningfulunits(orlexemesortokens)havebeenderived, othercomponentswithin the translation system are used to determine the relationships among the lexemes. Lexical analyzers (or lexers) are commonly used in compilers, interpreters, and other translation systems that you have often used. The act of lexical analysis is also known as scanning. For this assignment you are to build a lexer that will successfully scan through a set of programs expressed in the CCX programming language. You have never used CCX, and that is just ne: scanning through CCX programs wont require intimate knowledge of CCX. Your lexer shall be written in the C programming language. You may not use C++. Your lexer will be compiled and tested using gcc on the wormulon server, so you should at least test your lexerinthissameenvironment. Youareexpresslyforbiddenfromusinganystandardtemplate library (STL) containers (e.g. ), algorithms (via ), or other facilities in your lexer. You may not use any lexer generators or other automated tools to complete this assignment. The goal of lexical analysis is to break programs like t into lexemes that are eventually used by other components within the translation system to determine things like whether the program is legal and what the program does. Although the requirements and constraintsimposeduponlexicalanalysismayvaryconsiderablybetweendifferenttranslation systems, the requirements for most lexers (and for this assignment) are very simple. Your lexer shall open a le provided on the command-line, discover the lexemes found in the le, classify each lexeme, and print out each lexeme and its classication to the computer screen. Your lexer shall classify each lexeme found in a given source le into one of 8 categories. These categories are: comment, string, keyword, character literal, numeric literal, operator, identifier, and UNK. The details concerning each of these categories is specied later in this document, Please examine this output closely. Each lexeme must be printed on a separate line, and a singlespacemustappearbetweenthelexemeanditsclassication. Thelexemeitselfmuststart in the rst column on a given line. The classication of a lexeme must appear in parentheses. No blank or empty lines can appear in the output unless they are part of a multi-line comment. Each lexeme must be printed precisely as it appears in the source le. Do not bracket the lexeme in quotes or any other characters he Lexeme Categories As mentioned, your lexer shall classify each lexeme encountered into one of 8 categories. The details of each category follow. comment Comments in CCX begin with / and end with / (C-style comments). Comments can span multiple lines. Everything encountered between (and including) the / and / delimiters is considered part of the comment lexeme. identifier Identiers are used in programs to name entities such as variables. Every programming language has its own rules as to what constitutes a legal identier. In CCX an identier can be composed of letters, digits, and underscores, but must start with a letter. You may assume that your lexer will never encounter an identier that is more than 256 characters long. string Strings in CCX are literals delimited by double-quotes like this. The double-quotes are part of the lexeme. When you print a lexeme that has been classied as a string, you must print the double-quotes. You may assume that your lexer will never encounter a string that is more than 256 characters long. keyword CCX contains many keywords. Keywords are sometimes called reserved words. Keywords (like all of CCX) are case-sensitive, and may not be used as identiers in legal programs. It is not the job of the lexer to determine whether a keyword is misused; the lexer simply classies a particular lexeme as being a keyword. The following are the list of CCX keywords that your lexer must recognize: accessor and array begin bool case character constant else elsif end exit function if in integer interface is loop module mutator natural null of or others out positive procedure range return struct subtype then type when while character literal Character literals in CCX are literals in single-quotes like this: x. CCX allows character escape sequences in character literals, such as \020 but your lexer need not support this. operator CCX contains many operators. Some operators consist of a single character, whereas others contain multiple characters. The following is a list of the operators that your lexer must recognize. Each operator is enclosed in double-quotes for the purpose of disambiguation, but these double-quotes are not part of the operator: "." "<" ">" "(" ")" "+" "-" "*" "/" "|" "&" ";" "," ":" "[" "]" "=" ":=" ".." "<<" ">>" "<>" "<=" ">=" "**" "!=" "=>" numeric literal CCX allows numeric literals in multiple forms. Your lexer will recognize a simplied subset of CCX numeric literals. Each numeric literal encountered by your lexer will start with a decimal digit and will contain only the following: decimal digits (0 through 9) hexadecimal digits (A through F and/or a through f) the special characters , ., and #. any other character encountered will denote that the numeric literal has ended and a new lexeme has begun. UNK This special category is set aside for lexemes that your lexer cannot classify, and is intended to assist you in building and debugging your lexer. This category is composed of all lexemes that do not t in any of the other specied categories. Your lexer will only be tested against legal CCX programs, so if the logic in your lexer is correct, you should never encounter an UNK lexeme. If, however, your lexer does encounter a lexeme that does not t the requirements of any of the other categories, your lexer must print the offending lexeme, along with its category name in parenthesis, and immediately terminate. Each of these source les and the result produced by a correct lexer when scanning the le is available on the course website. The output of your lexer will be compared with the correct result for each le. If your lexer does not compile using gcc on wormulon for any reason, at least 50% of the total possible points on this assignment will be deducted from your score. You are required to do your own work for this assignment. Failure to comply will result in at least a score of 0 for this homework. Your lexer must take its arguments from the command-line. If, for instance, the prompt on a given machine is -bash-4.1$ your lexer will be executed as follows: -bash-4.1$ ./hw1 hello_world.ccx -bash-4.1$ ./hw1 list.cci ... Your program may not ask the user for the name of a le to scan, or the number of les to scan, or anything of this sort. If your lexer does not process command-line arguments for any reason, at least 20% of the total possible points on this assignment will be deducted from your score. Your program will be evaluated for neatness and clarity as well as correctness. You must document your program using comments. You must use a consistent programming style which indicates that you are in control of your thoughts and the program which is being used to actualize them. Sloppy, ambiguous, convoluted, intentionally vague, undocumented, or insufciently documented programs will be considered substandard and will be marked as such. Hints Your lexer will only be tested against the source les listed above. Each of these source les is legal CCX. Your lexer is not expected to be bullet proof, so dont spend time trying to handle the rather large set of all legal and illegal CCX programs. You would do well to think of your lexer as a state machine that operates on a character-bycharacter basis. The set of states in such a machine should be relatively small. Build your lexer incrementally. For example, start off building a lexer that recognizes just keywords, and defaults to the UNK state. Then create a le containing just the CCX keywords, and test your lexer. Once it has been tested, add the ability to recognize CCX operators. Then create a le containing just keywords and operators and test your lexer. Continue in this fashion until your lexer is complete. Your must be submitted using the cscheckin program. A document detailing the use of cscheckin appears on the course website. You must turn in only the source code for your lexer. If your lexer is composed of multiple source code les, you should package them into a single archive using the tar program on wormulon