Question
Goal: The goal of this part of the project is to implement, in Java or Python, a Tokenizer for the Core language. Although it is called a tokenizer, as we saw in class, what you will implement is a Scanner, which will read one line at a time from the input file, tokenize that line, and make those tokens available to the parser via the methods listed below. Once those tokens have all been consumed by the parser, the Scanner will read the next line, tokenize that line, etc. The approach used for tokenization in your Scanner should be based on the DFA for Core as we have discussed in class.

The methods that the Scanner should provide are dictated by the needs of the Parser, in particular the One-Token-Look-Ahead (OTLA) approach used by the Parser. OTLA means that there are times when some method in the parser needs to know what the next token is although it is not yet ready to "consume" it. We will see the details when we discuss the parser. In these situations, if the next token is an identifier, we do not need to know the name of the identifier, only the fact that the next token is an identifier. Similarly, if the next token is an integer, we do not need to know the actual value of the integer, only the fact that the next token is an integer. But when the Parser is ready to consume the identifier or integer token, it will need the name of the particular identifier or the value of the particular integer.

The methods listed below, to be implemented in your Core Scanner, are designed to account for these factors. In addition, they are designed so that the grader can easily grade your Scanner implementation. Of course, in addition to these methods, which constitute the interface of the Scanner, you may always implement any private methods you want inside the Scanner. The set of tokens of Core is as specified on slide 14 of the file part2.pptx.
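To illustrate the OTLA pattern described above, here is a sketch, in Python, of how a parser method peeks with getToken() and only fetches an identifier's name or an integer's value when it actually consumes the token. The method names follow the handout (in Python snake_case); the function parse_operand and the tuple return values are purely illustrative, not part of the assignment.

```python
# OTLA from the parser's point of view: peek with get_token() (which does
# not move the cursor), and only call int_val()/id_name() when consuming.
# 'scanner' is any object providing the interface specified in the handout.

def parse_operand(scanner):
    tok = scanner.get_token()        # peek: token number only, cursor unmoved
    if tok == 31:                    # next token is an integer
        value = scanner.int_val()    # now we need the actual value
        scanner.skip_token()         # consume it
        return ("int", value)
    if tok == 32:                    # next token is an identifier
        name = scanner.id_name()     # now we need the actual name
        scanner.skip_token()
        return ("id", name)
    raise SyntaxError("expected an integer or identifier")
```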
As listed on slide 14, the legal tokens of the Core language are:

Reserved words (11 reserved words): program, begin, end, int, if, then, else, while, loop, read, write

Special symbols (19 special symbols): ; , = ! [ ] && || ( ) + - * != == < > <= >=

Integers (unsigned)

Identifiers: start with an uppercase letter, followed by zero or more uppercase letters and digits. Note that something like "ABCend" is illegal as an id because of the lowercase letters; and it is not two tokens because of the lack of white space. But ABC123 and A1B2C3 are both legal.

White space requirement: White space, i.e., one or more blanks or tab characters or carriage returns, is required between any pair of tokens unless one or both of them is/are special symbols, in which case white space is optional. Note that white space is not a token.

As we discussed in class, we will number these tokens 1 through 11 for the reserved words, 12 through 30 for the special symbols, 31 for integer, and 32 for identifier. Note that 31 just tells us that the token in question is an integer, not the actual value of the integer. Similarly, 32 just tells us that the particular token is an identifier, not the name of the identifier. We will also use two additional numbers, 33 and 34, to indicate not actual tokens but situations that the Scanner has to deal with. We will use 33 to indicate that the Scanner is at the end-of-file so there are no more tokens. And 34 will indicate that the Scanner came across a string of characters in the input stream that is not a legal token, in other words, an error token.

The Scanner's interface should consist of the following (public) methods:

Constructor(): The Scanner's constructor should receive, as parameter, the name of the input file that contains the string of characters to be tokenized. The constructor should start by instantiating a BufferedReader corresponding to that file in the usual manner.
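Before continuing with the constructor, the token numbering just described can be written down directly as tables. That ';' is token 12 is confirmed by the sample output later in this handout ("program int X ;" prints 1 4 32 12); the ordering of the remaining special symbols here is an assumption and should be checked against slide 14.

```python
# Token numbers for Core: reserved words 1-11, special symbols 12-30,
# plus the four category numbers 31-34. The order of the special symbols
# beyond ';' (token 12) is assumed, not taken verbatim from the slides.

RESERVED = ["program", "begin", "end", "int", "if", "then",
            "else", "while", "loop", "read", "write"]

SPECIALS = [";", ",", "=", "!", "[", "]", "&&", "||", "(", ")",
            "+", "-", "*", "!=", "==", "<", ">", "<=", ">="]

TOKEN_NUM = {s: i + 1 for i, s in enumerate(RESERVED)}          # 1..11
TOKEN_NUM.update({s: i + 12 for i, s in enumerate(SPECIALS)})   # 12..30

INTEGER, IDENTIFIER, EOF_TOKEN, ERROR_TOKEN = 31, 32, 33, 34
```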
The constructor should then call a private method tokenizeLine(). This method reads a line from the input file, converts the sequence of characters into a sequence of Core tokens, and saves them in a private data structure, possibly an array. It should then set a cursor index that points to the first token in that structure. It would be useful to include, in the same structure, the string of characters making up each token, since you will need that for the case of integer tokens and identifier tokens. If a line that tokenizeLine() reads consists entirely of white space characters, it should skip that line and read the next line, skipping as many lines as necessary to get to a non-blank line. If the line contains a string of characters that is not a valid token, tokenizeLine() should still tokenize up to that point, with the next number in its private data structure being 34 to indicate an illegal token. tokenizeLine() also has to take account of greedy tokenizing.

int getToken(): This returns a number between 1 and 32 if the current token is a proper Core token; 33 if the Scanner is at the end of the file and there are no more tokens; 34 if the Scanner comes across a string that is not a legal token. Note that getToken() does not move the "cursor" forward, so repeated calls to getToken() will return the same value. To move past the current token, the Parser has to call skipToken().

void skipToken(): This method moves the "token cursor" to the next token (but does not return anything). If there are no more tokens in the current line, skipToken() calls tokenizeLine(), which reads the next line from the input file, converts the sequence of characters into a sequence of Core tokens, saves them in the Scanner's data structure, sets the cursor index to point to the first token, and returns.
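As an illustration of the greedy tokenizing that tokenizeLine() must perform, here is one possible hand-rolled version in Python. This is a sketch, not the table-driven DFA approach discussed later; the token numbers follow the handout (reserved words 1-11, specials 12-30, integer 31, identifier 32, error 34), but the exact ordering of the special symbols is an assumption.

```python
# A possible shape for tokenizeLine(): scan left to right, always taking
# the longest legal token (greedy tokenizing).

RESERVED = ["program", "begin", "end", "int", "if", "then",
            "else", "while", "loop", "read", "write"]
SPECIALS = [";", ",", "=", "!", "[", "]", "&&", "||", "(", ")",
            "+", "-", "*", "!=", "==", "<", ">", "<=", ">="]
TOKEN_NUM = {s: i + 1 for i, s in enumerate(RESERVED)}
TOKEN_NUM.update({s: i + 12 for i, s in enumerate(SPECIALS)})

def tokenize_line(line):
    """Return (token number, token string) pairs; a 34 entry ends the list."""
    tokens, i, n = [], 0, len(line)
    # Try longer specials first so "!=" wins over "!" (maximal munch).
    specials = sorted(SPECIALS, key=len, reverse=True)
    while i < n:
        c = line[i]
        if c in " \t\r\n":                        # white space separates tokens
            i += 1
        elif c.isdigit():                         # unsigned integer
            j = i
            while j < n and line[j].isdigit():
                j += 1
            if j < n and line[j].isalpha():       # "25X": no white space -> error
                tokens.append((34, line[i:]))
                break
            tokens.append((31, line[i:j]))
            i = j
        elif c.isupper():                         # identifier
            j = i
            while j < n and (line[j].isupper() or line[j].isdigit()):
                j += 1
            if j < n and line[j].islower():       # "ABCend" -> illegal token
                tokens.append((34, line[i:]))
                break
            tokens.append((32, line[i:j]))
            i = j
        elif c.islower():                         # candidate reserved word
            j = i
            while j < n and line[j].islower():
                j += 1
            word = line[i:j]
            if word in TOKEN_NUM and not (j < n and line[j].isalnum()):
                tokens.append((TOKEN_NUM[word], word))
                i = j
            else:                                 # misspelled keyword, etc.
                tokens.append((34, line[i:]))
                break
        else:
            for s in specials:                    # longest special symbol first
                if line.startswith(s, i):
                    tokens.append((TOKEN_NUM[s], s))
                    i += len(s)
                    break
            else:
                tokens.append((34, line[i:]))     # unknown character
                break
    return tokens
```

Note how keeping the token string alongside the number satisfies the suggestion above about the data structure: integer values and identifier names are recovered from the stored string later.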
If, when skipToken() is called, the current token is 33, i.e., we are already at the end-of-file, the cursor is not moved, since there are no more lines or tokens to read. Similarly, if the current token is 34, i.e., we have an illegal token, the cursor is not moved, since we do not go past an illegal token.

int intVal(): If getToken() returns 31 to indicate that the current token is an integer, we may call intVal() to get the value of that integer. If we call intVal() when the current token is not an integer, it will print an error message and the program will terminate.

string idName(): If getToken() returns 32 to indicate that the current token is an identifier, we may call idName() to get the name of that identifier. If we call idName() when the current token is not an identifier, it will print an error message and the program will terminate. Both intVal() and idName() will make use of the token strings saved in the Scanner's data structure.

Using the DFA: Where does the DFA for Core tokens enter the picture? Standard tokenizers use a table-driven approach in which the DFA (including all the transitions, indications of which states are accepting final states and what tokens they correspond to, etc.) is represented in a table. Greedy tokenizing is also represented in the table. The scanner itself is written as a driver program that uses the table to decide what to do when it is in any particular state, for any given character that might be next on the input stream. But we didn't discuss this in class, so you do not need to follow this approach. The book does contain a brief description of the approach in Section 2.2.3, and it also includes pseudo-code corresponding to the approach. I would recommend reading that before deciding whether to follow this approach in your implementation or do it some other way.
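Collecting the method specifications above, one possible Python shell for the Scanner looks like this. The field names, the (number, string) token representation, and the error message wording are my own choices, and the tokenizing itself is left as a stub; this is a sketch of the interface, not a definitive implementation.

```python
import sys

# Shell of the Scanner implied by the method specifications above.
# _tokenize_line() is the (greedy) line tokenizer and is left unimplemented.

class CoreScanner:
    def __init__(self, filename):
        self.reader = open(filename)
        self.tokens = []          # (token number, token string) pairs, one line
        self.cursor = 0
        self._tokenize_line()     # tokenize the first non-blank line

    def _tokenize_line(self):
        # Read the next non-blank line, fill self.tokens, reset self.cursor.
        # Append (33, "") at end-of-file, (34, ...) on an illegal token.
        raise NotImplementedError

    def get_token(self):
        # Peek: return the current token number without moving the cursor.
        return self.tokens[self.cursor][0]

    def skip_token(self):
        # Move to the next token; never move past 33 (EOF) or 34 (error).
        if self.get_token() in (33, 34):
            return
        self.cursor += 1
        if self.cursor >= len(self.tokens):
            self._tokenize_line()

    def int_val(self):
        # Only legal when the current token is an integer (31).
        if self.get_token() != 31:
            print("Error: current token is not an integer")
            sys.exit(1)
        return int(self.tokens[self.cursor][1])

    def id_name(self):
        # Only legal when the current token is an identifier (32).
        if self.get_token() != 32:
            print("Error: current token is not an identifier")
            sys.exit(1)
        return self.tokens[self.cursor][1]
```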
If you decide to follow this approach, you can simply convert the book's pseudo-code into Python or Java, depending on which one you are using; you would also have to come up with the table corresponding to your DFA for Core.

Your main() function should receive the name of the input file as a command line argument. It should then call the constructor of Scanner, passing that name as an argument. main() should then repeatedly call getToken(), print the returned token number, and call skipToken() until, after some number of iterations, getToken() returns 33 to indicate end-of-file or 34 to indicate an invalid token. In either case, it should print an appropriate message and terminate. main() should print each token number on a separate line. Note that main() never calls intVal() or idName(); these methods will be used in the parser.

Suppose the input file contains the following Core program:

program int X;
begin
  X=25;
  write X;
end

Your main() function should produce the sequence of numbers 1 4 32 12 2 32 ... 33, corresponding to the tokens "program", "int", "X", ";", "begin", "X", ..., EOF, but with each number being on a separate line.

Note that, although the example above is a legal Core program, the Scanner should not worry about whether the input stream is a legal Core program or not. That will be the job of the parser. All that your Scanner should care about is that each token in the input stream is a legal token. Of course, if the Scanner comes across an illegal token in the input stream, as described above, it should print an appropriate error message and stop. It should not try to continue beyond that point.

Additional Notes:

1. The tokens must be read line-by-line and printed. You should not read the entire stream of tokens in one fell swoop before starting to print them.
If there is an error in reading a token, such as a misspelled keyword, all the tokens prior to that token must be printed out before the error message for the bad token is printed. Then the Tokenizer must terminate.

2. You may use either Java or Python. Do NOT use any other language. Do NOT use the Tokenizer libraries of those languages or others that you may find online. Do NOT use any regular expression package that may be available either as part of the standard libraries or that you might come across online.

3. Submit a .zip file on Carmen that includes the following:
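Going back to the main() function described earlier, its print loop can be sketched as follows. The helper name run and the error message wording are my own; the loop body itself follows the handout: one token number per line, stopping at 33 (end-of-file) or 34 (illegal token), and never calling intVal() or idName().

```python
# Sketch of the main() driver loop: print one token number per line until
# getToken() returns 33 (EOF) or 34 (illegal token). 'scanner' is any
# object providing the Scanner interface described in the handout.

def run(scanner):
    while True:
        tok = scanner.get_token()
        print(tok)                       # each token number on its own line
        if tok == 33:                    # end-of-file: normal termination
            break
        if tok == 34:                    # illegal token: message, then stop
            print("Error: illegal token in input")
            break
        scanner.skip_token()             # consume the token, move on

# In main(), the input file name arrives as a command line argument, e.g.:
#   run(Scanner(sys.argv[1]))
```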