
Question

Startcode:

# introducing some constants; can also use the enum type if Python 3.4
# is available
INT, FLOAT, ID, SEMICOLON, ASSIGNMENTOP, EOI, INVALID = 1, 2, 3, 4, 5, 6, 7

def typeToString(tp):
    if tp == INT: return "Int"
    elif tp == FLOAT: return "Float"
    elif tp == ID: return "ID"
    elif tp == SEMICOLON: return "Semicolon"
    elif tp == ASSIGNMENTOP: return "AssignmentOp"
    elif tp == EOI: return "EOI"
    return "Invalid"

class Token:
    "A class for representing Tokens"

    # a Token object has two fields: the token's type and its value
    def __init__(self, tokenType, tokenVal):
        self.type = tokenType
        self.val = tokenVal

    def getTokenType(self):
        return self.type

    def getTokenValue(self):
        return self.val

    # the printable form of a token is a rendering of its lexeme
    def __repr__(self):
        if self.type in [INT, FLOAT, ID]:
            return self.val
        elif self.type == SEMICOLON:
            return ";"
        elif self.type == ASSIGNMENTOP:
            return ":="
        elif self.type == EOI:
            return ""
        else:
            return "invalid"

LETTERS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS = "0123456789"
class Lexer:
    # stmt is the statement being lexed;
    # index is the index of the next char in the statement
    def __init__(self, s):
        self.stmt = s
        self.index = 0
        self.nextChar()

    def nextToken(self):
        while True:
            if self.ch.isalpha():                 # identifier: a letter, then letters/digits
                ident = self.consumeChars(LETTERS + DIGITS)
                return Token(ID, ident)
            elif self.ch.isdigit():               # int or float literal
                num = self.consumeChars(DIGITS)
                if self.ch != ".":
                    return Token(INT, num)
                num += self.ch
                self.nextChar()
                if self.ch.isdigit():             # a float needs a digit after the dot
                    num += self.consumeChars(DIGITS)
                    return Token(FLOAT, num)
                else:
                    return Token(INVALID, num)
            elif self.ch == ' ':                  # skip whitespace
                self.nextChar()
            elif self.ch == ';':
                self.nextChar()
                return Token(SEMICOLON, "")
            elif self.ch == ':':
                if self.checkChar("="):
                    return Token(ASSIGNMENTOP, "")
                else:
                    return Token(INVALID, "")
            elif self.ch == '$':                  # end-of-input marker
                return Token(EOI, "")
            else:
                tok = Token(INVALID, self.ch)     # record the offending char, then advance
                self.nextChar()
                return tok

    def nextChar(self):
        self.ch = self.stmt[self.index]
        self.index = self.index + 1

    def consumeChars(self, charSet):
        r = self.ch
        self.nextChar()
        while self.ch in charSet:
            r = r + self.ch
            self.nextChar()
        return r

    def checkChar(self, c):
        self.nextChar()
        if self.ch == c:
            self.nextChar()
            return True
        else:
            return False
import sys

class Parser:
    def __init__(self, s):
        self.lexer = Lexer(s + "$")   # append the end-of-input marker
        self.token = self.lexer.nextToken()

    def run(self):
        self.statement()

    def statement(self):
        print("<Statement>")
        self.assignmentStmt()
        while self.token.getTokenType() == SEMICOLON:
            print("\t;")
            self.token = self.lexer.nextToken()
            self.assignmentStmt()
        self.match(EOI)
        print("</Statement>")

    def assignmentStmt(self):
        print("\t<AssignmentStmt>")
        val = self.match(ID)
        print("\t\t<ID>" + val + "</ID>")
        self.match(ASSIGNMENTOP)
        print("\t\t:=")
        self.expression()
        print("\t</AssignmentStmt>")

    def expression(self):
        if self.token.getTokenType() == ID:
            print("\t\t<ID>" + self.token.getTokenValue() + "</ID>")
        elif self.token.getTokenType() == INT:
            print("\t\t<Int>" + self.token.getTokenValue() + "</Int>")
        elif self.token.getTokenType() == FLOAT:
            print("\t\t<Float>" + self.token.getTokenValue() + "</Float>")
        else:
            print("Syntax error: expecting an ID, an int, or a float"
                  + "; saw: " + typeToString(self.token.getTokenType()))
            sys.exit(1)
        self.token = self.lexer.nextToken()

    def match(self, tp):
        val = self.token.getTokenValue()
        if self.token.getTokenType() == tp:
            self.token = self.lexer.nextToken()
        else:
            self.error(tp)
        return val

    def error(self, tp):
        print("Syntax error: expecting: " + typeToString(tp)
              + "; saw: " + typeToString(self.token.getTokenType()))
        sys.exit(1)
print("Testing the lexer: test 1")
lex = Lexer("x := 1 $")
tk = lex.nextToken()
while tk.getTokenType() != EOI:
    print(tk)
    tk = lex.nextToken()
print()

print("Testing the lexer: test 2")
lex = Lexer("x := 1; y := 2.3; z := x $")
tk = lex.nextToken()
while tk.getTokenType() != EOI:
    print(tk)
    tk = lex.nextToken()
print()

print("Testing the lexer: test 3")
lex = Lexer("x := 1; y : 2; z := x $")
tk = lex.nextToken()
while tk.getTokenType() != EOI:
    print(tk)
    tk = lex.nextToken()
print()

print("Testing the parser: test 1")
parser = Parser("x := 1")
parser.run()

print("Testing the parser: test 2")
parser = Parser("x := 1; y := 2.3; z := x")
parser.run()

print("Testing the parser: test 3")
parser = Parser("x := 1; y ; 2; z := x")   # ill-formed on purpose: exercises the error path
parser.run()
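The digit branch of nextToken() above commits to an Int token unless the character after the digits is a dot followed by at least one more digit. That decision can be isolated in a standalone sketch (the function name scan_number and its tuple return shape are illustrative, not part of the startcode):

```python
DIGITS = "0123456789"

def scan_number(s, i):
    """Scan a number starting at s[i]; return (category, lexeme, next_index).

    Mirrors the digit branch of the startcode's nextToken():
    one or more digits, then optionally '.' followed by at least one digit.
    """
    j = i
    while j < len(s) and s[j] in DIGITS:
        j += 1
    if j < len(s) and s[j] == ".":
        k = j + 1
        if k < len(s) and s[k] in DIGITS:
            while k < len(s) and s[k] in DIGITS:
                k += 1
            return ("Float", s[i:k], k)
        # a dot with no digit after it makes the lexeme invalid
        return ("Invalid", s[i:j + 1], j + 1)
    return ("Int", s[i:j], j)
```

For example, scan_number("2.3;", 0) returns ("Float", "2.3", 3), while scan_number("2.x", 0) returns ("Invalid", "2.", 2), matching the startcode's treatment of a trailing dot.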
Your assignment is to use Python to write a recursive descent parser for a restricted form of SQL, the popular database query language. The lexical syntax is specified by regular expressions:

Category   Definition
int        [0-9]+
float      [0-9]+ . [0-9]+
id         [a-zA-Z] ([a-zA-Z] | [0-9])*
keyword    SELECT | FROM | WHERE | AND
operator   =
comma      ,

The concrete syntax is specified by the following E-BNF grammar, where <Query> is the start symbol:

<Query>    -> SELECT <IDList> FROM <IDList> [WHERE <CondList>]
<IDList>   -> <id> {, <id>}
<CondList> -> <Cond> {AND <Cond>}
<Cond>     -> <id> = <Term>
<Term>     -> <id> | <int> | <float>

Here <id>, <int>, and <float> are token categories defined by the regular expressions above; the terminal symbols SELECT, FROM, WHERE, and AND are of category keyword, while "," is a terminal symbol with category comma. Note that arbitrary whitespace can appear between tokens, and no space is needed between an operator and its operands, or between the items in a list and the commas that separate them.

This project is broken down into the following series of tasks.

1. (3 points) Write a class Token that can store the value and category of any token.

2. (7 points) Develop a lexical analyzer. You should start by drawing on paper a finite state automaton that can recognize the types of tokens, and then convert the automaton into code. Name this class Lexer and ensure that it has a method nextToken() which returns a Token. Lexer should have a constructor with the signature __init__(self, s), where s is the string representing the query to be parsed. As mentioned before, the code for recognizing an identifier should check whether the terminal string is a keyword, and if so, the category of the token should be set to keyword instead of ID. If nextToken() is called when no input is left, it should return a token with category EOI (EndOfInput). If the next terminal string is not considered a token by the lexical syntax, return a token with category Invalid.

3. (10 points) Write a syntactic analyzer which parses the tokens given by Lexer using the recursive descent technique. Name this class Parser. Like Lexer, Parser should have a public constructor with the signature __init__(self, s). This input string should be used to create a Lexer object. There should also be a method run(self) that will start the parse. As the program parses the input, Parser should print out each nonterminal that is matched, surrounded by angle brackets, e.g. <Query>. Whenever it applies a new rule, it should indent by a tab. When it has parsed all the tokens that constitute a nonterminal, it should print out the nonterminal's name, preceded by a slash and surrounded by angle brackets, e.g. </Query>. When a terminal symbol (token) is matched, it should output the token category surrounded by angle brackets, the token's string, and the token category preceded by a slash and surrounded by angle brackets (e.g. <Int>42</Int>). Whenever the parser comes across a token that does not fit the grammar, it should output a message of the form "Syntax error: expecting expected-token-category; saw: token" and immediately exit. Note that the input string is expected to contain exactly one sentence of the form <Query>.

In order to test your file, you will have to create a test that creates a Parser object and then calls the run() method on the object. For example, the code:

parser = Parser("SELECT C1,C2 FROM T1 WHERE C1=5.23")
parser.run()

should have the output given in Figure 1. Although it is not important for completing this assignment, you may be interested to know that this output format is well-formed XML.

SELECT C1 , C2 FROM T1 WHERE C1 = 5.23

Figure 1: Sample output.
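The assignment calls for a hand-coded finite state automaton, so the following is only a cross-check on the lexical syntax, not the required implementation: each category can be expressed as one regular-expression alternative, with identifier-shaped words demoted to keywords by a table lookup. The names KEYWORDS, TOKEN_RE, and tokenize are my own, not mandated by the assignment:

```python
import re

# Keyword table taken from the assignment's lexical syntax.
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND"}

# One named alternative per category; float precedes int so that
# maximal munch picks the longer match on input like "5.23".
TOKEN_RE = re.compile(r"""
    (?P<float>[0-9]+\.[0-9]+)
  | (?P<int>[0-9]+)
  | (?P<word>[a-zA-Z][a-zA-Z0-9]*)
  | (?P<operator>=)
  | (?P<comma>,)
  | (?P<ws>\s+)
""", re.VERBOSE)

def tokenize(s):
    """Yield (category, lexeme) pairs; 'word' splits into keyword/id."""
    i = 0
    while i < len(s):
        m = TOKEN_RE.match(s, i)
        if m is None:                 # nothing matches: an Invalid token
            yield ("Invalid", s[i])
            i += 1
            continue
        i = m.end()
        kind = m.lastgroup
        if kind == "ws":
            continue                  # whitespace only separates tokens
        lexeme = m.group()
        if kind == "word":
            kind = "keyword" if lexeme in KEYWORDS else "id"
        yield (kind, lexeme)
    yield ("EOI", "")                 # EndOfInput once the string is exhausted
```

On the sample query, list(tokenize("SELECT C1,C2 FROM T1 WHERE C1=5.23")) starts with ("keyword", "SELECT"), ("id", "C1"), ("comma", ",") and ends with ("float", "5.23"), ("EOI", ""), which is the token stream your automaton should reproduce.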

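In a recursive descent parser, each E-BNF repetition {...} becomes a while loop on the lookahead token, just as in the startcode's statement() method. A minimal sketch of the comma-separated identifier-list rule, using <IDList> as the nonterminal's name (the token-tuple representation and the parse_id_list/expect helper names are illustrative, not required by the assignment):

```python
import sys

def expect(tokens, pos, category):
    """Match one terminal, printing <category>lexeme</category>."""
    cat, lexeme = tokens[pos]
    if cat != category:
        print("Syntax error: expecting " + category + "; saw: " + cat)
        sys.exit(1)
    print("<" + category + ">" + lexeme + "</" + category + ">")
    return pos + 1

def parse_id_list(tokens, pos):
    """Parse <IDList> -> <id> {, <id>} over a list of (category, lexeme)
    tuples, starting at index pos; return the index past the list."""
    print("<IDList>")
    pos = expect(tokens, pos, "id")
    # the E-BNF repetition {, <id>}: loop while the lookahead is a comma
    while tokens[pos][0] == "comma":
        pos = expect(tokens, pos, "comma")
        pos = expect(tokens, pos, "id")
    print("</IDList>")
    return pos
```

For example, on the tokens for "C1,C2 FROM ...", parse_id_list consumes the first three tokens and stops at FROM, leaving the keyword for the caller's <Query> rule to match.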