Question
Purpose
Warm up with Java programming. Get familiar with regular expressions. Understand the wide application of regular expressions.
Assignment specification
Your job is to count the number of identifiers in programs written in our Tiny language.
The Tiny Language Definition
Here is the definition for the Tiny language.
The lexicon of the Tiny language is defined as follows:
- Keywords: WRITE READ IF ELSE RETURN BEGIN END MAIN STRING INT REAL
- Single-character separators: ; , ( )
- Single-character operators: + - * /
- Multi-character operators: := == !=
- Identifier: An identifier consists of a letter followed by any number of letters or digits. The following are examples of identifiers: x, x2, xx2, x2x, End, END2. Note that End is an identifier while END is a keyword. The following are not identifiers:
- IF, WRITE, READ, .... (keywords are not counted as identifiers)
- 2x (an identifier cannot start with a digit)
- Strings in comments are not identifiers.
- Number is a sequence of digits, or a sequence of digits followed by a dot, and followed by digits.
Number -> Digits | Digits '.' Digits
Digits -> Digit | Digit Digits
Digit -> '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
- Comments: any text between /** and **/. Comments can span more than one line.
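The identifier rule in the lexicon above can be sketched as a small check. This is only an illustration, not part of the assignment: the class name TinyLexicon and the method isIdentifier are hypothetical, and it assumes that "letter" means an ASCII letter (a-z, A-Z) and that keywords are case-sensitive, as the End/END example suggests.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; the class and method names are illustrative only.
class TinyLexicon {
    // Keywords copied from the lexicon above (case-sensitive).
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
        "WRITE", "READ", "IF", "ELSE", "RETURN", "BEGIN", "END",
        "MAIN", "STRING", "INT", "REAL"));

    // Assumption: "letter" means an ASCII letter.
    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // True if s is a letter followed by letters or digits and is not a keyword.
    static boolean isIdentifier(String s) {
        if (s.isEmpty() || !isLetter(s.charAt(0))) return false;
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!isLetter(c) && !isDigit(c)) return false;
        }
        return !KEYWORDS.contains(s);
    }
}
```

Under these assumptions, End and END2 are accepted, while END, IF and 2x are rejected.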
The EBNF Grammar
High-level program structures
Program -> MethodDecl MethodDecl*
Type -> INT | REAL | STRING
MethodDecl -> Type [MAIN] Id '(' FormalParams ')' Block
FormalParams -> [FormalParam ( ',' FormalParam )* ]
FormalParam -> Type Id
Statements
Block -> BEGIN Statement+ END
Statement -> Block | LocalVarDecl | AssignStmt | ReturnStmt | IfStmt | WriteStmt | ReadStmt
LocalVarDecl -> Type Id ';' | Type AssignStmt
AssignStmt -> Id := Expression ';' | Id := QString ';'
ReturnStmt -> RETURN Expression ';'
IfStmt -> IF '(' BoolExpression ')' Statement | IF '(' BoolExpression ')' Statement ELSE Statement
WriteStmt -> WRITE '(' Expression ',' QString ')' ';'
ReadStmt -> READ '(' Id ',' QString ')' ';'
QString is any sequence of characters except the double quote itself, enclosed in double quotes.
Expressions
Expression -> MultiplicativeExpr (( '+' | '-' ) MultiplicativeExpr)*
MultiplicativeExpr -> PrimaryExpr (( '*' | '/' ) PrimaryExpr)*
PrimaryExpr -> Num // Integer or Real numbers
             | Id
             | '(' Expression ')'
             | Id '(' ActualParams ')'
BoolExpression -> Expression '==' Expression | Expression '!=' Expression
ActualParams -> [Expression ( ',' Expression)*]
Sample program
/** this is a comment line in the sample program **/
INT f2(INT x, INT y )
BEGIN
   INT z;
   z := x*x - y*y;
   RETURN z;
END
INT MAIN f1()
BEGIN
   INT x;
   READ(x, "A41.input");
   INT y;
   READ(y, "A42.input");
   INT z;
   z := f2(x,y) + f2(y,x);
   WRITE (z, "A4.output");
END
You should pick out the identifiers from a text file and write the output to a text file named A1.output. The output file should contain a line like "identifiers:5". The input will have multiple lines. Here are the sample input and output files.
INPUT FILE:
INT f2(INT x, INT y )
BEGIN
   z := x*x - y*y;
   RETURN z;
END
INT MAIN F1()
BEGIN
   INT x;
   READ(x, "A41.input");
   INT y;
   READ(y, "A42.input");
   INT z;
   z := f2(x,y) + f2(y,x);
   WRITE (z, "A4.output");
END
OUTPUT FILE:
identifiers:5
Please note that in this sample program the following are not counted as identifiers:
- A41, input, output: they are quoted, hence they are not treated as identifiers;
- INT, READ, etc.: they are keywords in our Tiny language, hence they should not be picked up.
Here are the test cases for the assignment:
case 1
INT f2(INT x, INT y )
BEGIN
   z := x*x - y*y;
   RETURN z;
END
INT MAIN F1()
BEGIN
   INT x;
   READ(x, "A41.input");
   INT y;
   READ(y, "A42.input");
   INT z;
   z := f2(x,y) + f2(y,x);
   WRITE (z, "A4.output");
END
case 2
BEGIN END INT MAIN AAA, X23Y, F1i A END
case 3
aaa&bbb!ccc#ddd%eee&ffff
case 4
INT f2(INT x, INT y )
BEGIN
   z := x*x - y*y;
   RETURN z;
END
INT MAIN f1()
BEGIN
   x2222x
   INT x1;
   READ(x, "A41. input");
   INT y;
   READ(y, "A42.input");
   INT z;
   z := f2(x,y) + f2(y,x);
   WRITE (z, "A4.output");
END
case 5
"A41.input" AA BB CCCC2DDD aaabbb if b c d IF
case 6
AA1 BBBB22222 CCC3333DDDDD A- B- C- xx_yy_~!@#$%^&*()_+}{}[]\|\":";'?><,./ zz BEGIN END
The expected identifier counts for cases 1 to 6 are 5, 4, 6, 7, 8 and 9, respectively.
If you cannot pass test case 6 and you use [^a-zA-Z] as the delimiter in Scanner, you can solve the problem by specifying the encoding as UTF-8, i.e., use
new Scanner(yourFile, "UTF-8")...
In this assignment you can suppose that there are no comments in the programs.
In the output file you should only write "identifiers:" followed by the number of identifiers. If there are multiple occurrences of an identifier in the input, you should only count it once. Don't write anything else into the output file.
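The counting rule above (quoted text skipped, keywords excluded, each distinct identifier counted once) can be sketched with a plain character loop and a set. This is only one possible reading of the specification, not the sample solution. The class name CharScanSketch and its method are hypothetical, and the sketch assumes ASCII letters, that a run starting with a digit (such as 2x) contributes no identifier, and that an unterminated quote swallows the rest of the input.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; uses no regular expressions.
class CharScanSketch {
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
        "WRITE", "READ", "IF", "ELSE", "RETURN", "BEGIN", "END",
        "MAIN", "STRING", "INT", "REAL"));

    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Returns the number of distinct identifiers in text.
    static int countIdentifiers(String text) {
        Set<String> seen = new HashSet<>(); // a set keeps each identifier once
        int i = 0, n = text.length();
        while (i < n) {
            char c = text.charAt(i);
            if (c == '"') {
                // Skip a quoted string, including its closing quote if present.
                i++;
                while (i < n && text.charAt(i) != '"') i++;
                i++;
            } else if (isLetter(c) || isDigit(c)) {
                // Consume a maximal run of letters and digits.
                int start = i;
                while (i < n && (isLetter(text.charAt(i)) || isDigit(text.charAt(i)))) i++;
                String word = text.substring(start, i);
                // Assumption: a run that starts with a digit is not an identifier.
                if (isLetter(c) && !KEYWORDS.contains(word)) seen.add(word);
            } else {
                i++; // separators, operators, whitespace, other symbols
            }
        }
        return seen.size();
    }
}
```

For example, countIdentifiers("z := x*x - y*y;") counts x, y and z once each, even though x and y appear twice.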
You will write two different programs to do this:
- Program A11.java must not use regular expressions: neither the regex package nor the regex-based methods of the String class or other classes. Your program can look at characters one by one and use a loop to check whether they form quoted strings, identifiers, etc.
- Program A12.java will use java.util.regex. One useful starting point is this tutorial on Java regex: https://www.regular-expressions.info/java.html
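For the regex-based variant, one possible approach is a single pattern whose alternatives consume quoted strings and digit-led runs before trying the identifier shape. The sketch below is an assumption about how this could be done, not the sample solution. The class name RegexSketch and the pattern are illustrative, letters are assumed to be ASCII, and an unterminated quote is simply skipped as a stray character.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch built on java.util.regex.
class RegexSketch {
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
        "WRITE", "READ", "IF", "ELSE", "RETURN", "BEGIN", "END",
        "MAIN", "STRING", "INT", "REAL"));

    // Alternatives are tried left to right at each position:
    //   1. a quoted string (consumed, never captured)
    //   2. a run starting with a digit, e.g. 41 or 2x (consumed, never captured)
    //   3. an identifier-shaped word, captured in group 1
    static final Pattern TOKEN = Pattern.compile(
        "\"[^\"]*\"|[0-9][A-Za-z0-9]*|([A-Za-z][A-Za-z0-9]*)");

    // Returns the number of distinct identifiers in text.
    static int countIdentifiers(String text) {
        Set<String> seen = new HashSet<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String word = m.group(1); // null unless alternative 3 matched
            if (word != null && !KEYWORDS.contains(word)) seen.add(word);
        }
        return seen.size();
    }
}
```

Putting the quoted-string alternative first is what keeps A41, input and output out of the count, since the matcher consumes the whole quoted span before it can see the words inside it.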
Your programs should be able to run by typing:
%javac A11.java
%java A11 A1.tiny
%javac A12.java
%java A12 A1.tiny
In this assignment, the output should be in a file called "A1.output". You should not use keyboard input. The input file name will be provided as the argument of the program, while the output file name is hard-coded in your programs. That is, your code for input and output can look like the following:
... new BufferedReader(new FileReader(args[0]));
... new BufferedWriter(new FileWriter("A1.output"));
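The input and output handling described above might be wired together as follows. This is a minimal sketch under stated assumptions. The class name IoSketch and the helpers readAll and writeCount are hypothetical, countIdentifiers is a placeholder that always returns 0, and the output line is written without a trailing newline because the specification does not say whether one is expected.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical I/O skeleton; only the file handling is shown.
class IoSketch {
    // Reads the whole input file into one string, one '\n' per line read.
    static String readAll(String path) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = in.readLine()) != null) sb.append(line).append('\n');
        }
        return sb.toString();
    }

    // Writes the single required line to the hard-coded output file.
    static void writeCount(int count) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new FileWriter("A1.output"))) {
            out.write("identifiers:" + count);
        }
    }

    // Placeholder: a real program would scan text for identifiers here.
    static int countIdentifiers(String text) {
        return 0;
    }

    public static void main(String[] args) throws IOException {
        String text = readAll(args[0]); // input file name comes from the command line
        writeCount(countIdentifiers(text));
    }
}
```

The try-with-resources blocks close both files even if an exception is thrown, which matters here because an unflushed BufferedWriter would leave A1.output empty.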
Your program should be tested on luna or bravo.
Please don't write unnecessarily long programs. The sample solutions for A11 and A12 total approximately 300 words, as counted by the PHP function str_word_count(). They were not deliberately written to be short and could easily be compacted further. Hence one mark is given if your word count is smaller than 300.
Marking Scheme
If your code is shorter than or equal to 72 words according to our website (or 32 according to wc, the word count command in Unix), you will receive one bonus mark. That is, the total mark could be 5+1=6.
yourMark=0;
if (A11.java, A12.java are not sent properly) return;
for (each of A11, A12)
    if (it is compiled correctly) yourMark+=0.2;
for (each of A11, A12){
    if (your java program reads A1.tiny && generates result file A1.output)
        for (each of the 6 test cases)
            if (it is correct) yourMark+=0.3;
    if (yourCode.length() < average(length of A11 in the class)) yourMark+=0.5;
}
for (each day of your late submission) yourMark=yourMark*0.8;
One bonus mark for the shortest code among the class.