You have to design a syntactic analyzer for the language specified by the grammar below. We use the following conventions: terminals (lexical elements) are represented in bold Courier font, like this; non-terminals are represented in angle brackets, <likeThis>. The character ε (epsilon) represents the empty string. A postfix * denotes zero or more repetitions of the preceding element. The non-terminal <prog> is the starting symbol of the grammar.
Grammar
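<prog> ::= <classDecl>*<progBody>
<classDecl> ::= class id {<varDecl>*<funcDef>*};
<progBody> ::= program<funcBody>;<funcDef>*
<funcHead> ::= <type>id(<fParams>)
<funcDef> ::= <funcHead><funcBody>;
<funcBody> ::= {<varDecl>*<statement>*}
<varDecl> ::= <type>id<arraySize>*;
<statement> ::= <variable><assignOp><expr>;
| if(<expr>)then<statBlock>else<statBlock>;
| while(<expr>)do<statBlock>;
| read(<variable>);
| write(<expr>);
| return(<expr>);
<statBlock> ::= {<statement>*} | <statement> | ε
<expr> ::= <arithExpr> | <arithExpr><relOp><arithExpr>
<arithExpr> ::= <arithExpr><addOp><term> | <term>
<sign> ::= + | -
<term> ::= <term><multOp><factor> | <factor>
<factor> ::= <variable>
| <idnest>*id(<aParams>)
| num
| (<arithExpr>)
| not<factor>
| <sign><factor>
<variable> ::= <idnest>*id<indice>*
<idnest> ::= id<indice>*.
<indice> ::= [<arithExpr>]
<arraySize> ::= [ int ]
<type> ::= integer | real | id
<fParams> ::= <type>id<arraySize>*<fParamsTail>* | ε
<aParams> ::= <expr><aParamsTail>* | ε
<fParamsTail> ::= ,<type>id<arraySize>*
<aParamsTail> ::= ,<expr>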
Operators and additional lexical conventions
<assignOp> ::= =
<relOp> ::= == | <> | < | > | <= | >=
<addOp> ::= + | - | or
<multOp> ::= * | / | and
id ::= follows the specification for identifiers given in assignment #1
num ::= follows the specification for numbers given in assignment #1
int ::= <nonzero><digit>*
<nonzero> ::= 1..9
<digit> ::= <nonzero> | 0
For example, the non-terminal <addOp> is a generalization of the addition operator tokens +, - and or. Using this notation here does not necessarily mean that you have to define a new type of token in your lexical analyzer. Also, id and num are tokens that follow the lexical conventions given in the first assignment. Note that a new lexical convention has been added for the token int.
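As a quick check of the new int convention, the Python sketch below reads the definition above as one digit from 1..9 followed by any number of digits from 0..9. The helper name is_int_lexeme is hypothetical, and a hand-written lexer would normally test characters one at a time rather than use a regular expression. Under this reading, 0 and lexemes with leading zeros such as 007 are not int lexemes.

```python
import re

# Assumed reading of the int convention: <nonzero><digit>*, i.e. one digit 1..9
# followed by zero or more digits 0..9. The helper name is hypothetical.
INT_PATTERN = re.compile(r"[1-9][0-9]*")


def is_int_lexeme(lexeme: str) -> bool:
    """Return True if the entire lexeme matches the int convention."""
    return INT_PATTERN.fullmatch(lexeme) is not None


for text in ["7", "42", "100", "0", "007", "3.5", ""]:
    print(f"{text!r}: {is_int_lexeme(text)}")
```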
Work to be done
• Analyze the syntactic definition given above (together with the additional lexical definition of the token int). Remove all the * notations and replace them with equivalent list-generating productions. In your documentation, list all the ambiguities, left recursions, and any other errors you find in the grammar. Modify the productions so that the left recursions and ambiguities are removed without changing the language. The result should be a set of productions that can be parsed with the top-down predictive parsing method, i.e., an LL(1) grammar. Include the transformed grammar in your documentation.
• Derive the FIRST and FOLLOW sets for each non-terminal in your transformed grammar and list them in your documentation.
• Implement a predictive parser (recursive descent or table-driven) for your modified set of grammar rules.
• Your parser should optionally write to a file the derivation that produces the source program from the starting symbol.
• The parser should call the lexical analyzer you developed in assignment 1 whenever it needs a new token.
• The parser should properly identify the errors in the input program and print a meaningful message to the user for each error encountered. It should implement an error-recovery method that allows it to report all errors rather than stopping at the first one. Each error message should describe the nature of the error and its location in the input file.
• In this assignment, you only check the syntactic correctness of the program, i.e., check whether the source program can be parsed according to the grammar. Do not check the semantic correctness of the program in this assignment.
• Write a set of source files that test the parser on all the syntactic structures of the language. Include cases with a variety of different errors to demonstrate the accuracy of your error reporting and recovery.
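The two rewrites requested in the first task are mechanical enough to sketch in code. The Python below is a minimal sketch that assumes a hypothetical dict-of-lists grammar representation, a hypothetical naming scheme (X_list for the list generated by X*, A' for the tail of a left-recursive A), and an illustrative grammar over E, T, F and index rather than the handout's non-terminals. Each X* becomes a right-recursive non-terminal X_list ::= X X_list | ε. Each immediately left-recursive A ::= A α | β becomes A ::= β A' with A' ::= α A' | ε, which generates the same strings.

```python
from typing import Dict, List

# Hypothetical representation: each non-terminal maps to its alternatives,
# each alternative is a list of symbols, and [] stands for ε.
Grammar = Dict[str, List[List[str]]]


def remove_stars(grammar: Grammar) -> Grammar:
    """Replace each starred symbol X* with a fresh non-terminal X_list,
    defined by the right-recursive rule X_list ::= X X_list | ε."""
    result: Grammar = {}
    for head, alternatives in grammar.items():
        new_alts = []
        for alt in alternatives:
            new_alt = []
            for sym in alt:
                if len(sym) > 1 and sym.endswith("*"):  # a lone "*" is a terminal
                    base = sym[:-1]
                    list_nt = base + "_list"
                    result.setdefault(list_nt, [[base, list_nt], []])
                    new_alt.append(list_nt)
                else:
                    new_alt.append(sym)
            new_alts.append(new_alt)
        result[head] = new_alts
    return result


def remove_immediate_left_recursion(grammar: Grammar) -> Grammar:
    """Rewrite A ::= A a1 | ... | A am | b1 | ... | bn (no bi starting with A)
    as A ::= b1 A' | ... | bn A' and A' ::= a1 A' | ... | am A' | ε."""
    result: Grammar = {}
    for head, alternatives in grammar.items():
        recursive = [alt[1:] for alt in alternatives if alt and alt[0] == head]
        others = [alt for alt in alternatives if not alt or alt[0] != head]
        if not recursive:
            result[head] = alternatives
            continue
        tail = head + "'"
        result[head] = [beta + [tail] for beta in others]
        result[tail] = [alpha + [tail] for alpha in recursive] + [[]]
    return result


def show(grammar: Grammar) -> None:
    for head, alts in grammar.items():
        rhs = " | ".join(" ".join(alt) if alt else "ε" for alt in alts)
        print(f"{head} ::= {rhs}")


# Illustrative grammar (not the handout's non-terminals).
example: Grammar = {
    "E": [["E", "addop", "T"], ["T"]],
    "T": [["T", "multop", "F"], ["F"]],
    "F": [["id", "index*"], ["(", "E", ")"]],
    "index": [["[", "E", "]"]],
}

show(remove_immediate_left_recursion(remove_stars(example)))
```

On the example, the sketch produces E ::= T E' and E' ::= addop T E' | ε. It also replaces index* with index_list ::= index index_list | ε. It handles only immediate left recursion. Indirect left recursion, common prefixes that need left factoring, and ambiguities still have to be found by analyzing the grammar.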
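FIRST and FOLLOW sets are usually computed by iterating to a fixed point: keep adding symbols to every set until a full pass over the productions changes nothing. The Python sketch below makes several assumptions. The grammar is a hypothetical dict from each non-terminal to its alternatives, with [] for ε. The string "ε" marks nullability in FIRST sets, and "$" is the end-of-input marker. The sketch runs on a small illustrative LL(1) expression grammar, not on the handout's grammar.

```python
from typing import Dict, List, Set

# Hypothetical representation: non-terminal -> alternatives; [] stands for ε.
Grammar = Dict[str, List[List[str]]]
EPS = "ε"
END = "$"  # end-of-input marker


def first_of_sequence(seq: List[str], first: Dict[str, Set[str]], grammar: Grammar) -> Set[str]:
    """FIRST of a symbol sequence; contains EPS only if every symbol can derive ε."""
    result: Set[str] = set()
    for sym in seq:
        sym_first = first[sym] if sym in grammar else {sym}  # a terminal is its own FIRST
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_first(grammar: Grammar) -> Dict[str, Set[str]]:
    first: Dict[str, Set[str]] = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until no FIRST set grows
        changed = False
        for head, alts in grammar.items():
            for alt in alts:
                new = first_of_sequence(alt, first, grammar)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(grammar: Grammar, start: str, first: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    follow: Dict[str, Set[str]] = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:  # iterate until no FOLLOW set grows
        changed = False
        for head, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue  # FOLLOW is defined only for non-terminals
                    rest = first_of_sequence(alt[i + 1:], first, grammar)
                    new = rest - {EPS}
                    if EPS in rest:  # sym can end the alternative
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


# Illustrative LL(1) expression grammar (not the handout's grammar).
grammar: Grammar = {
    "E": [["T", "E'"]],
    "E'": [["addop", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["multop", "F", "T'"], []],
    "F": [["id"], ["(", "E", ")"]],
}
first = compute_first(grammar)
follow = compute_follow(grammar, "E", first)
for nt in grammar:
    print(f"FIRST({nt}) = {sorted(first[nt])}, FOLLOW({nt}) = {sorted(follow[nt])}")
```

For this grammar the sketch yields, for instance, FIRST(E) = {id, (} and FOLLOW(F) = {multop, addop, ), $}. A grammar is LL(1) when two conditions hold for each non-terminal. First, the FIRST sets of its alternatives are pairwise disjoint. Second, if one alternative can derive ε, the FIRST sets of the other alternatives are also disjoint from the non-terminal's FOLLOW set.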
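The parser, derivation-output, and error-recovery requirements can be illustrated together on a deliberately tiny grammar. The Python sketch below is a recursive-descent parser for the hypothetical rules stmt ::= id assignop E ;, E ::= F E', E' ::= addop F E' | ε and F ::= id | num | ( E ), not for the handout's grammar. A token iterator stands in for the assignment-1 lexer, and the Token fields and kinds are assumptions.

Because recursive descent expands the leftmost non-terminal first, appending each production as it is chosen records a leftmost derivation. Errors are handled with panic-mode recovery, one common technique among several. On an error, the parser records a message with the line number, skips tokens up to the synchronizing token ;, and resumes with the next statement.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Token:
    kind: str  # hypothetical kinds: "id", "num", "addop", "assignop", "(", ")", ";", "eof"
    lexeme: str
    line: int


class ParseError(Exception):
    """Raised when the lookahead token cannot continue the current rule."""


class Parser:
    def __init__(self, tokens: Iterator[Token]):
        self.tokens = tokens  # stands in for the assignment-1 lexer
        self.lookahead = next(tokens)
        self.derivation: List[str] = []  # productions in leftmost-derivation order
        self.errors: List[str] = []

    def advance(self) -> None:
        # Once the stream is exhausted, keep the last token (eof) as lookahead.
        self.lookahead = next(self.tokens, self.lookahead)

    def match(self, kind: str) -> None:
        if self.lookahead.kind != kind:
            raise ParseError(f"line {self.lookahead.line}: expected {kind}, "
                             f"found {self.lookahead.lexeme!r}")
        self.advance()

    def parse_program(self) -> bool:
        while self.lookahead.kind != "eof":
            self.parse_statement()
        return not self.errors

    def parse_statement(self) -> None:
        self.derivation.append("stmt -> id assignop E ;")
        try:
            self.match("id")
            self.match("assignop")
            self.parse_E()
            self.match(";")
        except ParseError as err:
            self.errors.append(str(err))
            # Panic-mode recovery: skip to the synchronizing token ';' and consume it.
            while self.lookahead.kind not in (";", "eof"):
                self.advance()
            if self.lookahead.kind == ";":
                self.advance()

    def parse_E(self) -> None:
        self.derivation.append("E -> F E'")
        self.parse_F()
        self.parse_E_tail()

    def parse_E_tail(self) -> None:
        if self.lookahead.kind == "addop":
            self.derivation.append("E' -> addop F E'")
            self.advance()
            self.parse_F()
            self.parse_E_tail()
        else:  # choose ε; a wrong lookahead is then reported by the caller's match()
            self.derivation.append("E' -> ε")

    def parse_F(self) -> None:
        kind = self.lookahead.kind
        if kind in ("id", "num"):
            self.derivation.append(f"F -> {kind}")
            self.advance()
        elif kind == "(":
            self.derivation.append("F -> ( E )")
            self.advance()
            self.parse_E()
            self.match(")")
        else:
            raise ParseError(f"line {self.lookahead.line}: unexpected "
                             f"{self.lookahead.lexeme!r} in expression")


def demo_tokens() -> Iterator[Token]:
    # Hypothetical token stream for three lines: "x = a + 2 ;", "y = ( c + ;", "z = 1 ;"
    spec = [("id", "x", 1), ("assignop", "=", 1), ("id", "a", 1), ("addop", "+", 1),
            ("num", "2", 1), (";", ";", 1),
            ("id", "y", 2), ("assignop", "=", 2), ("(", "(", 2), ("id", "c", 2),
            ("addop", "+", 2), (";", ";", 2),
            ("id", "z", 3), ("assignop", "=", 3), ("num", "1", 3), (";", ";", 3),
            ("eof", "", 3)]
    return (Token(kind, lexeme, line) for kind, lexeme, line in spec)


parser = Parser(demo_tokens())
parser.parse_program()
print("\n".join(parser.derivation))  # could be written to a derivation file instead
for message in parser.errors:
    print("error:", message)
```

On the demo input, the parser reports line 2: unexpected ';' in expression and still parses the statement on line 3. A full parser would typically use a richer synchronizing set for each non-terminal, often derived from its FOLLOW set.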