You will build a recursive-descent parser for the right-recursive version of the classic expression grammar found in the text. Take the recursive approach, because it frees you from having to write any stack-management functions. In addition, most of the work has already been done for you.
UNIX doesn't provide any tools for top-down parsers because bottom-up parsers, although more complex, are of more general utility. But since top-down parsers are simpler in structure, writing your own parser in C or C++ is not too difficult. (There is a free Java-based top-down parser generator called ANTLR available.)
You need to have a scanner as the front end to your parser. You should use flex to identify your tokens. Each call to the flex-generated scanner function, yylex(), returns one token. We've already covered a lot of what we need in class: we figured out how to make the grammar right-recursive, and we worked out the first sets and when we need them. You should use the most complete form of the grammar, which uses all four operators and parentheses. We won't be using variables, though, just integers.
Remember that a recursive-descent parser has one function for every non-terminal symbol. To match the terminal symbols, a single match function that compares the current token with the expected one is enough. See the Stanford notes below.
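The structure described above can be sketched as a recognizer that only answers yes or no. This is a minimal sketch under stated assumptions: the grammar in the comment is the usual right-recursive form and may differ in naming from the one in the text, the token codes and all function names are hypothetical, and next_token() is a hand-written stand-in so the fragment is self-contained (in the assignment that job belongs to yylex()).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Assumed right-recursive grammar (names may differ from the text):
 *   E  -> T E'
 *   E' -> + T E' | - T E' | epsilon
 *   T  -> F T'
 *   T' -> * F T' | / F T' | epsilon
 *   F  -> ( E ) | INT
 */

enum { TOK_INT = 256, TOK_END = 257, TOK_BAD = 258 }; /* hypothetical codes */

static const char *src;  /* stand-in input; a real solution reads via yylex() */
static int lookahead;    /* the current token */

/* Stand-in scanner: one token per call, as yylex() would provide. */
static int next_token(void) {
    while (*src == ' ' || *src == '\t') src++;
    if (*src == '\0') return TOK_END;
    if (isdigit((unsigned char)*src)) {
        while (isdigit((unsigned char)*src)) src++;
        return TOK_INT;
    }
    if (strchr("+-*/()", *src)) return *src++;  /* operators are their own code */
    src++;
    return TOK_BAD;
}

/* Consume the lookahead if it is the expected terminal; return 0 on mismatch. */
static int match(int expected) {
    if (lookahead != expected) return 0;
    lookahead = next_token();
    return 1;
}

static int expr(void);

static int factor(void) {                 /* F -> ( E ) | INT */
    if (lookahead == '(')
        return match('(') && expr() && match(')');
    return match(TOK_INT);
}

static int term_rest(void) {              /* T' -> * F T' | / F T' | epsilon */
    if (lookahead == '*' || lookahead == '/')
        return match(lookahead) && factor() && term_rest();
    return 1;                             /* epsilon: consume nothing */
}

static int term(void) { return factor() && term_rest(); }   /* T -> F T' */

static int expr_rest(void) {              /* E' -> + T E' | - T E' | epsilon */
    if (lookahead == '+' || lookahead == '-')
        return match(lookahead) && term() && expr_rest();
    return 1;                             /* epsilon: consume nothing */
}

static int expr(void) { return term() && expr_rest(); }     /* E -> T E' */

/* Returns 1 if the whole string is one valid expression, 0 otherwise. */
int recognize(const char *input) {
    src = input;
    lookahead = next_token();
    return expr() && lookahead == TOK_END;
}
```

Each function looks only at the lookahead token to choose a production, which is where the first sets come in. Calling recognize("(1+2)*3") accepts the input, while recognize("1+") rejects it because no factor follows the operator.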
Recursive Descent Parser
The input is a string consisting of integers, operators and parentheses. The output is an abstract syntax tree. The input string may be of arbitrary complexity.
It is permissible to truncate long operands to make your tree look nicer.
The tree should be totally constructed first and then printed, not printed while the parse is still going on.
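One way to keep construction and printing separate is to give the tree its own small set of functions. The sketch below is illustrative only: the Node layout, the 15-character truncation limit, and the names make_leaf, make_op, print_tree and free_tree are all assumptions, not part of the assignment.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout: op == 0 marks a leaf, otherwise op is an operator. */
typedef struct Node {
    char op;                    /* '+', '-', '*', '/' or 0 for a leaf */
    char text[16];              /* leaf text, truncated to 15 characters */
    struct Node *left, *right;  /* both NULL for a leaf */
} Node;

/* Allocate a zeroed node or stop the program if memory runs out. */
static Node *new_node(void) {
    Node *n = calloc(1, sizeof *n);
    if (!n) { perror("calloc"); exit(EXIT_FAILURE); }
    return n;
}

Node *make_leaf(const char *text) {
    Node *n = new_node();
    /* calloc zeroed the buffer, so the copy is always terminated */
    strncpy(n->text, text, sizeof n->text - 1);
    return n;
}

Node *make_op(char op, Node *left, Node *right) {
    Node *n = new_node();
    n->op = op;
    n->left = left;
    n->right = right;
    return n;
}

/* Print the finished tree, one node per line, children indented under parents. */
void print_tree(const Node *n, int depth) {
    if (!n) return;
    printf("%*s", depth * 2, "");
    if (n->op) printf("%c\n", n->op);
    else       printf("%s\n", n->text);
    print_tree(n->left, depth + 1);
    print_tree(n->right, depth + 1);
}

/* Release a whole tree once it has been printed. */
void free_tree(Node *n) {
    if (!n) return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}
```

With these helpers, the tree for (1+2)*3 would be make_op('*', make_op('+', make_leaf("1"), make_leaf("2")), make_leaf("3")). The parser would build it in full and only then hand the root to print_tree(root, 0).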
Here is the full set of operations:
1. Write a flex file that identifies integers, parentheses and +-*/.
2. Add code to the flex file that implements your recursive-descent parser, which takes an arithmetic expression as its input.
3. Run flex on the file and compile the generated C code to create a compiler for the language of arithmetic expressions.
4. Run your compiler with an arithmetic expression as input. The compiler builds the parse tree and then prints it.
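As a rough illustration of steps 1 and 2, a flex file for this token set might be laid out as follows. The token codes, the token_value variable and the placeholder main are assumptions made for the sketch; the placeholder simply lists tokens and would be replaced by the parser.

```lex
%option noyywrap
%{
#include <stdio.h>
#include <stdlib.h>
/* Hypothetical token codes; single-character tokens return their own character. */
enum { TOK_INT = 256, TOK_EOL, TOK_BAD };
int token_value;   /* value of the most recent TOK_INT */
%}
%%
[0-9]+      { token_value = atoi(yytext); return TOK_INT; }
[-+*/()]    { return yytext[0]; }
[ \t]+      { /* skip blanks */ }
\n          { return TOK_EOL; }
.           { return TOK_BAD; }
%%
/* The recursive-descent parser functions would go here (step 2). */
int main(void) {
    int tok;
    while ((tok = yylex()) != 0)   /* yylex() returns 0 at end of input */
        printf("token %d\n", tok);
    return 0;
}
```

Assuming the file is saved as expr.l (a hypothetical name), running flex on it writes the scanner to lex.yy.c, which an ordinary C compiler can then build into the executable used in step 4.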
Implementation Specification
A few points to consider
• I believe the best approach would probably be to have each non-terminal function return a pointer to the subtree it has built.
• The C equivalent to the C++ new operator is malloc.
• You are free to create a node structure as you wish. However, there is no need to make it very complicated. For example, an operator node has an operator (one character) and two subtree pointers. A leaf node has just a string containing the identifier name.
• A pointer-based data structure is the obvious way to handle the arbitrary complexity of the tree.
• Write your own hand-coded parser following the structure of the pseudocode given in the text. Do not use yacc.
• Please try to do nice error-handling, pointing out where in the input string the error occurs and what kind of error it is. However, you should halt your parse on the first error in a line; don't try to continue. Start a new parse on the next line.
• Note that a post-order traversal of the finished tree yields the expression in postfix notation, which is the order a stack-based evaluator works in. You might find some useful information by searching on this topic.
• Possible enhancements are adding the exponentiation operator ^ and adding unary + and - operators. Both of these will require additional levels of operator precedence.
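Several of the points above (functions that return subtree pointers, malloc for nodes, stopping at the first error and reporting its position) can be combined in one sketch. Everything named here is hypothetical, including the Node layout, the token codes, parse_line and report. The hand-written advance() again stands in for yylex(), and for brevity the sketch does not free partial subtrees when a parse fails.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct Node { char op; long value; struct Node *left, *right; } Node;

enum { TOK_INT = 256, TOK_END };   /* hypothetical token codes */

static const char *line, *cur;     /* stand-in for flex: the line and scan position */
static int tok, tok_col;           /* lookahead token and its 0-based column */
static long tok_val;               /* value of the lookahead when it is TOK_INT */
static int err_col;                /* -1 while no error has been recorded */
static const char *err_msg;

static void advance(void) {
    while (*cur == ' ' || *cur == '\t') cur++;
    tok_col = (int)(cur - line);
    if (*cur == '\0') tok = TOK_END;
    else if (isdigit((unsigned char)*cur)) {
        char *end;
        tok_val = strtol(cur, &end, 10);
        cur = end;
        tok = TOK_INT;
    } else tok = (unsigned char)*cur++;   /* operators and stray characters */
}

static Node *node(char op, long value, Node *l, Node *r) {
    Node *n = malloc(sizeof *n);
    if (!n) { perror("malloc"); exit(EXIT_FAILURE); }
    n->op = op; n->value = value; n->left = l; n->right = r;
    return n;
}

static Node *fail(const char *msg) {      /* keep only the first error */
    if (err_col < 0) { err_col = tok_col; err_msg = msg; }
    return NULL;
}

static Node *expr(void);

static Node *factor(void) {
    if (tok == TOK_INT) { Node *n = node(0, tok_val, NULL, NULL); advance(); return n; }
    if (tok == '(') {
        advance();
        Node *e = expr();
        if (!e) return NULL;
        if (tok != ')') return fail("expected ')'");
        advance();
        return e;
    }
    return fail("expected integer or '('");
}

/* The "rest" functions receive the subtree built so far, so 8-3-2 becomes (8-3)-2. */
static Node *term_rest(Node *left) {
    if (!left || (tok != '*' && tok != '/')) return left;
    char op = (char)tok;
    advance();
    Node *right = factor();
    return right ? term_rest(node(op, 0, left, right)) : NULL;
}

static Node *term(void) { return term_rest(factor()); }

static Node *expr_rest(Node *left) {
    if (!left || (tok != '+' && tok != '-')) return left;
    char op = (char)tok;
    advance();
    Node *right = term();
    return right ? expr_rest(node(op, 0, left, right)) : NULL;
}

static Node *expr(void) { return expr_rest(term()); }

/* Parse one line; on failure return NULL and set *col and *msg. */
Node *parse_line(const char *text, int *col, const char **msg) {
    line = cur = text;
    err_col = -1;
    err_msg = NULL;
    advance();
    Node *root = expr();
    if (root && tok != TOK_END) root = fail("unexpected token after expression");
    *col = err_col;
    *msg = err_msg;
    return root;
}

/* Echo the line and put a caret under the column where the error was found. */
void report(const char *text, int col, const char *msg) {
    fprintf(stderr, "%s\n%*s^ %s\n", text, col, "", msg);
}
```

Passing the left operand down into term_rest and expr_rest is one common way to get left-associative trees out of a right-recursive grammar. A driver would call parse_line once per input line and, when it returns NULL, call report with the returned column and message before moving on to the next line.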