SE2 - A Simple Expression Evaluator for Very Large Integers

Part 3: Expression Evaluation

This is the third part of the semester-long project. More descriptions of the project can be found in the overview slides and the instructions for Part 1 and Part 2

❗️ NOTE: You have to correctly implement Part 1 first before starting this one. Simply copy your code from Part 1 to BigInt.h.

Supported expressions

In this part, you will write code to support expressions including assignments. We will first convert an expression in infix notation to postfix notation. Next, we will build an expression tree to represent the expression. Then, we evaluate the expression tree and output the result, instead of directly calculating the result from postfix representation as in Part 2 since the structure of expression is much more complicated in Part 3.

Syntax and evaluation rules

Below is an informal description of the syntax and evaluation rules for input expressions. The syntax rules is a subset of the Python grammar.

Numbers and variables

The simplest expressions are numbers and variables.

Arithmetic expressions

There are two types of arithmetic expressions: additive and subtractive. They are defined recursively as follows:

Assignments

We also allow creation and assignment for variables.

Compound expressions

Compound expressions are also defined recursively, as follows:

Operator precedence

The following table lists the precedence and associativity of supported operators. Operators are listed top to bottom, in descending precedence.

Precedence Operator Associativity
1 a+b, a-b Left-to-right
2 a=b Right-to-left

Where it is desired to override the precedence and associativity conventions, or even simply to emphasize them, parentheses () can be used to indicate an alternative order of operations (or to simply reinforce the default order of operations). For example, 2 + (3 - 4) forces subtraction to precede addition.

Also note that the ; in a compound expression is NOT an operator.

Example

a = 456 + (1123 - 1); b = a + 1; b; is a compound expression.

🦄 Challenge: Add support to multiplicative expressions.

Tokenization

Tokens

As in Part 2, a tokenizer is given in Tokenizer.h. Since the input expression now supports variable definition and assignments, we consider more kinds of tokens. In addition to Number, LeftParen, RightParen, Plus, Minus, Asterisk (for multiplication), End, and Unexpected, the tokenizer also recognizes Variable, Equal, and Semicolon. You can find the detailed definition of the Token class in Token.h.

Example

Given input a = 456 + (1123 - 1); b = a + 1; b;, the tokenizer generates a token list:

Variable("a")  Equal("=")  Number("456")  Plus("+")  LeftParen("(")  Number("1123")  Minus("-")  Number("1")  RightParen(")")  Semicolon(";")  Variable("b")  Equal("=")  Variable("a")  Plus("+")  Number("1")  Semicolon(";")  Variable("b")  Semicolon(";")

where every term between the parentheses is the literal string for each token.

As in Part 2, you can access the underlying literal string of a token using lexeme. For instance, let t be the token Variable("a"), t.lexeme gives the name of variable "a" as string. For token Number("1123"), you can create a BigInt of 1123 from the number by

BigInt number(t.lexeme);

To check the kind of a token t, you can use the is and is_one_of helper functions defined in Token.h. Below are some examples:

t.is(Token::Kind::Variable);
// returns true if `t` is a `Variable` token and false otherwise

t.is_one_of(Token::Kind::Plus, Token::Kind::Minus, Token::Kind::Number);
// returns true if `t` is a `Plus`, `Minus`, or `Number` token

Infix to postfix

Postfix notation

As in Part 2, you should complete the InfixToPostfixTransformer::infixToPostfix function in Transformer.cpp that transforms an infix token list from Tokenization to its corresponding postfix token stack using the reverse Polish notation. Recall that the syntax of postfix expression is operand1 operand2 operator where the two operands are specified first, followed by the operator. When creating the token stack, you should follow the rules in Part 2, that is, pushing operand1 first, then operand2, and finally operator to the stack.

❗️ NOTE: ; is not an operator. When encountering a semicolon, for example something;, push something first and then ; to the stack.

Example

For the example input a = 456 + (1123 - 1); b = a + 1; b;, the output postfix token stack is:

       top
+----------------+
| Semicolon(";") |
+----------------+
|  Variable("b") |
+----------------+
| Semicolon(";") |
+----------------+
|   Equal("=")   |
+----------------+
|    Plus("+")   |
+----------------+
|   Number("1")  |
+----------------+
|  Variable("a") |
+----------------+
|  Variable("b") |
+----------------+
| Semicolon(";") |
+----------------+
|   Equal("=")   |
+----------------+
|    Plus("+")   |
+----------------+
|   Minus("-")   |
+----------------+
|  Number("1")   |
+----------------+
| Number("1123") |
+----------------+
|  Number("456") |
+----------------+
|  Variable("a") |
+----------------+

where Semicolon(";") is the top of the stack and Variable("a") is the bottom.

🍦 Tips: If you’ve spent weeks and still couldn’t figure out how to do the transformation, you can

Expression trees

Each expression described in Supported expressions can be represented using an expression tree. The definition of ExprTreeNode can be found in ExprTreeNode.h. Each node contains a token, an expr1 pointer (pointing to the node for the first sub-expression), and an expr2 pointer (pointing to the node for the second sub-expression).

You need to implement the buildExprTree function in Evaluator.cpp that builds an expression tree from given postfix token stack. The expression tree for the example input a = 456 + (1123 - 1); b = a + 1; b; can be found in main.cpp, where the node that follows ├─ is expr1 and the node that follows └─ is expr2. To create an ExprTreeNode given a token t, you can do

ExprTreeNode *n = new ExprTreeNode(t); // `expr1` and `expr2` will be `nullptr` by default

Then you can access the token, expr1 and expr2 pointers by n->token, n->expr1 and n->expr2.

Examples

Numbers

The expression tree for a number contains only one ExprTreeNode where expr1 and expr2 pointers are both nullptr. For example, expression 456 has a tree as follows, where the token is displayed inside the node:

+---------------+
| Number("456") |
+---------------+

Variables

The expression tree for a variable has only one ExprTreeNode. For example, expression a has a tree as follows:

+---------------+
| Variable("a") |
+---------------+

Additive and subtractive expressions

The expression tree for an arithmetic expression (in the form expr1 operator expr2) contains at least three ExprTreeNodes, with the root being the node for the operator and its two child nodes being the root of subtrees for the two operands expr1 and expr2. For example, expression 456 - 1123 (456 1123 - in postfix notation) has a tree as follows:

          +-------------+
          |  Plus("+")  |
          +-------------+
             /        \
          expr1       expr2
           /            \
+---------------+    +----------------+
| Number("456") |    | Number("1123") |
+---------------+    +----------------+

As another example, a nested arithmetic expression a + (456 - 1123) (a 456 1123 - + in postfix notation) has a tree as follows:

          +-------------+
          |  Plus("+")  |
          +-------------+
            /         \
         expr1        expr2
          /             \
+---------------+    +--------------+
| Variable("a") |    |  Minus("-")  |
+---------------+    +--------------+
                       /         \
                     expr1       expr2
                     /             \
          +---------------+    +----------------+
          | Number("456") |    | Number("1123") |
          +---------------+    +----------------+

Assignment expressions

The expression tree for an assignment expression (in the form var1 = expr2) contains at least three ExprTreeNodes, with the root being the node for the operator = and its two child nodes being the root of subtrees for the two operands var1 and expr2.

For example, expression a = 5 (a 5 = in postfix notation) has a tree like the following:

          +--------------+
          |  Equal("=")  |
          +--------------+
              /       \
            expr1     expr2
            /           \
+---------------+    +-------------+
| Variable("a") |    | Number("5") |
+---------------+    +-------------+

As another example, the tree for expression a = b + 456 (a b 456 + = in postfix notation) is as follows:

          +--------------+
          |  Equal("=")  |
          +--------------+
              /       \
            expr1     expr2
            /           \
+---------------+    +--------------+
| Variable("a") |    |   Plus("+")  |
+---------------+    +--------------+
                        /       \
                      expr1     expr2
                      /           \
          +---------------+    +---------------+
          | Variable("b") |    | Number("456") |
          +---------------+    +---------------+

Compound expressions

The expression tree for a compound expression (in the form expr1 expr2;) may contain the followingExprTreeNodes: the root being the node for ;, one child node being the root of subtree for expr1 if expr1 is not empty, and the other child node being the root of subtree for expr2. When expr1 is empty, there is only one child node from the root for expr2.

For example, the tree for expression a = 456; a + 1; (a 456 = ; a 1 + ; in postfix notation) is as follows:

                   +----------------+
                   | Semicolon(";") |
                   +----------------+
                    /              \
                  expr1            expr2
                  /                  \
      +----------------+            +-----------+
      | Semicolon(";") |            | Plus("+") |
      +----------------+            +-----------+
       /         |                     |        \
     expr1     expr2                 expr1      expr2
     /           |                     |          \
nullptr  +----------------+   +---------------+  +-------------+
         |   Equal("=")   |   | Variable("a") |  | Number("1") |
         +----------------+   +---------------+  +-------------+
           /          \
         expr1        expr2
         /              \
+---------------+    +---------------+
| Variable("a") |    | Number("456") |
+---------------+    +---------------+   

The nullptr means that the expr1 of a compound expression is empty.

A few notes

The construction of the expression trees should follow the syntax rules in Supported expressions. For each Variable and Number token, there should be only one ExprTreeNode being created. For each Plus and Minus token, you should create one node for the token, and at least two more nodes for expr1 and expr2. For an Equal token, you should create one node for the token, one node for the var1, and at least one node for expr2. For a Semicolon token, you should create one node for the token, at least one node for expr2, and at least one node for expr1 if it is not empty.

Evaluation

You need to implement the evaluateExprTree function in Evaluator.cpp that evaluates the expression tree rooted at root by traversing through the tree, and returns the resulting BigInt. The evaluation rules are described in Supported expressions.

In summary, the whole process of evaluation of a given expression combines the following 4 steps:

Step Input (i.e., the parameter) Output (i.e., the return value)
Tokenization of input infix expression (given in Tokenizer.h) String "a = 456 + (1123 - 1); b = a + 1; b;" Infix token list in Tokenization - Example
Infix token list to postfix token stack transformation Output from above step Postfix token stack in Infix to postfix - Example
Create expression tree from postfix token stack Output from above step Tree in main.cpp
Evaluate the expression tree and return the result Output from above step 1579 as BigInt

Test your implementation

A sample doctest test case is given in main.cpp. To run the given test, simply execute in the terminal: make clean test. We encourage your to write your own tests as the sample one might be “too simple”. But they will not be used for grading.

Submitting the code

After testing the program locally, you should submit Evaluator.cpp and Transformer.cpp, as well as BigInt.h, to the autograding system. There will be 10 test cases (30 points in total) to evaluate your implementation. For each passing test case, you will get 3 points if there is no memory error and 2 points if there are memory errors. We will also manually look at your code to evaluate the quality of your code (50 points), including coding style, comments, indentation, and so on.

✨ Extra Credit: There are 5 test cases (expressions) that contain the multiplication operator, each of which is 1 point.