SE² - A Simple Expression Evaluator for Very Large Integers

Part 3: Expression Evaluation

This is the third part of the semester-long project. More descriptions of the project can be found in the overview slides and the instructions for Part 1 and Part 2

❗️ NOTE: You have to correctly implement Part 1 first before starting this one. Simply copy your code from Part 1 to BigInt.h.

Supported expressions

In this part, you will write code to support expressions including assignments. We will first convert an expression in infix notation to postfix notation. Next, we will build an expression tree to represent the expression. Then, we evaluate the expression tree and output the result, instead of directly calculating the result from postfix representation as in Part 2 since the structure of expression is much more complicated in Part 3.

Syntax and evaluation rules

Below is an informal description of the syntax and evaluation rules for input expressions. The syntax rules is a subset of the Python grammar.

Numbers and variables

The simplest expressions are numbers and variables.

A number is a sequence of digits, e.g., 456 and 1123.
- The evaluation result of a number is the BigInt made from the number.
A variable is a sequence of English letters, e.g., a and bCD.
- A variable must have one (and only one) BigInt associated with it called value. This can be done via assignment (to be introduced shortly). Reassignments override a variable’s value.
- The evaluation result of a variable is the value of the variable.

Arithmetic expressions

There are two types of arithmetic expressions: additive and subtractive. They are defined recursively as follows:

An additive expression is in the form expr1 + expr2, where expr1 and expr2 could be a number, a variable, or another arithmetic expression.
- Expressions expr1 and expr2 are evaluated in order (see Operator precedence). The evaluation result of the whole additive expression is the sum of the evaluation results of expr1 and expr2.
A subtractive expression is in the form expr1 - expr2, where expr1 and expr2 could be a number, a variable, or another arithmetic expression.
- Expressions expr1 and expr2 are evaluated in order (see Operator precedence). The evaluation result of the whole subtractive expression is the absolute difference of the evaluation results of expr1 and expr2.

Assignments

We also allow creation and assignment for variables.

An assignment expression is in the form var1 = expr2 where var1 is a variable and expr2 is a number, a variable, or an arithmetic expression.
- The evaluation result of an assignment expression is the evaluation result of expr2. The expression associates var1 with the evaluation result.

Compound expressions

Compound expressions are also defined recursively, as follows:

A compound expression is in the form expr1 expr2; where expr1 could be another compound expression or empty, and expr2 could be a number, a variable, an arithmetic expression, or an assignment expression.
- Expressions expr1 and expr2 are evaluated in order. The evaluation result of the compound expression is the evaluation result of expr2.

Operator precedence

The following table lists the precedence and associativity of supported operators. Operators are listed top to bottom, in descending precedence.

Precedence	Operator	Associativity
1	`a+b`, `a-b`	Left-to-right
2	`a=b`	Right-to-left

Where it is desired to override the precedence and associativity conventions, or even simply to emphasize them, parentheses () can be used to indicate an alternative order of operations (or to simply reinforce the default order of operations). For example, 2 + (3 - 4) forces subtraction to precede addition.

Also note that the ; in a compound expression is NOT an operator.

Example

a = 456 + (1123 - 1); b = a + 1; b; is a compound expression.

The result of the first assignment expression a = 456 + (1123 - 1) is the result of 456 + (1123 - 1) (which is 1578). It also associates variable a with this result, i.e., 1578 is the value of variable a.
The second assignment expression b = a + 1 is evaluated to a + 1 which is 1579 (using Arithmetic expressions) and associates it with variable b.
The last expression b has value 1579. Therefore, the compound expression as a whole has evaluation result 1579.

🦄 Challenge: Add support to multiplicative expressions.

Tokenization

Tokens

As in Part 2, a tokenizer is given in Tokenizer.h. Since the input expression now supports variable definition and assignments, we consider more kinds of tokens. In addition to Number, LeftParen, RightParen, Plus, Minus, Asterisk (for multiplication), End, and Unexpected, the tokenizer also recognizes Variable, Equal, and Semicolon. You can find the detailed definition of the Token class in Token.h.

Example

Given input a = 456 + (1123 - 1); b = a + 1; b;, the tokenizer generates a token list:

Variable("a")  Equal("=")  Number("456")  Plus("+")  LeftParen("(")  Number("1123")  Minus("-")  Number("1")  RightParen(")")  Semicolon(";")  Variable("b")  Equal("=")  Variable("a")  Plus("+")  Number("1")  Semicolon(";")  Variable("b")  Semicolon(";")

where every term between the parentheses is the literal string for each token.

As in Part 2, you can access the underlying literal string of a token using lexeme. For instance, let t be the token Variable("a"), t.lexeme gives the name of variable "a" as string. For token Number("1123"), you can create a BigInt of 1123 from the number by

BigInt number(t.lexeme);

To check the kind of a token t, you can use the is and is_one_of helper functions defined in Token.h. Below are some examples:

t.is(Token::Kind::Variable);
// returns true if `t` is a `Variable` token and false otherwise

t.is_one_of(Token::Kind::Plus, Token::Kind::Minus, Token::Kind::Number);
// returns true if `t` is a `Plus`, `Minus`, or `Number` token

Infix to postfix

Postfix notation

As in Part 2, you should complete the InfixToPostfixTransformer::infixToPostfix function in Transformer.cpp that transforms an infix token list from Tokenization to its corresponding postfix token stack using the reverse Polish notation. Recall that the syntax of postfix expression is operand1 operand2 operator where the two operands are specified first, followed by the operator. When creating the token stack, you should follow the rules in Part 2, that is, pushing operand1 first, then operand2, and finally operator to the stack.

❗️ NOTE: ; is not an operator. When encountering a semicolon, for example something;, push something first and then ; to the stack.

Example

For the example input a = 456 + (1123 - 1); b = a + 1; b;, the output postfix token stack is:

       top
+----------------+
| Semicolon(";") |
+----------------+
|  Variable("b") |
+----------------+
| Semicolon(";") |
+----------------+
|   Equal("=")   |
+----------------+
|    Plus("+")   |
+----------------+
|   Number("1")  |
+----------------+
|  Variable("a") |
+----------------+
|  Variable("b") |
+----------------+
| Semicolon(";") |
+----------------+
|   Equal("=")   |
+----------------+
|    Plus("+")   |
+----------------+
|   Minus("-")   |
+----------------+
|  Number("1")   |
+----------------+
| Number("1123") |
+----------------+
|  Number("456") |
+----------------+
|  Variable("a") |
+----------------+

where Semicolon(";") is the top of the stack and Variable("a") is the bottom.

🍦 Tips: If you’ve spent weeks and still couldn’t figure out how to do the transformation, you can

stack<Token> infixToPostfix(VList<Token> infix) {
    stack<Token> postfixStack; // the resulting postfix token stack
    stack<Token> opStack; // record operations
    // loop through the infix token list
    for i = 0 to infix.size() - 1 {
        let tk be the i-th token in list;
        if tk is a Number or a Variable {
            push tk to postfixStack;
        }  else if tk is an operator (i.e., Plus, Minus, Equal) {
            // pop operators from opStack and push them to postfixStack
            // until 1) opStack is empty or 2) an open parenthesis is encountered
            // or 3) an operator w/ lower precedence is encountered
            while (!opStack.empty()
                   and opStack.top() is not LeftParen
                   and opStack.top() has higher or same precedence) {
                push opStack.top() to postfixStack;
                opStack.pop();
            }
            push tk to the opStack;
        } else if tk is LeftParen {
            push tk to opStack;
        } else if tk is RightParen {
            // when we reach a closing parenthesis, start popping
            // off operators until we run into an opening parenthesis
            while (!opStack.empty()) {
                if (opStack.top() is LeftParen) {
                    opStack.pop(); // pop the LeftParen out
                    break;
                } else {
                    push opStack.top() to postfixStack;
                    opStack.pop();
                }
            }
        } else if tk is a Semicolon {
            // push whatever left in opStack to postfixStack
            while (!opStack.empty()) {
                push opStack.top() to postfixStack;
                opStack.pop();
            }
            push tk to the postfixStack; // push Semicolon
        }
    }
    if there are any remaining tokens left in opStack {
        pop them off and push them to postfixStack one by one;
    }
    return postfixStack;
}

Expression trees

Each expression described in Supported expressions can be represented using an expression tree. The definition of ExprTreeNode can be found in ExprTreeNode.h. Each node contains a token, an expr1 pointer (pointing to the node for the first sub-expression), and an expr2 pointer (pointing to the node for the second sub-expression).

You need to implement the buildExprTree function in Evaluator.cpp that builds an expression tree from given postfix token stack. The expression tree for the example input a = 456 + (1123 - 1); b = a + 1; b; can be found in main.cpp, where the node that follows ├─ is expr1 and the node that follows └─ is expr2. To create an ExprTreeNode given a token t, you can do

ExprTreeNode *n = new ExprTreeNode(t); // `expr1` and `expr2` will be `nullptr` by default

Then you can access the token, expr1 and expr2 pointers by n->token, n->expr1 and n->expr2.

Examples

Numbers

The expression tree for a number contains only one ExprTreeNode where expr1 and expr2 pointers are both nullptr. For example, expression 456 has a tree as follows, where the token is displayed inside the node:

+---------------+
| Number("456") |
+---------------+

Variables

The expression tree for a variable has only one ExprTreeNode. For example, expression a has a tree as follows:

+---------------+
| Variable("a") |
+---------------+

Additive and subtractive expressions

The expression tree for an arithmetic expression (in the form expr1 operator expr2) contains at least three ExprTreeNodes, with the root being the node for the operator and its two child nodes being the root of subtrees for the two operands expr1 and expr2. For example, expression 456 - 1123 (456 1123 - in postfix notation) has a tree as follows:

          +-------------+
          |  Plus("+")  |
          +-------------+
             /        \
          expr1       expr2
           /            \
+---------------+    +----------------+
| Number("456") |    | Number("1123") |
+---------------+    +----------------+

As another example, a nested arithmetic expression a + (456 - 1123) (a 456 1123 - + in postfix notation) has a tree as follows:

          +-------------+
          |  Plus("+")  |
          +-------------+
            /         \
         expr1        expr2
          /             \
+---------------+    +--------------+
| Variable("a") |    |  Minus("-")  |
+---------------+    +--------------+
                       /         \
                     expr1       expr2
                     /             \
          +---------------+    +----------------+
          | Number("456") |    | Number("1123") |
          +---------------+    +----------------+

Assignment expressions

The expression tree for an assignment expression (in the form var1 = expr2) contains at least three ExprTreeNodes, with the root being the node for the operator = and its two child nodes being the root of subtrees for the two operands var1 and expr2.

For example, expression a = 5 (a 5 = in postfix notation) has a tree like the following:

          +--------------+
          |  Equal("=")  |
          +--------------+
              /       \
            expr1     expr2
            /           \
+---------------+    +-------------+
| Variable("a") |    | Number("5") |
+---------------+    +-------------+

As another example, the tree for expression a = b + 456 (a b 456 + = in postfix notation) is as follows:

          +--------------+
          |  Equal("=")  |
          +--------------+
              /       \
            expr1     expr2
            /           \
+---------------+    +--------------+
| Variable("a") |    |   Plus("+")  |
+---------------+    +--------------+
                        /       \
                      expr1     expr2
                      /           \
          +---------------+    +---------------+
          | Variable("b") |    | Number("456") |
          +---------------+    +---------------+

Compound expressions

The expression tree for a compound expression (in the form expr1 expr2;) may contain the followingExprTreeNodes: the root being the node for ;, one child node being the root of subtree for expr1 if expr1 is not empty, and the other child node being the root of subtree for expr2. When expr1 is empty, there is only one child node from the root for expr2.

For example, the tree for expression a = 456; a + 1; (a 456 = ; a 1 + ; in postfix notation) is as follows:

                   +----------------+
                   | Semicolon(";") |
                   +----------------+
                    /              \
                  expr1            expr2
                  /                  \
      +----------------+            +-----------+
      | Semicolon(";") |            | Plus("+") |
      +----------------+            +-----------+
       /         |                     |        \
     expr1     expr2                 expr1      expr2
     /           |                     |          \
nullptr  +----------------+   +---------------+  +-------------+
         |   Equal("=")   |   | Variable("a") |  | Number("1") |
         +----------------+   +---------------+  +-------------+
           /          \
         expr1        expr2
         /              \
+---------------+    +---------------+
| Variable("a") |    | Number("456") |
+---------------+    +---------------+   

The nullptr means that the expr1 of a compound expression is empty.

A few notes

The construction of the expression trees should follow the syntax rules in Supported expressions. For each Variable and Number token, there should be only one ExprTreeNode being created. For each Plus and Minus token, you should create one node for the token, and at least two more nodes for expr1 and expr2. For an Equal token, you should create one node for the token, one node for the var1, and at least one node for expr2. For a Semicolon token, you should create one node for the token, at least one node for expr2, and at least one node for expr1 if it is not empty.

Evaluation

You need to implement the evaluateExprTree function in Evaluator.cpp that evaluates the expression tree rooted at root by traversing through the tree, and returns the resulting BigInt. The evaluation rules are described in Supported expressions.

You can assume that expressions used for testing always follow the syntax rules. You don’t need to consider invalid expressions.
Function evaluateExprTree has a parameter map<string, BigInt> &varTbl that stores the mapping from a literal variable to its BigInt value. You need to update or access varTbl when you see assignments or uses of variables during the traversal of the tree. To update varTbl with a given key k and value v, do varTbl[k] = v. To get the value of a variable k, do varTbl[k].
You can access the token of a node n by n->token and you can access the literal string of the token by n->token.lexeme. More use cases can be found in Tokenization - Example above.

In summary, the whole process of evaluation of a given expression combines the following 4 steps:

Step	Input (i.e., the parameter)	Output (i.e., the return value)
Tokenization of input infix expression (given in `Tokenizer.h`)	String `"a = 456 + (1123 - 1); b = a + 1; b;"`	Infix token list in Tokenization - Example
Infix token list to postfix token stack transformation	Output from above step	Postfix token stack in Infix to postfix - Example
Create expression tree from postfix token stack	Output from above step	Tree in `main.cpp`
Evaluate the expression tree and return the result	Output from above step	`1579` as `BigInt`

Test your implementation

A sample doctest test case is given in main.cpp. To run the given test, simply execute in the terminal: make clean test. We encourage your to write your own tests as the sample one might be “too simple”. But they will not be used for grading.

Submitting the code

After testing the program locally, you should submit Evaluator.cpp and Transformer.cpp, as well as BigInt.h, to the autograding system. There will be 10 test cases (30 points in total) to evaluate your implementation. For each passing test case, you will get 3 points if there is no memory error and 2 points if there are memory errors. We will also manually look at your code to evaluate the quality of your code (50 points), including coding style, comments, indentation, and so on.

✨ Extra Credit: There are 5 test cases (expressions) that contain the multiplication operator, each of which is 1 point.

SE2 - A Simple Expression Evaluator for Very Large Integers

Part 3: Expression Evaluation

Supported expressions

Syntax and evaluation rules

Numbers and variables

Arithmetic expressions

Assignments

Compound expressions

Operator precedence

Example

Tokenization

Tokens

Example

Infix to postfix

Postfix notation

Example

Expression trees

Examples

Numbers

Variables

Additive and subtractive expressions

Assignment expressions

Compound expressions

A few notes

Evaluation

Test your implementation

Submitting the code

SE² - A Simple Expression Evaluator for Very Large Integers