SE² - A Simple Expression Evaluator for Very Large Integers

Part 2: Tokenization and Postfix Notation

This is the second part of the semester-long project. More descriptions of the project can be found in the overview slides and the instructions for Part 1.

1. Background

We have implemented the BigInt class in Part 1 that could hold arbitrarily large integers and support operators like addition and subtraction for absolute difference of two BigInt objects, in the form of operand1 operator operand2. This notation is called infix notation.

In this part, we will write code to support longer more complex expressions that may contain multiple operators and parentheses. We will first convert an expression in infix notation to postfix notation which is easier for evaluation, and then write code to do the actual calculation of a given expression and output the result.

❗️ NOTE: You have to correctly implement Part 1 first before starting this one. Simply copy the code from Part 1 to BigInt.h.

2. Tokenization

2.1. Tokens

The input infix expression may contain (, ), +, -, digits, and white spaces. For example, 23543 + ( 345 - 123). Instead of reading the input by yourself, a tokenizer is given in Tokenizer.h. The tokenizer scans through the input and generates a sequence of tokens from the input. You can find the definition of the Token class in Token.h. We consider various kinds of tokens: Number, LeftParen, RightParen, Plus, Minus, Asterisk (for multiplication), End, and Unexpected. The last two tokens indicate the end of code and error during tokenization. The tokenizer uses them to check if it reaches the end of the input and does not add them to the resulting token list.

2.2. Example

Given input 456 + (1123 - 1) + 1, the tokenizer will generate a VList of tokens:

Number("456")  Plus("+")  LeftParen("(")  Number("1123")  Minus("-")  Number("1")  RightParen(")")  Plus("+")  Number("1")

where every term between the parentheses is the literal string for each token.

Given a token, you can access its underlying literal string using lexeme. For instance, let t be the token Number("1123"). To get the number, we can do t.lexeme which would return a string of "1123". To create a BigInt out of the string, we can do:

BigInt number(t.lexeme);

To check the kind of a token t, you can use the is and is_one_of helper functions defined in Token.h. Below are some examples:

t.is(Token::Kind::Number);
// returns true if `t` is a `Number` token and false otherwise

t.is_one_of(Token::Kind::Plus, Token::Kind::Minus, Token::Kind::Number);
// returns true if `t` is a `Plus`, `Minus`, or `Number` token

3. Infix to postfix

3.1. Postfix notation

Complete the InfixToPostfixTransformer::infixToPostfix function in Transformer.cpp. Your implementation should transform an infix token list to its corresponding postfix token stack using the reverse Polish notation.

The syntax of postfix expression requires that the two operands are specified first, followed by the operator: operand1 operand2 operator. One advantage of using postfix notation is getting rid of parentheses to make the evaluation easier.

🍿 Example: 23543 345 123 - + means 23543 + (345 - 123). To decompose, for the operator +, we have the 1st operand operand1 being 23543, and the 2nd operand operand2 being 345 123 - which itself is the postfix representation of 345 - 123. With this, a postfix expression is recursive, i.e., there might be nested expressions.

In the function, you should wirte code to transform the infix token list provided in Tokenization to its corresponding postfix token stack. In particular, the resulting postfix token stack should be of type stack (i.e., the stack data structure from C++ STL), following the rule:

For operand1 operand2 operator, first push operand1, then operand2, and finally operator to the stack.

3.2. Example

For the example of 23543 + (345 - 123), the tokenizer will generate the infix token list

Number("23543") Plus("+") LeftParen("(") Number("345") Minus("-") Number("123") RightParen(")")

which is the input of InfixToPostfixTransformer::infixToPostfix. The output of the function should be a postfix token stack like the following:

       top
+-----------------+
|    Plus("+")    |
+-----------------+
|    Minus("-")   |
+-----------------+
|  Number("123")  |
+-----------------+
|  Number("345")  |
+-----------------+
| Number("23543") |
+-----------------+

🍦 Tips: If you’ve spent weeks and still couldn’t figure out how to do the transformation, you can

stack<Token> infixToPostfix(VList<Token> infix) {
    stack<Token> postfixStack; // the resulting postfix token stack
    stack<Token> opStack; // used to handle parentheses; storing operators and parens

    // loop through the infix token list
    for i from 0 to infix.size() - 1 {
        let tk be the i-th token in list;
        if tk is a Number {
            push tk to postfixStack;
        } else if tk is an operator (i.e., Plus, Minus) {
            // pop operators from opStack and push them to postfixStack
            // until 1) opStack is empty or 2) an open parenthesis is encountered
            while (!opStack.empty() and opStack.top() is not LeftParen) {
                push opStack.top() to postfixStack;
                opStack.pop();
            }
            push tk to the opStack;
        } else if tk is LeftParen {
            push tk to opStack;
        } else if tk is RightParen {
            // when we reach a closing parenthesis, start popping
            // off operators until we run into an opening parenthesis
            while (!opStack.empty()) {
                if (opStack.top() is LeftParen) {
                    opStack.pop(); // pop the LeftParen out
                    break;
                } else {
                    push opStack.top() to postfixStack;
                    opStack.pop();
                }
            }
        }
    }
    if there are any remaining tokens left in opStack {
        pop them off and push them to postfixStack one by one;
    }

    return postfixStack;
}

4. Calculate the result of expressions

Complete the Calculator::calculate function in Calculator.cpp. The input is a stack of tokens in postfix form from the above step. Your code should calculate the result of the original expression (for example, 23765 for 23543 + (345 - 123)) using the postfix token stack. Ultimately, the combination of the two functions, InfixToPostfixTransformer::infixToPostfix and Calculator::calculate, evaluates and returns the result of an infix expression.

🍨 Tips: In postfix notation, the two operands are always right below the operator in stack (see the rule in Infix to postfix - Postfix notation). So if the current top token in stack is an operator, we can 1) record and pop out the operator, 2) recursively calculate the second and first operands as BigInts using the stack, and 3) apply the operator using the two operands and return the resulting BigInt. On the other hand, if the top token is a Number, we just need to create a BigInt from it.

🦄 Challenge: Can you add support to expressions containing the multiplication operator (i.e., the Asterisk token)?

4.1. Example

In summary, the calculation combines the following 3 steps:

Step	Input (i.e., the parameter)	Output (i.e., the return value)
Tokenization of input infix expression (given in `Tokenizer.h`)	`"23543 + (345 - 123)"`	Infix token list in Tokenization - Example
Infix token list to postfix token stack transformation	Output from above step	Postfix token stack Infix to postfix - Example
Evaluation using postfix token stack	Output from above step	Evaluation result `23765` as a `BigInt`

5. Test your implementation

A sample doctest test case is given in main.cpp. To run the given test, simply execute in the terminal: make clean test. We encourage your to write your own tests as the sample one might be “too simple”. But they will not be used for grading.

6. Submitting the code

After testing the program locally, you should submit Calculator.cpp and Transformer.cpp, as well as BigInt.h, to the autograding system. There will be 10 test cases (30 points in total) to evaluate your implementation. For each passing test case, you will get 3 points if there is no memory error and 2 points if there are memory errors. We will also manually look at your code to evaluate the quality of your code (50 points), including coding style, comments, indentation, and so on.

✨ Extra Credit: There are 5 test cases (expressions) that contain the multiplication operator, each of which is 1 point.

SE2 - A Simple Expression Evaluator for Very Large Integers