This is the third part of the semester-long project. More descriptions of the project can be found in the overview slides and the instructions for Part 1 and Part 2
❗️ NOTE: You have to correctly implement Part 1 first before starting this one. Simply copy your code from Part 1 to BigInt.h.
In this part, you will write code to support expressions including assignments. We will first convert an expression in infix notation to postfix notation. Next, we will build an expression tree to represent the expression. Then, we evaluate the expression tree and output the result, instead of directly calculating the result from postfix representation as in Part 2 since the structure of expression is much more complicated in Part 3.
Below is an informal description of the syntax and evaluation rules for input expressions. The syntax rules is a subset of the Python grammar.
The simplest expressions are numbers and variables.
456 and 1123.
BigInt made from the number.a and bCD.
BigInt associated with it called value. This can be done via assignment (to be introduced shortly). Reassignments override a variable’s value.There are two types of arithmetic expressions: additive and subtractive. They are defined recursively as follows:
expr1 + expr2, where expr1 and expr2 could be a number, a variable, or another arithmetic expression.
expr1 and expr2 are evaluated in order (see Operator precedence). The evaluation result of the whole additive expression is the sum of the evaluation results of expr1 and expr2.expr1 - expr2, where expr1 and expr2 could be a number, a variable, or another arithmetic expression.
expr1 and expr2 are evaluated in order (see Operator precedence). The evaluation result of the whole subtractive expression is the absolute difference of the evaluation results of expr1 and expr2.We also allow creation and assignment for variables.
var1 = expr2 where var1 is a variable and expr2 is a number, a variable, or an arithmetic expression.
expr2. The expression associates var1 with the evaluation result.Compound expressions are also defined recursively, as follows:
expr1 expr2; where expr1 could be another compound expression or empty, and expr2 could be a number, a variable, an arithmetic expression, or an assignment expression.
expr1 and expr2 are evaluated in order. The evaluation result of the compound expression is the evaluation result of expr2.The following table lists the precedence and associativity of supported operators. Operators are listed top to bottom, in descending precedence.
| Precedence | Operator | Associativity |
|---|---|---|
| 1 | a+b, a-b |
Left-to-right |
| 2 | a=b |
Right-to-left |
Where it is desired to override the precedence and associativity conventions, or even simply to emphasize them, parentheses () can be used to indicate an alternative order of operations (or to simply reinforce the default order of operations). For example, 2 + (3 - 4) forces subtraction to precede addition.
Also note that the ; in a compound expression is NOT an operator.
a = 456 + (1123 - 1); b = a + 1; b; is a compound expression.
a = 456 + (1123 - 1) is the result of 456 + (1123 - 1) (which is 1578). It also associates variable a with this result, i.e., 1578 is the value of variable a.b = a + 1 is evaluated to a + 1 which is 1579 (using Arithmetic expressions) and associates it with variable b.b has value 1579. Therefore, the compound expression as a whole has evaluation result 1579.🦄 Challenge: Add support to multiplicative expressions.
As in Part 2, a tokenizer is given in Tokenizer.h.
Since the input expression now supports variable definition and assignments, we consider more kinds of tokens.
In addition to Number, LeftParen, RightParen, Plus, Minus, Asterisk (for multiplication), End, and Unexpected, the tokenizer also recognizes Variable, Equal, and Semicolon. You can find the detailed definition of the Token class in Token.h.
Given input a = 456 + (1123 - 1); b = a + 1; b;, the tokenizer generates a token list:
Variable("a") Equal("=") Number("456") Plus("+") LeftParen("(") Number("1123") Minus("-") Number("1") RightParen(")") Semicolon(";") Variable("b") Equal("=") Variable("a") Plus("+") Number("1") Semicolon(";") Variable("b") Semicolon(";")
where every term between the parentheses is the literal string for each token.
As in Part 2, you can access the underlying literal string of a token using lexeme. For instance, let t be the token Variable("a"), t.lexeme gives the name of variable "a" as string. For token Number("1123"), you can create a BigInt of 1123 from the number by
BigInt number(t.lexeme);
To check the kind of a token t, you can use the is and is_one_of helper functions defined in Token.h. Below are some examples:
t.is(Token::Kind::Variable);
// returns true if `t` is a `Variable` token and false otherwise
t.is_one_of(Token::Kind::Plus, Token::Kind::Minus, Token::Kind::Number);
// returns true if `t` is a `Plus`, `Minus`, or `Number` token
As in Part 2, you should complete the InfixToPostfixTransformer::infixToPostfix function in Transformer.cpp that transforms an infix token list from Tokenization to its corresponding postfix token stack using the reverse Polish notation.
Recall that the syntax of postfix expression is operand1 operand2 operator where the two operands are specified first, followed by the operator.
When creating the token stack, you should follow the rules in Part 2, that is, pushing operand1 first, then operand2, and finally operator to the stack.
❗️ NOTE: ; is not an operator. When encountering a semicolon, for example something;, push something first and then ; to the stack.
For the example input a = 456 + (1123 - 1); b = a + 1; b;, the output postfix token stack is:
top
+----------------+
| Semicolon(";") |
+----------------+
| Variable("b") |
+----------------+
| Semicolon(";") |
+----------------+
| Equal("=") |
+----------------+
| Plus("+") |
+----------------+
| Number("1") |
+----------------+
| Variable("a") |
+----------------+
| Variable("b") |
+----------------+
| Semicolon(";") |
+----------------+
| Equal("=") |
+----------------+
| Plus("+") |
+----------------+
| Minus("-") |
+----------------+
| Number("1") |
+----------------+
| Number("1123") |
+----------------+
| Number("456") |
+----------------+
| Variable("a") |
+----------------+
where Semicolon(";") is the top of the stack and Variable("a") is the bottom.
🍦 Tips: If you’ve spent weeks and still couldn’t figure out how to do the transformation, you can
Each expression described in Supported expressions can be represented using an expression tree. The definition of ExprTreeNode can be found in ExprTreeNode.h. Each node contains a token, an expr1 pointer (pointing to the node for the first sub-expression), and an expr2 pointer (pointing to the node for the second sub-expression).
You need to implement the buildExprTree function in Evaluator.cpp that builds an expression tree from given postfix token stack. The expression tree for the example input a = 456 + (1123 - 1); b = a + 1; b; can be found in main.cpp, where the node that follows ├─ is expr1 and the node that follows └─ is expr2. To create an ExprTreeNode given a token t, you can do
ExprTreeNode *n = new ExprTreeNode(t); // `expr1` and `expr2` will be `nullptr` by default
Then you can access the token, expr1 and expr2 pointers by n->token, n->expr1 and n->expr2.
The expression tree for a number contains only one ExprTreeNode where expr1 and expr2 pointers are both nullptr. For example, expression 456 has a tree as follows, where the token is displayed inside the node:
+---------------+
| Number("456") |
+---------------+
The expression tree for a variable has only one ExprTreeNode. For example, expression a has a tree as follows:
+---------------+
| Variable("a") |
+---------------+
The expression tree for an arithmetic expression (in the form expr1 operator expr2) contains at least three ExprTreeNodes, with the root being the node for the operator and its two child nodes being the root of subtrees for the two operands expr1 and expr2.
For example, expression 456 - 1123 (456 1123 - in postfix notation) has a tree as follows:
+-------------+
| Plus("+") |
+-------------+
/ \
expr1 expr2
/ \
+---------------+ +----------------+
| Number("456") | | Number("1123") |
+---------------+ +----------------+
As another example, a nested arithmetic expression a + (456 - 1123) (a 456 1123 - + in postfix notation) has a tree as follows:
+-------------+
| Plus("+") |
+-------------+
/ \
expr1 expr2
/ \
+---------------+ +--------------+
| Variable("a") | | Minus("-") |
+---------------+ +--------------+
/ \
expr1 expr2
/ \
+---------------+ +----------------+
| Number("456") | | Number("1123") |
+---------------+ +----------------+
The expression tree for an assignment expression (in the form var1 = expr2) contains at least three ExprTreeNodes, with the root being the node for the operator = and its two child nodes being the root of subtrees for the two operands var1 and expr2.
For example, expression a = 5 (a 5 = in postfix notation) has a tree like the following:
+--------------+
| Equal("=") |
+--------------+
/ \
expr1 expr2
/ \
+---------------+ +-------------+
| Variable("a") | | Number("5") |
+---------------+ +-------------+
As another example, the tree for expression a = b + 456 (a b 456 + = in postfix notation) is as follows:
+--------------+
| Equal("=") |
+--------------+
/ \
expr1 expr2
/ \
+---------------+ +--------------+
| Variable("a") | | Plus("+") |
+---------------+ +--------------+
/ \
expr1 expr2
/ \
+---------------+ +---------------+
| Variable("b") | | Number("456") |
+---------------+ +---------------+
The expression tree for a compound expression (in the form expr1 expr2;) may contain the followingExprTreeNodes: the root being the node for ;, one child node being the root of subtree for expr1 if expr1 is not empty, and the other child node being the root of subtree for expr2. When expr1 is empty, there is only one child node from the root for expr2.
For example, the tree for expression a = 456; a + 1; (a 456 = ; a 1 + ; in postfix notation) is as follows:
+----------------+
| Semicolon(";") |
+----------------+
/ \
expr1 expr2
/ \
+----------------+ +-----------+
| Semicolon(";") | | Plus("+") |
+----------------+ +-----------+
/ | | \
expr1 expr2 expr1 expr2
/ | | \
nullptr +----------------+ +---------------+ +-------------+
| Equal("=") | | Variable("a") | | Number("1") |
+----------------+ +---------------+ +-------------+
/ \
expr1 expr2
/ \
+---------------+ +---------------+
| Variable("a") | | Number("456") |
+---------------+ +---------------+
The nullptr means that the expr1 of a compound expression is empty.
The construction of the expression trees should follow the syntax rules in Supported expressions. For each Variable and Number token, there should be only one ExprTreeNode being created. For each Plus and Minus token, you should create one node for the token, and at least two more nodes for expr1 and expr2. For an Equal token, you should create one node for the token, one node for the var1, and at least one node for expr2. For a Semicolon token, you should create one node for the token, at least one node for expr2, and at least one node for expr1 if it is not empty.
You need to implement the evaluateExprTree function in Evaluator.cpp that evaluates the expression tree rooted at root by traversing through the tree, and returns the resulting BigInt.
The evaluation rules are described in Supported expressions.
evaluateExprTree has a parameter map<string, BigInt> &varTbl that stores the mapping from a literal variable to its BigInt value.
You need to update or access varTbl when you see assignments or uses of variables during the traversal of the tree. To update varTbl with a given key k and value v, do varTbl[k] = v. To get the value of a variable k, do varTbl[k].n by n->token and you can access the literal string of the token by n->token.lexeme. More use cases can be found in Tokenization - Example above.In summary, the whole process of evaluation of a given expression combines the following 4 steps:
| Step | Input (i.e., the parameter) | Output (i.e., the return value) |
|---|---|---|
Tokenization of input infix expression (given in Tokenizer.h) |
String "a = 456 + (1123 - 1); b = a + 1; b;" |
Infix token list in Tokenization - Example |
| Infix token list to postfix token stack transformation | Output from above step | Postfix token stack in Infix to postfix - Example |
| Create expression tree from postfix token stack | Output from above step | Tree in main.cpp |
| Evaluate the expression tree and return the result | Output from above step | 1579 as BigInt |
A sample doctest test case is given in main.cpp.
To run the given test, simply execute in the terminal: make clean test.
We encourage your to write your own tests as the sample one might be “too simple”. But they will not be used for grading.
After testing the program locally, you should submit Evaluator.cpp and Transformer.cpp, as well as BigInt.h, to the autograding system.
There will be 10 test cases (30 points in total) to evaluate your implementation. For each passing test case, you will get 3 points if there is no memory error and 2 points if there are memory errors.
We will also manually look at your code to evaluate the quality of your code (50 points), including coding style, comments, indentation, and so on.
✨ Extra Credit: There are 5 test cases (expressions) that contain the multiplication operator, each of which is 1 point.