```
metadata.title = "Part 4: Operator Precedence"
metadata.tags = ["build a programming language", "rust"]
metadata.date = "2021-04-16 17:00:42 -0400"
metadata.shortDesc = ""
metadata.slug = "operator-precedence"
metadata.preamble = `<p style="font-style: italic;">This post is part of a <a href="/build-a-programming-language/" data-link="/build-a-programming-language/">series</a> about learning Rust and building a small programming language.</p><hr>`
```
I've gone through the lexer, parser, and evaluator and added subtraction, multiplication, and division in addition to, uh... addition. And they kind of work, but there's one glaring issue that I mentioned back in part 2: the parser has no understanding of operator precedence. That is to say, it doesn't know which operators have a higher priority in the order of operations when implicit grouping is taking place.
<!-- excerpt-end -->
Currently, an expression like `2 * 3 + 4` will be parsed as if the `3 + 4` were grouped together, meaning the evaluation would ultimately result in 14 instead of the expected 10. This is what the AST for that expression currently looks like:
```plaintext
  *
 / \
2   +
   / \
  3   4
```
But the multiplication operator should actually have a higher priority and therefore be deeper in the node tree so that it's evaluated first.
Another closely related issue is [associativity](https://en.wikipedia.org/wiki/Operator_associativity). Whereas operator precedence governs implicit grouping behavior when there are operators of _different_ precedences (like addition and multiplication), operator associativity defines how implicit grouping works for multiple operators _of the same precedence_ (or multiple of the same operator).
Looking at the AST for an expression like `1 - 2 - 3`, you can see the same issue as above, with the right-hand `2 - 3` grouped together:
```plaintext
  -
 / \
1   -
   / \
  2   3
```
In both of these cases, what the parser needs to do is the same. It needs to implicitly group the middle node with the left node, rather than the right one. This will result in node trees that look like this:
```plaintext
    +          -
   / \        / \
  *   4      -   3
 / \        / \
2   3      1   2
```
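To make the difference concrete, here's a quick check of what each grouping evaluates to. This is plain Rust integer arithmetic with explicit parentheses standing in for the two tree shapes, not part of the interpreter, and the function names are made up for the illustration.

```rust
// Explicit parentheses stand in for the two possible tree shapes.

// Grouped to the right: what the parser currently produces.
fn grouped_right() -> (i64, i64) {
    (2 * (3 + 4), 1 - (2 - 3))
}

// Grouped to the left: what the order of operations calls for.
fn grouped_left() -> (i64, i64) {
    ((2 * 3) + 4, (1 - 2) - 3)
}

fn main() {
    println!("grouped right: {:?}", grouped_right()); // (14, 2)
    println!("grouped left:  {:?}", grouped_left()); // (10, -4)
}
```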
To accomplish this, I added precedence and associativity enums as well as methods on the `BinaryOp` enum to get each operation's specific values so that when the parser is parsing, it can make a decision about how to group things based on this information.
The `Precedence` enum has a derived implementation of the `PartialOrd` trait, meaning the cases are ordered from least to greatest in the same order they're written in the code, so that the precedence values can be compared directly with operators like `<`. Addition and subtraction currently share one precedence, and multiplication and division share another, higher one. Also, every operator currently has left associativity.
```rust
enum BinaryOp {
    Add,
    Subtract,
    Multiply,
    Divide,
}

#[derive(PartialEq, PartialOrd)]
enum Precedence {
    AddSub,
    MulDiv,
}

#[derive(PartialEq)]
enum Associativity {
    Left,
    Right,
}

impl BinaryOp {
    fn precedence(&self) -> Precedence {
        match self {
            BinaryOp::Add | BinaryOp::Subtract => Precedence::AddSub,
            BinaryOp::Multiply | BinaryOp::Divide => Precedence::MulDiv,
        }
    }

    fn associativity(&self) -> Associativity {
        Associativity::Left
    }
}
```
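As a quick sanity check of the derived ordering, this standalone sketch compares the two precedence values directly. The enum has the same shape as the one above; the `Debug` derive is only added here so the values can be printed.

```rust
// Same shape as the Precedence enum above; Debug is added only for printing.
#[derive(Debug, PartialEq, PartialOrd)]
enum Precedence {
    AddSub,
    MulDiv,
}

fn main() {
    // With a derived PartialOrd, variants declared earlier compare as
    // less than variants declared later.
    assert!(Precedence::AddSub < Precedence::MulDiv);
    assert!(Precedence::MulDiv > Precedence::AddSub);
    println!("{:?} < {:?}", Precedence::AddSub, Precedence::MulDiv);
}
```

One consequence of this is that reordering the variants silently changes which operators bind more tightly, so the declaration order is load-bearing.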
In the `do_parse` function, things have been changed up. First, there's a separate function for checking if the token that follows the first token in the expression should combine with the first node (i.e., is a binary operator):
```rust
fn is_binary_operator_token(token: &Token) -> bool {
    if let Token::Plus | Token::Minus | Token::Asterisk | Token::Slash = token {
        true
    } else {
        false
    }
}
```
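The same check could also be written with the standard library's `matches!` macro, which is a bit more compact. This is just an alternative sketch: the `Token` enum below is a stand-in, with the four operator variants taken from the function above and `Integer` assumed so there's a non-operator token to test against.

```rust
// Stand-in Token enum: the operator variants come from the function above,
// Integer(i64) is an assumption made for this example.
#[allow(dead_code)]
#[derive(Debug)]
enum Token {
    Integer(i64),
    Plus,
    Minus,
    Asterisk,
    Slash,
}

fn is_binary_operator_token(token: &Token) -> bool {
    // matches! evaluates to true if the value fits the pattern, false otherwise.
    matches!(
        token,
        Token::Plus | Token::Minus | Token::Asterisk | Token::Slash
    )
}

fn main() {
    let tokens = [
        Token::Integer(2),
        Token::Plus,
        Token::Minus,
        Token::Asterisk,
        Token::Slash,
    ];
    for token in &tokens {
        println!("{:?}: {}", token, is_binary_operator_token(token));
    }
}
```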
So instead of matching on individual tokens, `do_parse` just calls that function. If the next token is a binary operator, it consumes the operator token, calls `do_parse` recursively to get the right-hand node and then calls another function to combine the two nodes.
```rust
fn do_parse<'a, T: Iterator<Item = &'a Token>>(it: &mut Peekable<T>) -> Option<Node> {
    // ...
    if let Some(next) = it.peek() {
        if is_binary_operator_token(next) {
            let operator_token = it.next().unwrap();
            let right = do_parse(it).expect("expression after binary operator");
            node = combine_with_binary_operator(node, operator_token, right);
        } else {
            panic!("unexpected token: {:?}", next);
        }
    }
    Some(node)
}
```
But, before I get to the `combine_with_binary_operator` function, there's another function that decides whether a binary operator should be grouped to the left with another node by following the rule I described earlier.
```rust
fn should_group_left(left_op: &BinaryOp, right: &Node) -> bool {
    match right {
        Node::BinaryOp { op: right_op, .. } => {
            right_op.precedence() < left_op.precedence()
                || (right_op.precedence() == left_op.precedence()
                    && left_op.associativity() == Associativity::Left)
        }
        _ => false,
    }
}
```
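To see the rule in action, here's a self-contained sketch that runs `should_group_left` against a few hand-built right-hand nodes. The operator types and the function itself are copied from above; the `Node` enum (with an `Integer` case) and the `binary` helper are assumptions made for this example, not the interpreter's real definitions.

```rust
#[allow(dead_code)]
enum BinaryOp { Add, Subtract, Multiply, Divide }

#[derive(PartialEq, PartialOrd)]
enum Precedence { AddSub, MulDiv }

#[allow(dead_code)]
#[derive(PartialEq)]
enum Associativity { Left, Right }

impl BinaryOp {
    fn precedence(&self) -> Precedence {
        match self {
            BinaryOp::Add | BinaryOp::Subtract => Precedence::AddSub,
            BinaryOp::Multiply | BinaryOp::Divide => Precedence::MulDiv,
        }
    }

    fn associativity(&self) -> Associativity {
        Associativity::Left
    }
}

// Assumed node shape for this example only.
#[allow(dead_code)]
enum Node {
    Integer(i64),
    BinaryOp { left: Box<Node>, op: BinaryOp, right: Box<Node> },
}

fn should_group_left(left_op: &BinaryOp, right: &Node) -> bool {
    match right {
        Node::BinaryOp { op: right_op, .. } => {
            right_op.precedence() < left_op.precedence()
                || (right_op.precedence() == left_op.precedence()
                    && left_op.associativity() == Associativity::Left)
        }
        _ => false,
    }
}

// Hypothetical helper: builds `left op right` from two integers.
fn binary(left: i64, op: BinaryOp, right: i64) -> Node {
    Node::BinaryOp {
        left: Box::new(Node::Integer(left)),
        op,
        right: Box::new(Node::Integer(right)),
    }
}

fn main() {
    // 2 * (3 + 4): `+` has lower precedence than `*`, so regroup.
    assert!(should_group_left(&BinaryOp::Multiply, &binary(3, BinaryOp::Add, 4)));
    // 2 + (3 * 4): `*` has higher precedence than `+`, so leave it alone.
    assert!(!should_group_left(&BinaryOp::Add, &binary(3, BinaryOp::Multiply, 4)));
    // 1 - (2 - 3): same precedence and left-associative, so regroup.
    assert!(should_group_left(&BinaryOp::Subtract, &binary(2, BinaryOp::Subtract, 3)));
    // 2 * 3: the right-hand side is a plain value, nothing to regroup.
    assert!(!should_group_left(&BinaryOp::Multiply, &Node::Integer(3)));
    println!("all grouping checks passed");
}
```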
The `combine_with_binary_operator` function can then use this (after converting the binary operator token into a `BinaryOp` value) to decide what it should do.
```rust
fn combine_with_binary_operator(left: Node, token: &Token, right: Node) -> Node {
    let op: BinaryOp = match token {
        // ...
    };
    if should_group_left(&op, &right) {
        // Take the right-hand node apart so its pieces can be regrouped.
        if let Node::BinaryOp {
            left: right_left,
            op: right_op,
            right: right_right,
        } = right
        {
            Node::BinaryOp {
                left: Box::new(Node::BinaryOp {
                    left: Box::new(left),
                    op,
                    right: right_left,
                }),
                op: right_op,
                right: right_right,
            }
        } else {
            // should_group_left only returns true for binary operator nodes.
            panic!();
        }
    } else {
        Node::BinaryOp {
            left: Box::new(left),
            op,
            right: Box::new(right),
        }
    }
}
```
If there are two binary operators and the new one does need to be grouped to the left, the function performs the same transformation I described above, constructing a new outer binary operator node and a new left-hand inner node. Diagrammatically, the transformation looks like this (where uppercase letters are the binary operator nodes and lowercase letters are values):
```plaintext
Original expression: x A y B z
  A            B
 / \          / \
x   B   ->   A   z
   / \      / \
  y   z    x   y
```
If it does not need to be grouped left, the function simply creates a new binary operator node, leaving the left and right sides as-is.
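Here's a stripped-down, runnable sketch of the regrouping on its own, separate from the lexer and parser. It rests on a few assumptions: `combine` takes a `BinaryOp` directly instead of a token, `eval` returns a bare `i64`, the `Node` enum is a stand-in, and because every operator is left-associative the grouping check collapses to a single `<=` comparison.

```rust
// Simplified stand-ins for the interpreter's types (assumptions for this sketch).
#[derive(Clone, Copy)]
enum BinaryOp { Add, Subtract, Multiply, Divide }

#[derive(PartialEq, PartialOrd)]
enum Precedence { AddSub, MulDiv }

impl BinaryOp {
    fn precedence(self) -> Precedence {
        match self {
            BinaryOp::Add | BinaryOp::Subtract => Precedence::AddSub,
            BinaryOp::Multiply | BinaryOp::Divide => Precedence::MulDiv,
        }
    }
}

enum Node {
    Integer(i64),
    BinaryOp { left: Box<Node>, op: BinaryOp, right: Box<Node> },
}

// All operators are left-associative here, so equal precedence also groups left.
fn should_group_left(left_op: BinaryOp, right: &Node) -> bool {
    match right {
        Node::BinaryOp { op: right_op, .. } => right_op.precedence() <= left_op.precedence(),
        _ => false,
    }
}

fn combine(left: Node, op: BinaryOp, right: Node) -> Node {
    if should_group_left(op, &right) {
        if let Node::BinaryOp { left: right_left, op: right_op, right: right_right } = right {
            // x A (y B z)  ->  (x A y) B z
            Node::BinaryOp {
                left: Box::new(Node::BinaryOp { left: Box::new(left), op, right: right_left }),
                op: right_op,
                right: right_right,
            }
        } else {
            unreachable!("should_group_left is only true for binary operator nodes")
        }
    } else {
        Node::BinaryOp { left: Box::new(left), op, right: Box::new(right) }
    }
}

fn eval(node: &Node) -> i64 {
    match node {
        Node::Integer(n) => *n,
        Node::BinaryOp { left, op, right } => {
            let (l, r) = (eval(left), eval(right));
            match op {
                BinaryOp::Add => l + r,
                BinaryOp::Subtract => l - r,
                BinaryOp::Multiply => l * r,
                BinaryOp::Divide => l / r,
            }
        }
    }
}

// Hypothetical helper: combines `a op1 (b op2 c)`, mimicking the order in which
// the recursive parser hands nodes to the combine step.
fn three(a: i64, op1: BinaryOp, b: i64, op2: BinaryOp, c: i64) -> Node {
    let right = combine(Node::Integer(b), op2, Node::Integer(c));
    combine(Node::Integer(a), op1, right)
}

fn main() {
    println!("{}", eval(&three(2, BinaryOp::Multiply, 3, BinaryOp::Add, 4))); // 10
    println!("{}", eval(&three(1, BinaryOp::Subtract, 2, BinaryOp::Subtract, 3))); // -4
    println!("{}", eval(&three(2, BinaryOp::Add, 3, BinaryOp::Multiply, 4))); // 14
    println!("{}", eval(&three(8, BinaryOp::Divide, 4, BinaryOp::Divide, 2))); // 1
}
```

The sketch only exercises expressions with two operators, which is the case the diagram above covers.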
And, after adding the new operators to the `eval_binary_op` function, it can now **correctly** compute simple arithmetic expressions!
```rust
fn main() {
    let tokens = tokenize("2 * 3 + 4");
    if let Some(node) = parse(&tokens) {
        println!("result: {:?}", eval(&node));
    }
}
```
```sh
$ cargo run
result: Integer(10)
```