```
title = "Part 6: Grouping"
tags = ["build a programming language", "rust"]
date = "2021-04-18 14:42:42 -0400"
slug = "grouping"
preamble = '<p style="font-style: italic;">This post is part of a <a href="/build-a-programming-language/" data-link="/build-a-programming-language/">series</a> about learning Rust and building a small programming language.</p><hr>'
```
Parsing groups is pretty straightforward, with only one minor pain point to keep in mind. I'll gloss over adding left and right parentheses because it's super easy—just another single character token.
<!-- excerpt-end -->
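For reference, here's a rough sketch of what single-character paren tokens might look like in a tokenizer. Only `Token::LeftParen` and `Token::RightParen` come from this post; the `Integer` and `Plus` variants and the body of `tokenize` are assumptions made for illustration, not the actual code from this series.

```rust
// Hypothetical token type: only LeftParen and RightParen are named in the post.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Integer(i64),
    Plus,
    LeftParen,
    RightParen,
}

// Assumed tokenizer shape: walk the characters and emit one token per lexeme.
fn tokenize(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&c) = chars.peek() {
        match c {
            // Parentheses are single-character tokens, just like an operator.
            '(' => {
                tokens.push(Token::LeftParen);
                chars.next();
            }
            ')' => {
                tokens.push(Token::RightParen);
                chars.next();
            }
            '+' => {
                tokens.push(Token::Plus);
                chars.next();
            }
            c if c.is_ascii_digit() => {
                // Accumulate consecutive digits into one integer token.
                let mut value = 0i64;
                while let Some(d) = chars.peek().and_then(|ch| ch.to_digit(10)) {
                    value = value * 10 + d as i64;
                    chars.next();
                }
                tokens.push(Token::Integer(value));
            }
            c if c.is_whitespace() => {
                chars.next();
            }
            other => panic!("unexpected character: {:?}", other),
        }
    }
    tokens
}
```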
To actually parse the group from the token stream, in the `parse_expression` function I look for a left paren at the beginning of an expression, and call `parse_group` if one is found.
```rust
fn parse_expression<'a, I: Iterator<Item = &'a Token>>(it: &mut Peekable<I>) -> Option<Node> {
    // ...
    let mut node: Node = match it.peek().unwrap() {
        // ...
        Token::LeftParen => parse_group(it).unwrap(),
    };
    // ...
}
```
The `parse_group` function is also pretty simple. It consumes the left paren and then calls `parse_expression` to parse what's inside the parentheses. Afterwards, assuming it's found something, it consumes the right paren and returns a new `Group` node (which has just one field, another boxed `Node` that's its content).
```rust
fn parse_group<'a, I: Iterator<Item = &'a Token>>(it: &mut Peekable<I>) -> Option<Node> {
    match it.peek() {
        Some(Token::LeftParen) => (),
        _ => return None,
    }
    it.next();
    if let Some(inner) = parse_expression(it) {
        match it.peek() {
            Some(Token::RightParen) => (),
            tok => panic!("expected closing parenthesis after group, got {:?}", tok),
        }
        it.next();
        Some(Node::Group {
            node: Box::new(inner),
        })
    } else {
        panic!("expected expression inside group");
    }
}
```
This looks pretty good, but trying to run it and parse an expression like `(1)` will crash the program. Specifically, it'll panic with a message saying `unexpected token: RightParen`.
At first, this was pretty confusing. Shouldn't the right paren be consumed by the `parse_group` function? Running with `RUST_BACKTRACE=1` reveals what the problem actually is.
It's panicking inside the recursive call to `parse_expression` coming from `parse_group`, before that function even has a chance to consume the right paren. Specifically, `parse_expression` sees a token after the first part of the expression, tries to combine it with the existing node, and fails because a right paren is not a binary operator.
What should happen is that `parse_expression` should see the paren following the expression, realize that the expression is over, and not do anything with it. That way, when the recursive `parse_expression` returns, `parse_group` will be able to consume the right paren as it expects.
To do this, I added a constant list of tokens that are considered to end the current expression. Then, in `parse_expression`, in addition to checking whether the next token after an expression is a binary operator, it checks whether the token is an expression end. If it is, the token is left alone instead of causing a panic.
```rust
const EXPRESSION_END_TOKENS: &[Token] = &[Token::RightParen];

fn parse_expression<'a, T: Iterator<Item = &'a Token>>(it: &mut Peekable<T>) -> Option<Node> {
    // ...
    if let Some(next) = it.peek() {
        if is_binary_operator_token(next) {
            // ...
        } else if EXPRESSION_END_TOKENS.contains(next) {
            // no-op
        } else {
            panic!("unexpected token: {:?}", next);
        }
    }
    Some(node)
}
```
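To see the pieces working together, here's a self-contained sketch under some assumptions: the `Token` and `Node` enums are pared-down stand-ins (the post only names `LeftParen`, `RightParen`, `Group`, and `Integer`), `Plus` and `Add` are hypothetical additions so there's one binary operator to test against, and the body of `is_binary_operator_token` is a guess. It isn't the real parser from this series, just enough to show the expression-end check doing its job.

```rust
use std::iter::Peekable;

// Hypothetical minimal token and node types, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Integer(i64),
    Plus,
    LeftParen,
    RightParen,
}

#[derive(Debug, PartialEq)]
enum Node {
    Integer(i64),
    Add { left: Box<Node>, right: Box<Node> },
    Group { node: Box<Node> },
}

// Tokens that end the current expression without being consumed by it.
const EXPRESSION_END_TOKENS: &[Token] = &[Token::RightParen];

// Assumed helper: `Plus` is the only binary operator in this sketch.
fn is_binary_operator_token(tok: &Token) -> bool {
    matches!(tok, Token::Plus)
}

fn parse_expression<'a, I: Iterator<Item = &'a Token>>(it: &mut Peekable<I>) -> Option<Node> {
    // Copy the inner reference out so the iterator is free to advance.
    let first: &'a Token = *it.peek()?;
    let mut node = match first {
        Token::Integer(n) => {
            it.next();
            Node::Integer(*n)
        }
        Token::LeftParen => parse_group(it)?,
        tok => panic!("unexpected token: {:?}", tok),
    };

    if let Some(&next) = it.peek() {
        if is_binary_operator_token(next) {
            it.next();
            // Recursing on the right side makes this sketch right-associative.
            let right = parse_expression(it).expect("expected expression after operator");
            node = Node::Add {
                left: Box::new(node),
                right: Box::new(right),
            };
        } else if EXPRESSION_END_TOKENS.contains(next) {
            // no-op: leave the token for the caller (here, parse_group)
        } else {
            panic!("unexpected token: {:?}", next);
        }
    }
    Some(node)
}

fn parse_group<'a, I: Iterator<Item = &'a Token>>(it: &mut Peekable<I>) -> Option<Node> {
    match it.peek() {
        Some(Token::LeftParen) => (),
        _ => return None,
    }
    it.next();
    if let Some(inner) = parse_expression(it) {
        match it.peek() {
            Some(Token::RightParen) => (),
            tok => panic!("expected closing parenthesis after group, got {:?}", tok),
        }
        it.next();
        Some(Node::Group {
            node: Box::new(inner),
        })
    } else {
        panic!("expected expression inside group");
    }
}
```

One thing this sketch doesn't handle: a stray right paren at the top level (as in `1)`) is simply left unconsumed, so a caller would still need to check for leftover tokens.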
And now it can parse grouped expressions:
```rust
fn main() {
    let tokens = tokenize("(1)");
    // `parse` returns an Option, so unwrap it with a `Some` pattern.
    if let Some(node) = parse(&tokens) {
        println!("node: {:#?}", &node);
    }
}
```
```sh
$ cargo run
node: Group {
    node: Integer(1),
}
```
(I won't bother discussing evaluating groups because it's trivial.)
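For the curious, here's roughly what that could look like. This is a sketch assuming a hypothetical `evaluate` function over a pared-down `Node` enum (the `Add` variant and the `i64` result type are assumptions); the point is only that a group evaluates to whatever its content evaluates to.

```rust
// Hypothetical pared-down node type, for illustration only.
#[derive(Debug)]
enum Node {
    Integer(i64),
    Add { left: Box<Node>, right: Box<Node> },
    Group { node: Box<Node> },
}

// Assumed evaluator shape: recursively reduce a node to an integer.
fn evaluate(node: &Node) -> i64 {
    match node {
        Node::Integer(n) => *n,
        Node::Add { left, right } => evaluate(left) + evaluate(right),
        // A group has no behavior of its own: its value is its content's value.
        Node::Group { node } => evaluate(node),
    }
}
```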