title = "Part 6: Grouping"
tags = ["build a programming language", "rust"]
date = "2021-04-18 14:42:42 -0400"
slug = "grouping"
preamble = '<p style="font-style: italic;">This post is part of a <a href="/build-a-programming-language/" data-link="/build-a-programming-language/">series</a> about learning Rust and building a small programming language.</p><hr>'

Parsing groups is pretty straightforward, with only one minor pain point to keep in mind. I'll gloss over adding left and right parentheses to the tokenizer because it's super easy: each one is just another single-character token.
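For reference, here is a minimal sketch of what those single-character tokens could look like in a tokenizer. Only `Token::LeftParen`, `Token::RightParen`, and the `tokenize` name appear in this post; the `Integer` and `Plus` variants, the `i64` payload, and the body of `tokenize` are assumptions made for illustration.

```rust
// Assumed token type: only LeftParen and RightParen are named in the post.
#[derive(Debug, Clone, PartialEq)]
enum Token {
	Integer(i64),
	Plus,
	LeftParen,
	RightParen,
}

// Hypothetical tokenizer body; the real one in the series may differ.
fn tokenize(s: &str) -> Vec<Token> {
	let mut tokens = Vec::new();
	let mut it = s.chars().peekable();
	while let Some(&c) = it.peek() {
		match c {
			// Parentheses are plain single-character tokens.
			'(' => {
				it.next();
				tokens.push(Token::LeftParen);
			}
			')' => {
				it.next();
				tokens.push(Token::RightParen);
			}
			'+' => {
				it.next();
				tokens.push(Token::Plus);
			}
			'0'..='9' => {
				// Accumulate consecutive digits into one integer token.
				let mut value: i64 = 0;
				while let Some(d) = it.peek().and_then(|c| c.to_digit(10)) {
					value = value * 10 + d as i64;
					it.next();
				}
				tokens.push(Token::Integer(value));
			}
			c if c.is_whitespace() => {
				it.next();
			}
			other => panic!("unexpected character: {:?}", other),
		}
	}
	tokens
}
```

The paren arms do nothing beyond consuming one character and pushing the matching token, which is why there's not much to say about them.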

To actually parse the group from the token stream, in the parse_expression function I look for a left paren at the beginning of an expression, and call parse_group if one is found.

fn parse_expression<'a, I: Iterator<Item = &'a Token>>(it: &mut Peekable<I>) -> Option<Node> {
	// ...
	let mut node: Node = match it.peek().unwrap() {
		// ...
		Token::LeftParen => parse_group(it).unwrap(),
	};
	// ...
}

The parse_group function is also pretty simple. It consumes the left paren and then calls parse_expression to parse what's inside the parentheses. Afterwards, assuming it's found something, it consumes the right paren and returns a new Group node (which has just one field, another boxed Node that's its content).

fn parse_group<'a, I: Iterator<Item = &'a Token>>(it: &mut Peekable<I>) -> Option<Node> {
	match it.peek() {
		Some(Token::LeftParen) => (),
		_ => return None,
	}

	it.next();

	if let Some(inner) = parse_expression(it) {
		match it.peek() {
			Some(Token::RightParen) => (),
			tok => panic!("expected closing parenthesis after group, got {:?}", tok),
		}
		it.next();
		Some(Node::Group {
			node: Box::new(inner),
		})
	} else {
		panic!("expected expression inside group");
	}
}

This looks pretty good, but trying to run it and parse an expression like (1) will crash the program. Specifically, it'll panic with a message saying unexpected token: RightParen.

At first, this was pretty confusing. Shouldn't the right paren be consumed by the parse_group function? Running with RUST_BACKTRACE=1 reveals what the problem actually is.

It's panicking inside the recursive call to parse_expression coming from parse_group, before parse_group even has a chance to consume the right paren. Specifically, parse_expression sees a token after the first part of the expression, tries to combine it with the existing node, and fails because a right paren is not a binary operator.

What should happen is that parse_expression sees the paren following the expression, realizes that the expression is over, and leaves the paren alone. That way, when the recursive parse_expression returns, parse_group will be able to consume the right paren as it expects.

To do this, I added a constant list of tokens that are considered to end the current expression. Then, in parse_expression, in addition to checking whether the token after an expression is a binary operator, we can check whether it's one of those end tokens and, if so, leave it in place instead of panicking.

const EXPRESSION_END_TOKENS: &[Token] = &[Token::RightParen];

fn parse_expression<'a, I: Iterator<Item = &'a Token>>(it: &mut Peekable<I>) -> Option<Node> {
	// ...
	if let Some(next) = it.peek() {
		if is_binary_operator_token(next) {
			// ...
		} else if EXPRESSION_END_TOKENS.contains(next) {
			// no-op
		} else {
			panic!("unexpected token: {:?}", next);
		}
	}

	Some(node)
}
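Since the snippets above elide most of parse_expression, here is a compact, compilable sketch that puts the pieces together. The `Integer` and `Plus` tokens, the `Node::Add` variant, the right-recursive handling of `+`, and the `parse` wrapper's signature are all assumptions made for illustration; only the end-token check and the shape of parse_group follow the code in this post.

```rust
use std::iter::Peekable;

// Assumed token and node types; the post only names LeftParen, RightParen and Group.
#[derive(Debug, Clone, PartialEq)]
enum Token {
	Integer(i64),
	Plus,
	LeftParen,
	RightParen,
}

#[derive(Debug, PartialEq)]
enum Node {
	Integer(i64),
	Group { node: Box<Node> },
	// Hypothetical binary node, standing in for the elided operator handling.
	Add { left: Box<Node>, right: Box<Node> },
}

const EXPRESSION_END_TOKENS: &[Token] = &[Token::RightParen];

fn parse_expression<'a, I: Iterator<Item = &'a Token>>(it: &mut Peekable<I>) -> Option<Node> {
	// Copy the token reference out so `it` is free to be advanced below.
	let first: &'a Token = *it.peek()?;
	let mut node = match first {
		Token::Integer(n) => {
			it.next();
			Node::Integer(*n)
		}
		Token::LeftParen => parse_group(it)?,
		tok => panic!("unexpected token: {:?}", tok),
	};

	if let Some(&next) = it.peek() {
		if *next == Token::Plus {
			it.next();
			let right = parse_expression(it).expect("expected expression after operator");
			node = Node::Add {
				left: Box::new(node),
				right: Box::new(right),
			};
		} else if EXPRESSION_END_TOKENS.contains(next) {
			// Leave the token alone so the caller (parse_group) can consume it.
		} else {
			panic!("unexpected token: {:?}", next);
		}
	}

	Some(node)
}

fn parse_group<'a, I: Iterator<Item = &'a Token>>(it: &mut Peekable<I>) -> Option<Node> {
	match it.peek() {
		Some(Token::LeftParen) => (),
		_ => return None,
	}
	it.next();

	let inner = parse_expression(it).expect("expected expression inside group");
	match it.next() {
		Some(Token::RightParen) => (),
		tok => panic!("expected closing parenthesis after group, got {:?}", tok),
	}
	Some(Node::Group {
		node: Box::new(inner),
	})
}

// Assumed entry point wrapping parse_expression.
fn parse(tokens: &[Token]) -> Option<Node> {
	parse_expression(&mut tokens.iter().peekable())
}
```

In this sketch, the end-token branch is the only thing that lets the inner parse_expression return with the right paren still in the stream. Without it, the inner call would hit the final `else` and panic before parse_group got a chance to see the paren.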

And now it can parse grouped expressions:

fn main() {
	let tokens = tokenize("(1)");
	if let Some(node) = parse(&tokens) {
		println!("node: {:#?}", &node);
	}
}

$ cargo run
node: Group {
	node: Integer(1),
}

(I won't bother discussing evaluating groups because it's trivial.)
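For completeness, a minimal sketch of why it's trivial: a Group just wraps another node, so evaluating it amounts to evaluating its content. The `eval` function, the `i64` result type, and the `Add` variant are assumptions made for illustration; only `Group` and its boxed `node` field come from this post.

```rust
// Assumed node type; Add is a hypothetical stand-in for binary operations.
#[derive(Debug)]
enum Node {
	Integer(i64),
	Group { node: Box<Node> },
	Add { left: Box<Node>, right: Box<Node> },
}

// Hypothetical evaluator: a group evaluates to whatever its content evaluates to.
fn eval(node: &Node) -> i64 {
	match node {
		Node::Integer(n) => *n,
		Node::Group { node } => eval(node),
		Node::Add { left, right } => eval(left) + eval(right),
	}
}
```

The grouping has already done its job by the time the tree is built, so the evaluator has nothing to do for a group except recurse into it.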