
metadata.title = "Part 5: Fixing Floats"
metadata.tags = ["build a programming language", "rust"]
metadata.date = "2021-04-17 17:00:42 -0400"
metadata.shortDesc = "A small gotcha in Rust's TakeWhile iterator."
metadata.slug = "fixing-floats"
metadata.preamble = `<p style="font-style: italic;">This post is part of a <a href="/build-a-programming-language/" data-link="/build-a-programming-language/">series</a> about learning Rust and building a small programming language.</p><hr>`

In the process of adding floating-point numbers, I ran into something a little unexpected. The issue turned out to be pretty simple, but I thought it was worth mentioning.

My initial attempt at this was a simple modification to the original `parse_number` function from the lexer. Instead of stopping as soon as it encounters a non-digit character, I changed it to keep collecting characters when it encounters a decimal point for the first time.

```rust
fn parse_number<T: Iterator<Item = char>>(it: &mut T) -> Option<Token> {
	let mut found_decimal_point = false;
	let digits = it.take_while(|c| {
		if DIGITS.contains(c) {
			true
		} else if *c == '.' && !found_decimal_point {
			found_decimal_point = true;
			true
		} else {
			false
		}
	});
	// ...
}
```

This seemed to work, and produced tokens like `Float(1.2)` for the input `"1.2"`. But then I tried it with the string `"1.2.3"`, to make sure that lexing would fail when it encountered a second dot after the number literal. Lexing did fail, but not at the dot: it failed because it didn't expect the `'3'` character. The dot seemed to vanish into thin air.
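Here is a self-contained reproduction of the `"1.2.3"` case, as a sketch. The `split_number` function and the `DIGITS` array are hypothetical stand-ins (the lexer's actual `DIGITS` constant isn't shown in this post), and `by_ref()` is used so that the iterator can still be inspected after `take_while` is done with it.

```rust
// Hypothetical stand-in for the lexer's DIGITS constant, which isn't shown here.
const DIGITS: [char; 10] = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'];

/// Runs the decimal-aware `take_while` predicate over `input` and returns
/// (the characters it took, whatever is left in the iterator afterwards).
fn split_number(input: &str) -> (String, String) {
	let mut chars = input.chars();
	let mut found_decimal_point = false;
	// `by_ref()` lends the iterator to `take_while` instead of moving it,
	// so `chars` can still be used once `take_while` is done.
	let literal: String = chars
		.by_ref()
		.take_while(|c| {
			if DIGITS.contains(c) {
				true
			} else if *c == '.' && !found_decimal_point {
				found_decimal_point = true;
				true
			} else {
				false
			}
		})
		.collect();
	let rest: String = chars.collect();
	(literal, rest)
}
```

`split_number("1.2.3")` returns `("1.2", "3")`: the second dot was consumed by `take_while` and shows up in neither half.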

I came up with a simpler test case¹:

```rust
use std::iter::Peekable;

fn main() {
	let is = vec![1, 2, 3, 4];
	let mut it = is.into_iter().peekable();
	let taken = take_some(&mut it);
	println!("taken: {:?}, next: {:?}", taken, it.peek());
}

// Borrows the iterator mutably so `main` can still peek at it afterwards.
fn take_some<I: Iterator<Item = i32>>(it: &mut Peekable<I>) -> Vec<i32> {
	it.take_while(|i| *i < 3).collect()
}
```

To my surprise, it printed `taken: [1, 2], next: Some(4)`. This time the `3` disappeared.

I inquired about this behavior on the fediverse, and learned that I had missed a key line of the docs for the `take_while` method. Before it can invoke the closure you passed in, it calls `next()` on the underlying iterator to get an item for the closure to test. So it ends up consuming the first element for which the closure returns `false`, and that element is discarded rather than included in the output.
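To make that concrete, here's a simplified model of the behavior. `take_while_model` is a hypothetical name and this is not the actual standard library source; it only illustrates that `next()` is the sole way to get an item to test, so the rejected item has already left the iterator by the time the predicate says no.

```rust
/// Simplified model of `Iterator::take_while` (not the real std source):
/// an item must be pulled with `next()` before the predicate can test it.
fn take_while_model<I, P>(it: &mut I, mut predicate: P) -> Vec<I::Item>
where
	I: Iterator,
	P: FnMut(&I::Item) -> bool,
{
	let mut taken = Vec::new();
	while let Some(item) = it.next() {
		if predicate(&item) {
			taken.push(item);
		} else {
			// `item` was already removed from `it`; it is simply dropped here.
			break;
		}
	}
	taken
}
```

Run on `[1, 2, 3, 4]` with `|i| *i < 3`, it returns `[1, 2]` and leaves `4` as the next item, the same result as the test case above.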

I would have expected it to use the `peek()` method on peekable iterators to avoid this, but `take_while` is a generic `Iterator` method, and `Peekable` doesn't give it any special treatment. No matter, a peeking version is easy to implement:

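```rust
use std::iter::Peekable;

/// Like `take_while`, but peeks at each item first, so the first item that
/// fails the predicate is left in the iterator instead of being consumed.
fn take_while_peek<I, P>(peekable: &mut Peekable<I>, mut predicate: P) -> Vec<I::Item>
where
	I: Iterator,
	P: FnMut(&I::Item) -> bool,
{
	let mut vec: Vec<I::Item> = vec![];
	while let Some(item) = peekable.peek() {
		if predicate(item) {
			// The item passed, so actually advance the iterator to take it.
			vec.push(peekable.next().unwrap());
		} else {
			break;
		}
	}
	vec
}
```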
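For what it's worth, `Peekable` also has a `next_if` method (stable since Rust 1.51) that advances only when a closure accepts the peeked item. A variant of `take_while_peek` built on it might look like the following sketch; `take_while_next_if` is a hypothetical name, and this isn't what the lexer in this post uses.

```rust
use std::iter::Peekable;

/// Hypothetical alternative to `take_while_peek` built on `Peekable::next_if`.
fn take_while_next_if<I, P>(peekable: &mut Peekable<I>, mut predicate: P) -> Vec<I::Item>
where
	I: Iterator,
	P: FnMut(&I::Item) -> bool,
{
	let mut taken = Vec::new();
	// Each call either returns the next item (if it passes the predicate)
	// or returns `None` without consuming anything.
	while let Some(item) = peekable.next_if(|peeked| predicate(peeked)) {
		taken.push(item);
	}
	taken
}
```

On `[1, 2, 3, 4]` with `|i| *i < 3`, it returns `[1, 2]` and leaves `3` as the next item.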

I can then switch `parse_number` over to `take_while_peek`, and it no longer consumes extra characters:

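```rust
use std::iter::Peekable;

// `it` is now a `Peekable`, since `take_while_peek` needs to look ahead.
fn parse_number<T: Iterator<Item = char>>(it: &mut Peekable<T>) -> Option<Token> {
	let mut found_decimal_point = false;
	let digits = take_while_peek(it, |c| {
		// ...
	});
	// ...
}
```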
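To tie this back to the original `"1.2.3"` bug, here's a minimal end-to-end sketch. The `Token` enum, its `Integer` variant, and everything after the `take_while_peek` call are hypothetical, since the post elides them; `char::is_ascii_digit` stands in for the lexer's `DIGITS` check.

```rust
use std::iter::Peekable;

// Hypothetical token type: the real lexer's `Token` enum isn't shown in this post.
#[derive(Debug, PartialEq)]
enum Token {
	Integer(i64),
	Float(f64),
}

/// Same `take_while_peek` as above: stops *before* the first rejected item.
fn take_while_peek<I, P>(peekable: &mut Peekable<I>, mut predicate: P) -> Vec<I::Item>
where
	I: Iterator,
	P: FnMut(&I::Item) -> bool,
{
	let mut vec = Vec::new();
	while let Some(item) = peekable.peek() {
		if predicate(item) {
			vec.push(peekable.next().unwrap());
		} else {
			break;
		}
	}
	vec
}

/// Hypothetical completion of `parse_number`.
fn parse_number<T: Iterator<Item = char>>(it: &mut Peekable<T>) -> Option<Token> {
	let mut found_decimal_point = false;
	let digits: String = take_while_peek(it, |c| {
		if c.is_ascii_digit() {
			true
		} else if *c == '.' && !found_decimal_point {
			// Accept only the first decimal point.
			found_decimal_point = true;
			true
		} else {
			false
		}
	})
	.into_iter()
	.collect();

	if digits.is_empty() {
		None
	} else if found_decimal_point {
		digits.parse::<f64>().ok().map(Token::Float)
	} else {
		digits.parse::<i64>().ok().map(Token::Integer)
	}
}
```

With this, lexing `"1.2.3"` yields `Float(1.2)` and leaves the second `.` as the next character, so the lexer can report the error where it actually occurs.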

1. I'm not entirely sure why the `take_some` function is needed here. Trying to call `take_while` directly from `main` causes a compiler error on the next line, at the call to `it.peek()`, because the iterator is used after being moved into `take_while`. Does having a separate function somehow fix this? I wouldn't think so, but I'm not a Rust expert. I posted about it on the fediverse, and if you have an answer, I'd love to hear it.