diff --git a/site/posts/2021-04-17-fixing-floats.md b/site/posts/2021-04-17-fixing-floats.md
new file mode 100644
index 0000000..c0e2ee6
--- /dev/null
+++ b/site/posts/2021-04-17-fixing-floats.md
@@ -0,0 +1,87 @@
+```
+metadata.title = "Part 5: Fixing Floats"
+metadata.tags = ["build a programming language", "rust"]
+metadata.date = "2021-04-17 17:00:42 -0400"
+metadata.shortDesc = "A small gotcha in Rust's TakeWhile iterator."
+metadata.slug = "fixing-floats"
+metadata.preamble = `

This post is part of a series about learning Rust and building a small programming language.


`
+```
+
+In the process of adding floating point numbers, I ran into something a little bit unexpected. The issue turned out to be pretty simple, but I thought it was worth mentioning.
+
+My initial attempt at this was a simple modification to the original `parse_number` function from the lexer. Instead of stopping when it encounters a non-digit character, I changed it to continue collecting characters when it encounters a decimal point for the first time.
+
+```rust
+fn parse_number<T: Iterator<Item = char>>(it: &mut T) -> Option<Token> {
+    let mut found_decimal_point = false;
+    let digits = it.take_while(|c| {
+        if DIGITS.contains(c) {
+            true
+        } else if *c == '.' && !found_decimal_point {
+            // Accept only the first decimal point.
+            found_decimal_point = true;
+            true
+        } else {
+            false
+        }
+    });
+    // ...
+}
+```
+
+This seemed to work, and produced tokens like `Float(1.2)` for the input "1.2". But then I tried it with the string "1.2.3", to make sure that lexing would fail when it encountered a dot after the number literal. It did fail, but not where I expected: it failed because it didn't expect the '3' character. The dot seemed to vanish into thin air.
+
+I came up with a simpler test case[^1]:
+
+[^1]: I'm not entirely sure why the `take_some` function is needed here. Trying to call `take_while` directly from `main` causes a compiler error on the next line at the call to `it.peek()` because the iterator is being used after being moved into `take_while`. Does having a separate function somehow fix this? I wouldn't think so, but I'm not a Rust expert. I [posted about it](https://social.shadowfacts.net/notice/A6FitupF6BiJmFEwim) on the fediverse, and if you have an answer, I'd love to hear it.
+
+```rust
+use std::iter::Peekable;
+
+fn main() {
+    let is = vec![1, 2, 3, 4];
+    let mut it = is.iter().peekable();
+    // Pass a mutable reference so `it` can still be used afterwards.
+    let taken = take_some(&mut it);
+    println!("taken: {:?}, next: {:?}", taken, it.peek());
+}
+
+fn take_some<'a, I: Iterator<Item = &'a i32>>(it: &mut Peekable<I>) -> Vec<&'a i32> {
+    it.take_while(|i| **i < 3).collect()
+}
+```
+
+To my surprise, it printed `taken: [1, 2], next: Some(4)`. This time the `3` disappeared.
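On the footnote's question, one likely explanation: `take_while` takes `self` by value, so calling it on the `Peekable` itself moves the iterator, while calling it through a `&mut` reference (as inside `take_some`) only uses up the reference, because `&mut I` is itself an `Iterator`. The standard library's `Iterator::by_ref` does the same thing explicitly. A minimal sketch, where `take_by_ref` is a hypothetical helper name and not part of the lexer:

```rust
// Hypothetical helper: takes values below 3 and reports what `peek()` sees next.
fn take_by_ref(values: &[i32]) -> (Vec<i32>, Option<i32>) {
    let mut it = values.iter().copied().peekable();
    // `by_ref()` borrows the iterator mutably, so `take_while` consumes only the
    // borrow and `it` stays usable afterwards.
    let taken: Vec<i32> = it.by_ref().take_while(|i| *i < 3).collect();
    let next = it.peek().copied();
    (taken, next)
}

fn main() {
    let (taken, next) = take_by_ref(&[1, 2, 3, 4]);
    println!("taken: {:?}, next: {:?}", taken, next);
}
```

With `by_ref()` the call can live directly in `main`, and the `3` still goes missing in the same way.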
+
+I inquired about this behavior on the fediverse, and learned that I had missed a key line of the docs for the `take_while` method. Before it invokes the closure you passed in, it calls `next()` on the underlying iterator in order to actually have an item for the closure to test. So, it ends up consuming the first element for which the closure returns `false`.
+
+I would have expected it to use the `peek()` method on peekable iterators to avoid this, but I guess not. No matter, a peeking version is easy to implement:
+
+```rust
+use std::iter::Peekable;
+
+fn take_while_peek<I, P>(peekable: &mut Peekable<I>, mut predicate: P) -> Vec<I::Item>
+where
+    I: Iterator,
+    P: FnMut(&I::Item) -> bool,
+{
+    let mut vec: Vec<I::Item> = vec![];
+    // Look at the next item without consuming it; only advance if it passes.
+    while let Some(it) = peekable.peek() {
+        if predicate(it) {
+            vec.push(peekable.next().unwrap());
+        } else {
+            break;
+        }
+    }
+    vec
+}
+```
+
+I can then switch to using the new function in `parse_number` and it no longer consumes extra characters.
+
+```rust
+fn parse_number<T: Iterator<Item = char>>(it: &mut Peekable<T>) -> Option<Token> {
+    let mut found_decimal_point = false;
+    // `take_while_peek` needs a `Peekable`, so `it` is now `&mut Peekable<T>`.
+    let digits = take_while_peek(it, |c| {
+        // ...
+    });
+    // ...
+}
+```
+
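To check the fix against the original "1.2.3" case, here is a small self-contained sketch. `number_chars` is a hypothetical stand-in for the elided body of `parse_number`, and it assumes `char::is_ascii_digit` in place of the lexer's `DIGITS` constant, which isn't shown in this post:

```rust
use std::iter::Peekable;

// Same shape as `take_while_peek` above: only consume items the predicate accepts.
fn take_while_peek<I, P>(peekable: &mut Peekable<I>, mut predicate: P) -> Vec<I::Item>
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    let mut vec = Vec::new();
    while let Some(item) = peekable.peek() {
        if predicate(item) {
            vec.push(peekable.next().unwrap());
        } else {
            break;
        }
    }
    vec
}

// Hypothetical helper: collects the characters of one number literal,
// accepting digits and at most one decimal point.
fn number_chars<I: Iterator<Item = char>>(it: &mut Peekable<I>) -> String {
    let mut found_decimal_point = false;
    take_while_peek(it, |c| {
        if c.is_ascii_digit() {
            true
        } else if *c == '.' && !found_decimal_point {
            found_decimal_point = true;
            true
        } else {
            false
        }
    })
    .into_iter()
    .collect()
}

fn main() {
    let mut it = "1.2.3".chars().peekable();
    let literal = number_chars(&mut it);
    // The second dot is still in the iterator for the lexer to reject.
    println!("literal: {:?}, next: {:?}", literal, it.peek());
}
```

This prints `literal: "1.2", next: Some('.')`, so the second dot is left in place instead of vanishing.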