```
metadata.title = "Part 5: Fixing Floats"
metadata.tags = ["build a programming language", "rust"]
metadata.date = "2021-04-17 17:00:42 -0400"
metadata.shortDesc = "A small gotcha in Rust's TakeWhile iterator."
metadata.slug = "fixing-floats"
metadata.preamble = `

This post is part of a series about learning Rust and building a small programming language.


`
```

In the process of adding floating point numbers, I ran into something a little bit unexpected. The issue turned out to be pretty simple, but I thought it was worth mentioning.

My initial attempt at this was a simple modification to the original `parse_number` function from the lexer. Instead of stopping when it encounters a non-digit character, I changed it to continue collecting characters when it encounters a decimal point for the first time.

```rust
fn parse_number<T: Iterator<Item = char>>(it: &mut T) -> Option<Token> {
    // Only the first decimal point belongs to the number literal.
    let mut found_decimal_point = false;
    let digits = it.take_while(|c| {
        if DIGITS.contains(c) {
            true
        } else if *c == '.' && !found_decimal_point {
            found_decimal_point = true;
            true
        } else {
            false
        }
    });
    // ...
}
```

This seemed to work, and produced tokens like `Float(1.2)` for the input "1.2".

But then I tried it with the string "1.2.3", to make sure that lexing would fail when it encountered a dot after the number literal. It did fail, but not on the dot: it failed because it didn't expect the '3' character. The dot seemed to vanish into thin air.

I came up with a simpler test case[^1]:

[^1]: I'm not entirely sure why the `take_some` function is needed here. Trying to call `take_while` directly from `main` causes a compiler error on the next line at the call to `it.peek()` because the iterator is being used after being moved into `take_while`. Does having a separate function somehow fix this? I wouldn't think so, but I'm not a Rust expert. I [posted about it](https://social.shadowfacts.net/notice/A6FitupF6BiJmFEwim) on the fediverse, and if you have an answer, I'd love to hear it.

```rust
use std::iter::Peekable;

fn main() {
    let is = vec![1, 2, 3, 4];
    let mut it = is.iter().peekable();
    let taken = take_some(&mut it);
    println!("taken: {:?}, next: {:?}", taken, it.peek());
}

// Takes items from the front of the iterator while they are less than 3.
fn take_some<'a, I: Iterator<Item = &'a i32>>(it: &mut Peekable<I>) -> Vec<&'a i32> {
    it.take_while(|i| **i < 3).collect()
}
```

To my surprise, it printed `taken: [1, 2], next: Some(4)`. This time the `3` disappeared.
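The same shape reproduces the vanishing dot from the lexer. Here is a minimal sketch that runs the decimal-point closure over the characters of "1.2.3". The `take_number_chars` helper is hypothetical, standing in for the real `parse_number`, and `is_ascii_digit` is assumed as a substitute for the `DIGITS` check.

```rust
use std::iter::Peekable;
use std::str::Chars;

// Hypothetical stand-in for parse_number: collects the characters of one
// number literal using Iterator::take_while.
fn take_number_chars(it: &mut Peekable<Chars<'_>>) -> String {
    let mut found_decimal_point = false;
    it.take_while(|c| {
        if c.is_ascii_digit() {
            true
        } else if *c == '.' && !found_decimal_point {
            found_decimal_point = true;
            true
        } else {
            false
        }
    })
    .collect()
}

fn main() {
    let mut it = "1.2.3".chars().peekable();
    let number = take_number_chars(&mut it);
    // The second '.' is neither in `number` nor left in the iterator.
    println!("number: {:?}, next: {:?}", number, it.peek());
}
```

This prints `number: "1.2", next: Some('3')`, which matches what the lexer saw: the next character after the number is the '3', not the second dot.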

I inquired about this behavior on the fediverse, and learned that I missed a key line of the docs for the `take_while` method. Before it invokes the closure you passed in, it calls `next()` on the underlying iterator in order to actually have an item for the closure to test. So, it ends up consuming the first element for which the closure returns `false`.

I would have expected it to use the `peek()` method on peekable iterators to avoid this, but I guess not. No matter, a peeking version is easy to implement:

```rust
use std::iter::Peekable;

fn take_while_peek<I, P>(peekable: &mut Peekable<I>, mut predicate: P) -> Vec<I::Item>
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    let mut vec: Vec<I::Item> = vec![];
    // Look at the next item without consuming it, and only advance the
    // iterator once the predicate has accepted that item.
    while let Some(it) = peekable.peek() {
        if predicate(it) {
            vec.push(peekable.next().unwrap());
        } else {
            break;
        }
    }
    vec
}
```

I can then switch to using the new function in `parse_number` and it no longer consumes extra characters.

```rust
// `it` has to be a Peekable so that take_while_peek can look ahead.
fn parse_number<T: Iterator<Item = char>>(it: &mut Peekable<T>) -> Option<Token> {
    let mut found_decimal_point = false;
    let digits = take_while_peek(it, |c| {
        // ...
    });
    // ...
}
```
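
To check that the peeking version leaves the rejected element in place, the earlier test case can be rerun against it. This is a minimal sketch that repeats `take_while_peek` so it compiles on its own; only the `main` function is new.

```rust
use std::iter::Peekable;

fn take_while_peek<I, P>(peekable: &mut Peekable<I>, mut predicate: P) -> Vec<I::Item>
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    let mut vec: Vec<I::Item> = vec![];
    // Peek first, and only call next() for items the predicate accepts.
    while let Some(it) = peekable.peek() {
        if predicate(it) {
            vec.push(peekable.next().unwrap());
        } else {
            break;
        }
    }
    vec
}

fn main() {
    let is = vec![1, 2, 3, 4];
    let mut it = is.iter().peekable();
    let taken = take_while_peek(&mut it, |i| **i < 3);
    // The first rejected item (3) is still available from the iterator.
    println!("taken: {:?}, next: {:?}", taken, it.peek());
}
```

This prints `taken: [1, 2], next: Some(3)`, so the `3` is no longer lost.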