```
metadata.title = "Part 5: Fixing Floats"
metadata.tags = ["build a programming language", "rust"]
metadata.date = "2021-04-17 17:00:42 -0400"
metadata.shortDesc = "A small gotcha in Rust's TakeWhile iterator."
metadata.slug = "fixing-floats"
metadata.preamble = `<p style="font-style: italic;">This post is part of a <a href="/build-a-programming-language/" data-link="/build-a-programming-language/">series</a> about learning Rust and building a small programming language.</p><hr>`
```
In the process of adding floating point numbers, I ran into something a little bit unexpected. The issue turned out to be pretty simple, but I thought it was worth mentioning.
<!-- excerpt-end -->
My initial attempt at this was a simple modification to the original `parse_number` function from the lexer. Instead of stopping when it encounters a non-digit character, I changed it to continue collecting characters when it encounters a decimal point for the first time.
```rust
fn parse_number<T: Iterator<Item = char>>(it: &mut T) -> Option<Token> {
    let mut found_decimal_point = false;
    let digits = it.take_while(|c| {
        if DIGITS.contains(c) {
            true
        } else if *c == '.' && !found_decimal_point {
            found_decimal_point = true;
            true
        } else {
            false
        }
    });
    // ...
}
```
This seemed to work, and produced tokens like `Float(1.2)` for the input "1.2". But then I tried it with the string "1.2.3", to make sure that lexing would fail when it encountered a dot after the number literal. It did fail, but not for that reason: it failed because it didn't expect the '3' character. The dot seemed to vanish into thin air.
I came up with a simpler test case[^1]:
[^1]: I'm not entirely sure why the `take_some` function is needed here. Trying to call `take_while` directly from `main` causes a compiler error on the next line at the call to `it.peek()` because the iterator is being used after being moved into `take_while`. Does having a separate function somehow fix this? I wouldn't think so, but I'm not a Rust expert. I [posted about it](https://social.shadowfacts.net/notice/A6FitupF6BiJmFEwim) on the fediverse, and if you have an answer, I'd love to hear it.
```rust
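use std::iter::Peekable;

fn main() {
    let is = vec![1, 2, 3, 4];
    let mut it = is.iter().peekable();
    // Pass a mutable reference so `it` can still be peeked afterwards.
    let taken = take_some(&mut it);
    println!("taken: {:?}, next: {:?}", taken, it.peek());
}

// `is.iter()` yields `&i32`, so the item type is a reference.
fn take_some<'a, I: Iterator<Item = &'a i32>>(it: &mut Peekable<I>) -> Vec<&'a i32> {
    it.take_while(|i| **i < 3).collect()
}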
```
To my surprise, it printed `taken: [1, 2], next: Some(4)`. This time the `3` disappeared.
I inquired about this behavior on the fediverse, and learned that I missed a key line of the docs for the `take_while` method. Before it invokes the closure you passed in, it calls `next()` on the underlying iterator in order to actually have an item for the closure to test. So, it ends up consuming the first element for which the closure returns `false`.
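To see the consumed element in isolation, here's a minimal sketch. It assumes `by_ref()` is used to borrow the iterator so that whatever is left can be collected afterwards, and `take_then_rest` is a hypothetical name used only for illustration:

```rust
// Hypothetical helper: returns what `take_while` yielded and what is left afterwards.
fn take_then_rest(values: &[i32], limit: i32) -> (Vec<i32>, Vec<i32>) {
    let mut it = values.iter().copied();
    // `by_ref` borrows the iterator instead of moving it into `take_while`.
    let taken: Vec<i32> = it.by_ref().take_while(|i| *i < limit).collect();
    // The first element that failed the test was already pulled out with
    // `next()`, so it shows up in neither `taken` nor `rest`.
    let rest: Vec<i32> = it.collect();
    (taken, rest)
}

fn main() {
    let (taken, rest) = take_then_rest(&[1, 2, 3, 4], 3);
    println!("taken: {:?}, rest: {:?}", taken, rest);
}
```

With the input `[1, 2, 3, 4]` and a limit of 3, this prints `taken: [1, 2], rest: [4]`.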
I would have expected it to use the `peek()` method on peekable iterators to avoid this, but I guess not. No matter, a peeking version is easy to implement:
```rust
use std::iter::Peekable;

fn take_while_peek<I, P>(peekable: &mut Peekable<I>, mut predicate: P) -> Vec<I::Item>
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    let mut vec: Vec<I::Item> = vec![];
    // Only look at the next item; consume it if the predicate accepts it.
    while let Some(it) = peekable.peek() {
        if predicate(it) {
            vec.push(peekable.next().unwrap());
        } else {
            break;
        }
    }
    vec
}
```
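For comparison, the standard library's `Peekable::next_if` (available since Rust 1.51) only advances the iterator when its closure returns `true`, so the same idea could be sketched on top of it. This is an alternative, not what the lexer uses, and `take_while_next_if` is a hypothetical name:

```rust
use std::iter::Peekable;

// Hypothetical variant of `take_while_peek` built on `Peekable::next_if`.
fn take_while_next_if<I, P>(peekable: &mut Peekable<I>, mut predicate: P) -> Vec<I::Item>
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    let mut vec = Vec::new();
    // `next_if` returns `None` and leaves the item in place when the
    // closure rejects it, which ends the loop without consuming anything.
    while let Some(item) = peekable.next_if(|candidate| predicate(candidate)) {
        vec.push(item);
    }
    vec
}

fn main() {
    let is = vec![1, 2, 3, 4];
    let mut it = is.iter().peekable();
    let taken = take_while_next_if(&mut it, |i| **i < 3);
    println!("taken: {:?}, next: {:?}", taken, it.peek());
}
```

Run against the earlier test case, this prints `taken: [1, 2], next: Some(3)`, so the `3` is still there.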
I can then switch to using `take_while_peek` in `parse_number`, which now needs to take a `Peekable` iterator, and it no longer consumes extra characters.
```rust
fn parse_number<T: Iterator<Item = char>>(it: &mut Peekable<T>) -> Option<Token> {
    let mut found_decimal_point = false;
    let digits = take_while_peek(it, |c| {
        // ...
    });
    // ...
}
```
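Here's a self-contained sketch of the "1.2.3" case to check that the second dot survives. The `take_number_text` function is a hypothetical stand-in for `parse_number` that returns the raw text instead of a `Token`, and it uses `is_ascii_digit` in place of the lexer's `DIGITS`:

```rust
use std::iter::Peekable;

// Same helper as above, repeated so the example compiles on its own.
fn take_while_peek<I, P>(peekable: &mut Peekable<I>, mut predicate: P) -> Vec<I::Item>
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    let mut vec: Vec<I::Item> = vec![];
    while let Some(it) = peekable.peek() {
        if predicate(it) {
            vec.push(peekable.next().unwrap());
        } else {
            break;
        }
    }
    vec
}

// Hypothetical stand-in for `parse_number`: collects the literal's characters.
fn take_number_text<T: Iterator<Item = char>>(it: &mut Peekable<T>) -> String {
    let mut found_decimal_point = false;
    take_while_peek(it, |c| {
        if c.is_ascii_digit() {
            true
        } else if *c == '.' && !found_decimal_point {
            // Accept only the first decimal point.
            found_decimal_point = true;
            true
        } else {
            false
        }
    })
    .into_iter()
    .collect()
}

fn main() {
    let mut chars = "1.2.3".chars().peekable();
    let text = take_number_text(&mut chars);
    println!("number: {:?}, next: {:?}", text, chars.peek());
}
```

This prints `number: "1.2", next: Some('.')`, so the lexer gets to see the second dot and can reject it.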