```
metadata.title = "Part 5: Fixing Floats"
metadata.tags = ["build a programming language", "rust"]
metadata.date = "2021-04-17 17:00:42 -0400"
metadata.shortDesc = "A small gotcha in Rust's TakeWhile iterator."
metadata.slug = "fixing-floats"
metadata.preamble = `<p style="font-style: italic;">This post is part of a <a href="/build-a-programming-language/" data-link="/build-a-programming-language/">series</a> about learning Rust and building a small programming language.</p><hr>`
```

While adding floating point numbers, I ran into something a little bit unexpected. The issue turned out to be pretty simple, but I thought it was worth mentioning.

<!-- excerpt-end -->

My initial attempt at this was a simple modification to the original `parse_number` function from the lexer. Instead of stopping at the first non-digit character, I changed it to continue collecting characters when it encounters a decimal point for the first time.

```rust
fn parse_number<T: Iterator<Item = char>>(it: &mut T) -> Option<Token> {
    let mut found_decimal_point = false;
    let digits = it.take_while(|c| {
        if DIGITS.contains(c) {
            true
        } else if *c == '.' && !found_decimal_point {
            found_decimal_point = true;
            true
        } else {
            false
        }
    });
    // ...
}
```

This seemed to work, and produced tokens like `Float(1.2)` for the input "1.2". But then I tried it with the string "1.2.3", to make sure that lexing would fail when it encountered a dot after the number literal. It did fail, but not on the dot: it failed because it didn't expect the '3' character. The dot seemed to vanish into thin air.

I came up with a simpler test case[^1]:

[^1]: I'm not entirely sure why the `take_some` function is needed here. Calling `take_while` directly from `main` causes a compiler error on the next line, at the call to `it.peek()`, because the iterator is used after being moved into `take_while`. Does having a separate function somehow fix this? I wouldn't think so, but I'm not a Rust expert. I [posted about it](https://social.shadowfacts.net/notice/A6FitupF6BiJmFEwim) on the fediverse, and if you have an answer, I'd love to hear it.

```rust
use std::iter::Peekable;

fn main() {
    let is = vec![1, 2, 3, 4];
    let mut it = is.iter().peekable();
    let taken = take_some(&mut it);
    println!("taken: {:?}, next: {:?}", taken, it.peek());
}

// Takes the iterator by mutable reference so `main` can keep using it afterwards.
fn take_some<'a, I: Iterator<Item = &'a i32>>(it: &mut Peekable<I>) -> Vec<&'a i32> {
    it.take_while(|i| **i < 3).collect()
}
```

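On the footnote's question: as far as I can tell, the helper works because `Iterator` is also implemented for `&mut I`, so calling `take_while` through a mutable reference consumes only the reference and not the iterator itself. The standard library's `Iterator::by_ref` gives the same effect within a single scope. Here is a minimal sketch, where the function name `take_below_three` is made up for illustration:

```rust
// Hypothetical helper: creates, partially consumes, and peeks the iterator
// in one scope, without passing it to a separate function.
fn take_below_three(values: &[i32]) -> (Vec<i32>, Option<i32>) {
    let mut it = values.iter().copied().peekable();
    // `by_ref()` borrows the iterator mutably, so `take_while` takes ownership
    // of the borrow and `it` is still usable afterwards.
    let taken: Vec<i32> = it.by_ref().take_while(|i| *i < 3).collect();
    // `take_while` has still consumed the first rejected item at this point.
    let next = it.peek().copied();
    (taken, next)
}

fn main() {
    let (taken, next) = take_below_three(&[1, 2, 3, 4]);
    println!("taken: {:?}, next: {:?}", taken, next);
}
```

This only addresses the move error. The element that fails the predicate is consumed either way.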
To my surprise, it printed `taken: [1, 2], next: Some(4)`. This time the `3` disappeared.

I asked about this behavior on the fediverse, and learned that I had missed a key line of the docs for the `take_while` method. Before it invokes the closure you passed in, it calls `next()` on the underlying iterator in order to actually have an item for the closure to test. So it ends up consuming the first element for which the closure returns `false`.

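To make that concrete, here is a simplified stand-in for the adapter. This is a sketch of the behavior described above, not the standard library's actual source, and the names `SimpleTakeWhile` and `run_simple_take_while` are made up for illustration:

```rust
/// A simplified stand-in for the standard library's `TakeWhile` adapter.
struct SimpleTakeWhile<I, P> {
    iter: I,
    predicate: P,
    done: bool,
}

impl<I, P> Iterator for SimpleTakeWhile<I, P>
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        if self.done {
            return None;
        }
        // The item is pulled out of the underlying iterator *before* the
        // predicate sees it, so a rejected item has already been consumed.
        let item = self.iter.next()?;
        if (self.predicate)(&item) {
            Some(item)
        } else {
            self.done = true;
            None // the rejected item is dropped here
        }
    }
}

// Hypothetical driver: runs the adapter over 1..=4 and reports what is left.
fn run_simple_take_while() -> (Vec<i32>, Option<i32>) {
    let mut source = vec![1, 2, 3, 4].into_iter();
    let taken: Vec<i32> = SimpleTakeWhile {
        iter: &mut source,
        predicate: |i: &i32| *i < 3,
        done: false,
    }
    .collect();
    (taken, source.next())
}

fn main() {
    let (taken, next) = run_simple_take_while();
    println!("taken: {:?}, next: {:?}", taken, next);
}
```

Because the adapter only requires `Iterator`, it has no way to look ahead without calling `next()`.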
I would have expected it to use the `peek()` method on peekable iterators to avoid this, but I guess not. No matter, a peeking version is easy to implement:

```rust
use std::iter::Peekable;

fn take_while_peek<I, P>(peekable: &mut Peekable<I>, mut predicate: P) -> Vec<I::Item>
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    let mut vec: Vec<I::Item> = vec![];
    // Peek first, and only consume the item once the predicate accepts it.
    while let Some(it) = peekable.peek() {
        if predicate(it) {
            vec.push(peekable.next().unwrap());
        } else {
            break;
        }
    }
    vec
}
```

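As an aside, on Rust 1.51 and later `Peekable` has a `next_if` method that advances only when a predicate accepts the peeked item, so the same helper can be written without the explicit loop. A sketch of that alternative, where `take_while_next_if` is a made-up name and not something the lexer uses:

```rust
use std::iter::{self, Peekable};

// Hypothetical variant of `take_while_peek` built on `Peekable::next_if`.
fn take_while_next_if<I, P>(peekable: &mut Peekable<I>, mut predicate: P) -> Vec<I::Item>
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    // `next_if` returns `None` and leaves the item in place when the predicate
    // rejects it; `from_fn` stops collecting at that first `None`.
    iter::from_fn(|| peekable.next_if(|item| predicate(item))).collect()
}

fn main() {
    let mut chars = "12.5+3".chars().peekable();
    let digits = take_while_next_if(&mut chars, |c| c.is_ascii_digit() || *c == '.');
    let number: String = digits.into_iter().collect();
    // The '+' that ended the run is still available to the caller.
    println!("number: {:?}, next: {:?}", number, chars.peek());
}
```

Either version leaves the first rejected item in the iterator. The hand-written loop has the advantage of also working on older compilers.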
I can then switch to using the new function in `parse_number`, and it no longer consumes extra characters.

```rust
// The iterator now has to be a `Peekable` so that `take_while_peek` can look ahead.
fn parse_number<T: Iterator<Item = char>>(it: &mut Peekable<T>) -> Option<Token> {
    let mut found_decimal_point = false;
    let digits = take_while_peek(it, |c| {
        // ...
    });
    // ...
}
```
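
To check the fix end to end, here is a self-contained sketch that fills in the elided parts with assumptions. The `Token` enum, its variants, and the parsing at the end are made up for illustration (the real lexer's definitions are not shown in this post), and `char::is_ascii_digit` stands in for the `DIGITS` lookup:

```rust
use std::iter::Peekable;

// Hypothetical token type; the real lexer's `Token` is not shown here.
#[derive(Debug, PartialEq)]
enum Token {
    Integer(i64),
    Float(f64),
}

fn take_while_peek<I, P>(peekable: &mut Peekable<I>, mut predicate: P) -> Vec<I::Item>
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    let mut vec: Vec<I::Item> = vec![];
    while let Some(it) = peekable.peek() {
        if predicate(it) {
            vec.push(peekable.next().unwrap());
        } else {
            break;
        }
    }
    vec
}

fn parse_number<T: Iterator<Item = char>>(it: &mut Peekable<T>) -> Option<Token> {
    let mut found_decimal_point = false;
    let digits: String = take_while_peek(it, |c| {
        if c.is_ascii_digit() {
            true
        } else if *c == '.' && !found_decimal_point {
            // Accept only the first decimal point.
            found_decimal_point = true;
            true
        } else {
            false
        }
    })
    .into_iter()
    .collect();
    // Assumed conversion step: the original elides how tokens are built.
    if found_decimal_point {
        digits.parse::<f64>().ok().map(Token::Float)
    } else {
        digits.parse::<i64>().ok().map(Token::Integer)
    }
}

fn main() {
    let mut it = "1.2.3".chars().peekable();
    let token = parse_number(&mut it);
    // The second dot is left in the iterator for the lexer to reject.
    println!("token: {:?}, next: {:?}", token, it.peek());
}
```

With the input "1.2.3", the second dot is now the next character the lexer sees, which is the failure I was originally trying to test for.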