```
metadata.title = "Part 8: Variable Lookups and Function Calls"
metadata.tags = ["build a programming language", "rust"]
metadata.date = "2021-04-25 11:15:42 -0400"
metadata.shortDesc = ""
metadata.slug = "variable-lookups-and-function-calls"
metadata.preamble = `<p style="font-style: italic;">This post is part of a <a href="/build-a-programming-language/" data-link="/build-a-programming-language/">series</a> about learning Rust and building a small programming language.</p><hr>`
```

Arithmetic expressions are all well and good, but they don't really feel much like a programming language. To fix that, let's start working on variables and function calls.

<!-- excerpt-end -->

First step: lexing.

There are two new token types: identifier and comma.

```rust
enum Token {
    // ...
    Identifier(String),
    Comma,
}
```

The comma is just a single comma character. The identifier is a sequence of characters that represents the name of a variable or function. An identifier starts with a letter (either lowercase or uppercase) and is followed by any number of letters, digits, and underscores.

The main `tokenize` function checks if it's looking at a letter, and, if so, calls the `parse_identifier` function. `parse_identifier` simply accumulates as many valid identifier characters as there are and wraps them up in a token.

```rust
fn parse_identifier<I: Iterator<Item = char>>(it: &mut Peekable<I>) -> Option<Token> {
    // Collect every consecutive character that is valid inside an identifier.
    let chars = take_while_peek(it, |c| {
        LOWERCASE_LETTERS.contains(c)
            || UPPERCASE_LETTERS.contains(c)
            || DIGITS.contains(c)
            || *c == '_'
    });
    if chars.is_empty() {
        None
    } else {
        let s = String::from_iter(chars);
        Some(Token::Identifier(s))
    }
}
```
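To see the lexing step in isolation, here is a minimal, self-contained sketch. It is not the series' actual code: `take_while_peek` is re-implemented here as a hypothetical helper on top of `Peekable::next_if`, the standard `char::is_ascii_*` methods stand in for the `LOWERCASE_LETTERS`, `UPPERCASE_LETTERS`, and `DIGITS` constants, and the `Token` enum is trimmed down to the variants needed for the example.

```rust
use std::iter::Peekable;

// Trimmed-down token type (assumption: the real enum has more variants).
#[derive(Debug, PartialEq)]
enum Token {
    Identifier(String),
    Comma,
    LeftParen,
    RightParen,
}

// Hypothetical helper: collects characters while `pred` holds, leaving the
// first character that fails the test unconsumed.
fn take_while_peek<I, P>(it: &mut Peekable<I>, pred: P) -> Vec<char>
where
    I: Iterator<Item = char>,
    P: Fn(&char) -> bool,
{
    let mut chars = Vec::new();
    while let Some(c) = it.next_if(|c| pred(c)) {
        chars.push(c);
    }
    chars
}

fn parse_identifier<I: Iterator<Item = char>>(it: &mut Peekable<I>) -> Option<Token> {
    let chars = take_while_peek(it, |c| c.is_ascii_alphanumeric() || *c == '_');
    if chars.is_empty() {
        None
    } else {
        Some(Token::Identifier(chars.into_iter().collect()))
    }
}

fn tokenize(s: &str) -> Vec<Token> {
    let mut it = s.chars().peekable();
    let mut tokens = Vec::new();
    while let Some(&c) = it.peek() {
        if c.is_ascii_alphabetic() {
            // Only a letter may start an identifier.
            tokens.push(parse_identifier(&mut it).expect("identifier"));
        } else {
            it.next();
            match c {
                ',' => tokens.push(Token::Comma),
                '(' => tokens.push(Token::LeftParen),
                ')' => tokens.push(Token::RightParen),
                c if c.is_whitespace() => {}
                other => panic!("unexpected character {:?}", other),
            }
        }
    }
    tokens
}

fn main() {
    println!("{:?}", tokenize("foo(bar_2, baz)"));
}
```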

The next step is parsing.

There are two new kinds of AST nodes: lookup nodes and function call nodes. The only data lookup nodes store is the name of the variable they refer to. Function call nodes store the nodes for their parameters, in addition to the function name.

When parsing an expression, an identifier token results in either a lookup or function call node, depending on whether it's followed by a left-paren.

```rust
fn do_parse<'a, I: Iterator<Item = &'a Token>>(it: &mut Peekable<I>) -> Option<Node> {
    // ...
    let mut node: Node = match it.peek().unwrap() {
        // ...
        Token::Identifier(value) => {
            it.next();
            // A left paren right after the name means this is a call.
            match it.peek() {
                Some(Token::LeftParen) => Node::Call {
                    name: value.clone(),
                    params: parse_function_params(it),
                },
                _ => Node::Lookup {
                    name: value.clone(),
                },
            }
        }
    };
    // ...
}
```
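The two node kinds used above could be declared along these lines. This is only a sketch: the field names `name` and `params` come from the parsing code, but the exact declaration and the derives are assumptions, and the arithmetic variants from earlier parts are left out.

```rust
// Sketch of the two new AST node kinds (other variants omitted).
#[derive(Debug, PartialEq)]
enum Node {
    // A variable lookup only needs the variable's name.
    Lookup { name: String },
    // A call stores the function name plus one node per parameter.
    Call { name: String, params: Vec<Node> },
}

fn main() {
    // Hand-built tree for the source text `foo(bar)`.
    let node = Node::Call {
        name: "foo".to_string(),
        params: vec![Node::Lookup { name: "bar".to_string() }],
    };
    println!("{:?}", node);
}
```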

Actually parsing function parameters is left to another function. After consuming the opening parenthesis, it checks if the next token is the closing right-paren. If it is, the right-paren is consumed and an empty vector is returned for the parameters.

If it isn't, the function enters a loop in which it parses a parameter expression and then expects to find either a comma or right-paren. If there's a comma, it's consumed and it moves on to the next iteration of the loop. If it's a closing parenthesis, it too is consumed and then the loop is exited and the parameter list returned. Upon encountering any other token, it panics.

```rust
fn parse_function_params<'a, I: Iterator<Item = &'a Token>>(it: &mut Peekable<I>) -> Vec<Node> {
    it.next(); // consume left paren
    if let Some(Token::RightParen) = it.peek() {
        // Empty parameter list: `foo()`.
        it.next();
        vec![]
    } else {
        let mut params = vec![];
        loop {
            let param_node = do_parse(it).expect("function parameter");
            params.push(param_node);
            match it.peek() {
                Some(Token::Comma) => {
                    it.next();
                }
                Some(Token::RightParen) => {
                    it.next();
                    break;
                }
                tok => {
                    panic!("unexpected token {:?} after function parameter", tok);
                }
            }
        }
        params
    }
}
```

And lastly, to make this work correctly, the comma token is added to the list of expression-end tokens.

With that, parsing function calls and variable lookups is possible:

```rust
fn main() {
    let tokens = tokenize("foo(bar)");
    if let Some(node) = parse(&tokens) {
        println!("{:#?}", node);
    }
}
```

```sh
$ cargo run
Call {
    name: "foo",
    params: [
        Lookup {
            name: "bar",
        },
    ],
}
```
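For a slightly larger case, the sketch below parses `foo(bar, baz())`, which exercises multiple parameters, a nested call, and an empty parameter list. It is a simplified stand-in rather than the series' parser: the tokens are built by hand instead of coming from `tokenize`, `do_parse` handles only identifiers, and `is_expression_end` is a hypothetical helper standing in for the list of expression-end tokens mentioned above.

```rust
use std::iter::Peekable;

#[derive(Debug)]
enum Token {
    Identifier(String),
    Comma,
    LeftParen,
    RightParen,
}

#[derive(Debug, PartialEq)]
enum Node {
    Lookup { name: String },
    Call { name: String, params: Vec<Node> },
}

// Hypothetical stand-in for the expression-end list: end of input,
// a right paren, or (newly) a comma.
fn is_expression_end(tok: Option<&Token>) -> bool {
    matches!(tok, None | Some(Token::RightParen) | Some(Token::Comma))
}

// Simplified: only identifiers are handled, no arithmetic.
fn do_parse<'a, I: Iterator<Item = &'a Token>>(it: &mut Peekable<I>) -> Option<Node> {
    let node = match it.next()? {
        Token::Identifier(name) => match it.peek() {
            Some(Token::LeftParen) => Node::Call {
                name: name.clone(),
                params: parse_function_params(it),
            },
            _ => Node::Lookup { name: name.clone() },
        },
        tok => panic!("unexpected token {:?}", tok),
    };
    // If Comma were not an expression-end token, `bar` in `foo(bar, ...)`
    // would be rejected at this point.
    if !is_expression_end(it.peek().copied()) {
        panic!("expected end of expression, found {:?}", it.peek());
    }
    Some(node)
}

fn parse_function_params<'a, I: Iterator<Item = &'a Token>>(it: &mut Peekable<I>) -> Vec<Node> {
    it.next(); // consume left paren
    let mut params = vec![];
    if let Some(Token::RightParen) = it.peek() {
        it.next();
        return params;
    }
    loop {
        params.push(do_parse(it).expect("function parameter"));
        match it.next() {
            Some(Token::Comma) => {}
            Some(Token::RightParen) => break,
            tok => panic!("unexpected token {:?} after function parameter", tok),
        }
    }
    params
}

fn main() {
    use Token::*;
    // Hand-built tokens for `foo(bar, baz())`.
    let tokens = vec![
        Identifier("foo".into()), LeftParen,
        Identifier("bar".into()), Comma,
        Identifier("baz".into()), LeftParen, RightParen,
        RightParen,
    ];
    let node = do_parse(&mut tokens.iter().peekable());
    println!("{:#?}", node);
}
```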