v6/site/posts/2021-06-29-lexical-scope.md

```
title = "Part 11: Lexical Scope"
tags = ["build a programming language", "rust"]
date = "2021-06-29 19:14:42 -0400"
short_desc = "Evaluating if statements and dealing with nested scopes."
slug = "lexical-scope"
preamble = '<p style="font-style: italic;">This post is part of a <a href="/build-a-programming-language/" data-link="/build-a-programming-language/">series</a> about learning Rust and building a small programming language.</p><hr>'
```
After adding variables, I added boolean values and comparison operators, because why not. With that in place, I figured it would be a good time to add if statements. Parsing them is straightforward—you just look for the `if` keyword, followed by a bunch of stuff—so I won't go into the details. But actually evaluating them was a bit more complicated.
<!-- excerpt-end -->
The main issue is lexical scope: the context (i.e., which variables are accessible) at each point in the program is determined by where that point sits in the original source code.
Let's say you have some code:
```txt
let a = 1
if (condition) {
    print(a)
    let b = 2
}
print(b)
```
Entering the body of the if statement starts a new scope in which variables defined in any enclosing scope can be accessed, but not vice versa. `a`, defined in the outer scope, can be read from the inner scope, but `b`, defined in the inner scope, cannot be accessed from the outer scope.
What this means for me is that, in the evaluator, it's not enough to have access to just the current scope. All the parent scopes are needed too.
There are a couple of ways I could approach this. One would be to keep something like a vector of contexts, where the last element is the current context. Accessing a parent context would just mean walking backwards through the vector. To enter a new scope, you'd construct a new context and push it onto the vector, evaluate whatever you wanted in the new scope, and then pop it off afterwards. This would work, but pushing a context risks reallocating the vector's backing storage whenever it runs out of capacity. Worrying about that is probably a premature optimization, but I decided to take a different approach to avoid the issue.
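For comparison, here is a rough sketch of what the vector approach could look like. It is not code from the interpreter: `ScopeStack` and its methods are names made up for illustration, and `Value` is a cut-down stand-in for the real value type.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the interpreter's value type.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
}

// A stack of scopes; the last element is the innermost (current) scope.
struct ScopeStack {
    scopes: Vec<HashMap<String, Value>>,
}

impl ScopeStack {
    fn new() -> Self {
        // Start with a single root scope.
        Self { scopes: vec![HashMap::new()] }
    }

    // Entering a block pushes a fresh, empty scope.
    fn enter(&mut self) {
        self.scopes.push(HashMap::new());
    }

    // Leaving a block drops every variable declared inside it.
    fn exit(&mut self) {
        // Never pop the root scope.
        if self.scopes.len() > 1 {
            self.scopes.pop();
        }
    }

    // Declarations always go into the innermost scope.
    fn declare(&mut self, name: &str, value: Value) {
        self.scopes
            .last_mut()
            .expect("there is always a root scope")
            .insert(name.to_string(), value);
    }

    // Lookups walk backwards from the innermost scope to the root.
    fn lookup(&self, name: &str) -> Option<&Value> {
        self.scopes.iter().rev().find_map(|scope| scope.get(name))
    }
}
```

With this shape, the earlier example would call `enter` before the if body and `exit` after it, at which point looking up `b` returns `None` while `a` is still visible.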
Another way of doing it is effectively building a singly-linked list, where each `Context` stores an optional reference to its parent. But a simple reference isn't enough. The `Context` struct would need a generic lifetime parameter in order to know how long the reference to its parent lives for. And in order to specify what type the reference refers to, we would need to be able to know how long the parent context's parent lives for. And in order to spell that type out we'd have to know how long the parent's parent's parent—you get the idea.
The solution I came up with was to wrap the context struct in an `Rc`, a reference-counted pointer. So, instead of each context having a direct reference to its parent, it owns an `Rc` that's backed by the same storage. That's not quite enough, though, because the context needs to be mutable so code can do things like set variables. For that reason, it's actually an `Rc<RefCell<Context>>`. I understand this pattern of interior mutability is common practice in Rust, but coming from languages where this sort of thing is handled transparently, it's one of the weirder things I've encountered.
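To make the "same storage" part concrete, here is a tiny sketch using a plain `HashMap` in place of `Context`. The function name `shared_write` is made up for this example. It writes through one handle and reads the result back through the other.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Writes through a cloned handle, then reads through the original one.
// Returns the value that was read and the number of live handles.
fn shared_write() -> (Option<i64>, usize) {
    let original: Rc<RefCell<HashMap<String, i64>>> = Rc::new(RefCell::new(HashMap::new()));

    // Rc::clone copies the pointer and bumps the reference count;
    // the map itself is not copied.
    let alias = Rc::clone(&original);

    // borrow_mut hands out a mutable borrow that is checked at runtime
    // rather than at compile time.
    alias.borrow_mut().insert("a".to_string(), 1);

    // The write made through `alias` is visible through `original`.
    let seen = original.borrow().get("a").copied();
    (seen, Rc::strong_count(&original))
}
```

The trade-off is that the borrow rules still apply, only later: calling `borrow_mut` while another borrow of the same `RefCell` is alive panics at runtime instead of failing to compile.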
Now, on to how this is actually implemented. It's pretty simple. The `Context` struct stores the `Rc` I described above, but inside an `Option` so that the root context can have `None` as its parent.
```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

struct Context {
    parent: Option<Rc<RefCell<Context>>>,
    variables: HashMap<String, Value>,
}
```
Then, instead of the various eval functions taking a reference directly to the context, they get a reference to the `Rc`-wrapped context.
```rust
fn eval_binary_op(left: &Node, op: &BinaryOp, right: &Node, context: &Rc<RefCell<Context>>) -> Value {
    let left_value = eval_expr(left, context);
    // ...
}
```
The constructors for `Context` also change a bit. There are now two: `root`, which creates a context with no parent, and `new`, which takes one.
```rust
impl Context {
    fn root() -> Self {
        Self {
            parent: None,
            variables: HashMap::new(),
        }
    }

    fn new(parent: Rc<RefCell<Context>>) -> Self {
        Self {
            parent: Some(parent),
            variables: HashMap::new(),
        }
    }
}
```
Unlike the evaluation functions, `Context::new` takes an owned `Rc`, avoiding the infinite-lifetimes problem from earlier. This means that, when constructing a new context, we just need to clone the existing `Rc`.
```rust
fn eval_if(condition: &Node, body: &[Statement], context: &Rc<RefCell<Context>>) {
    // Cloning the Rc only bumps the reference count; the parent context itself is shared.
    let body_context = Context::new(Rc::clone(context));
    let body_context_ref = Rc::new(RefCell::new(body_context));
    // ...
}
```
After the new context is constructed, it too is wrapped in a `RefCell` and `Rc` for when the condition and body are evaluated. This is a little bit unwieldy, but hey, it works.
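The parent link only pays off once variable lookup follows it. The post doesn't show that part, so the following is a guess at its shape rather than the real code: `declare` and `resolve` are hypothetical method names, and `Value` is reduced to a single variant.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical stand-in for the interpreter's value type.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
}

struct Context {
    parent: Option<Rc<RefCell<Context>>>,
    variables: HashMap<String, Value>,
}

impl Context {
    fn root() -> Self {
        Self { parent: None, variables: HashMap::new() }
    }

    fn new(parent: Rc<RefCell<Context>>) -> Self {
        Self { parent: Some(parent), variables: HashMap::new() }
    }

    // Hypothetical helper: declares a variable in this scope only.
    fn declare(&mut self, name: &str, value: Value) {
        self.variables.insert(name.to_string(), value);
    }

    // Hypothetical helper: checks this scope first, then walks up the
    // chain of parents until the name is found or the root is reached.
    fn resolve(&self, name: &str) -> Option<Value> {
        if let Some(value) = self.variables.get(name) {
            return Some(value.clone());
        }
        match &self.parent {
            Some(parent) => parent.borrow().resolve(name),
            None => None,
        }
    }
}
```

Because a child holds an `Rc` to its parent but not the other way around, an inner context can resolve `a` from the root, while the root has no path to a `b` declared in the inner context.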
Actually evaluating the if is simple enough that I won't bother going through it in detail. It just evaluates the condition expression (confirming that it's a boolean; there shall be no implicit conversions!) and, if it's true, evaluates each of the statements in the body.
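For anyone who wants to see it anyway, here is a minimal sketch of that description. It makes several assumptions: `Node` and `Statement` are cut down to the bare minimum, `resolve` is a hypothetical lookup method, and `eval_if` returns a `bool` saying whether the body ran, purely so the behavior can be observed from outside.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Everything below is a simplified stand-in for the interpreter's real types.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Boolean(bool),
}

enum Node {
    Literal(Value),
    Variable(String),
}

enum Statement {
    Let { name: String, value: Node },
}

struct Context {
    parent: Option<Rc<RefCell<Context>>>,
    variables: HashMap<String, Value>,
}

impl Context {
    fn root() -> Self {
        Self { parent: None, variables: HashMap::new() }
    }

    fn new(parent: Rc<RefCell<Context>>) -> Self {
        Self { parent: Some(parent), variables: HashMap::new() }
    }

    // Hypothetical lookup that falls back to the parent scope.
    fn resolve(&self, name: &str) -> Option<Value> {
        match (self.variables.get(name), &self.parent) {
            (Some(value), _) => Some(value.clone()),
            (None, Some(parent)) => parent.borrow().resolve(name),
            (None, None) => None,
        }
    }
}

fn eval_expr(node: &Node, context: &Rc<RefCell<Context>>) -> Value {
    match node {
        Node::Literal(value) => value.clone(),
        Node::Variable(name) => context
            .borrow()
            .resolve(name)
            .unwrap_or_else(|| panic!("undefined variable: {}", name)),
    }
}

fn eval_statement(statement: &Statement, context: &Rc<RefCell<Context>>) {
    match statement {
        Statement::Let { name, value } => {
            // Evaluate first so the shared borrow ends before borrow_mut starts.
            let value = eval_expr(value, context);
            context.borrow_mut().variables.insert(name.clone(), value);
        }
    }
}

// Returns whether the body ran (an assumption made for this sketch).
fn eval_if(condition: &Node, body: &[Statement], context: &Rc<RefCell<Context>>) -> bool {
    let body_context = Context::new(Rc::clone(context));
    let body_context_ref = Rc::new(RefCell::new(body_context));
    // Both the condition and the body are evaluated in the new context.
    match eval_expr(condition, &body_context_ref) {
        Value::Boolean(true) => {
            for statement in body {
                eval_statement(statement, &body_context_ref);
            }
            true
        }
        Value::Boolean(false) => false,
        // No implicit conversions: anything else is an error.
        other => panic!("if condition must be a boolean, got {:?}", other),
    }
}
```

When `eval_if` returns, `body_context_ref` is dropped, which takes any variables declared in the body with it. The outer context never sees them.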
```rust
fn main() {
    let code = r#"let a = 1; if (a == 1) { dbg(a); }"#;
    let tokens = tokenize(&code);
    let statements = parse(&tokens);
    eval(&statements);
}
```
```txt
$ cargo run
Integer(1)
```