Add Part 11: Lexical Scope

Correct declare_variable example
2021-06-29 19:15:26 -04:00 · 2021-05-16 14:59:08 -04:00
2 changed files with 105 additions and 1 deletions
--- a/site/posts/2021-05-09-variable-declarations.md
+++ b/site/posts/2021-05-09-variable-declarations.md
@@ -72,8 +72,12 @@ impl Context {
 	}

 	fn declare_variable(&mut self, name: &str, value: Value) {
+		if self.variables.contains_key(name) {
+			panic!("cannot re-declare variable {}", name);
+		} else {
 			self.variables.insert(name.into(), value);
 		}
+	}
 }
 ```

--- a/site/posts/2021-06-29-lexical-scope.md
+++ b/site/posts/2021-06-29-lexical-scope.md
@@ -0,0 +1,100 @@
+```
+metadata.title = "Part 11: Lexical Scope"
+metadata.tags = ["build a programming language", "rust"]
+metadata.date = "2021-06-29 19:14:42 -0400"
+metadata.shortDesc = "Evaluating if statements and dealing with nested scopes."
+metadata.slug = "lexical-scope"
+metadata.preamble = `<p style="font-style: italic;">This post is part of a <a href="/build-a-programming-language/" data-link="/build-a-programming-language/">series</a> about learning Rust and building a small programming language.</p><hr>`
+```
+
+After adding variables, I added boolean values and comparison operators, because why not. With that in place, I figured it would be a good time to add if statements. Parsing them is straightforward—you just look for the `if` keyword, followed by a bunch of stuff—so I won't go into the details. But actually evaluating them was a bit more complicated.
+
+<!-- excerpt-end -->
+
+The main issue is lexical scope: the context (i.e., which variables are accessible) at each point in the program is determined by where that point sits in the original source code.
+
+Let's say you have some code:
+
+```txt
+let a = 1
+if (condition) {
+	print(a)
+	let b = 2
+}
+print(b)
+```
+
+Entering the body of the if statement starts a new scope in which variables defined in any encompassing scope can be accessed, but not vice versa. `a`, defined in the outer scope, can be read from the inner scope, but `b`, defined in the inner scope, cannot be accessed from the outer scope.
+
+What this means for me is that, in the evaluator, it's not enough to have access to just the current scope. All the parent scopes are also needed.
+
+There are a couple of ways I could approach this. One way would be to have something like a vector of contexts, where the last element is the current context. Accessing a parent context would just mean walking backwards through the vector. And to enter a new scope, you'd construct a new context and push it onto the vector, evaluate whatever you wanted in the new scope, and then remove it from the vector afterwards. This would work, but any push that exceeds the vector's capacity forces it to reallocate its backing storage, so entering a scope risks a reallocation. Worrying about that is probably a premature optimization, but I decided to take a different approach to avoid the issue.
+
+Another way of doing it is effectively building a singly-linked list, where each `Context` stores an optional reference to its parent. But a simple reference isn't enough. The `Context` struct would need a generic lifetime parameter in order to know how long the reference to its parent lives for. And in order to specify what type the reference refers to, we would need to be able to know how long the parent context's parent lives for. And in order to spell that type out we'd have to know how long the parent's parent's parent—you get the idea.
+
+The solution I came up with was to wrap the context struct in an `Rc`, a reference-counted pointer. So, instead of each context having a direct reference to its parent, it owns an `Rc` that's backed by the same storage. Though, that's not quite enough, because the context needs to be mutable so code can do things like set variables. For that reason, it's actually an `Rc<RefCell<Context>>`. I understand this pattern of interior mutability is common practice in Rust, but coming from languages where this sort of thing is handled transparently, it's one of the weirder things I've encountered.
+
+Now, on to how this is actually implemented. It's pretty simple. The `Context` struct stores the `Rc` I described above, but inside an `Option` so that the root context can have `None` as its parent.
+
+```rust
+struct Context {
+	parent: Option<Rc<RefCell<Context>>>,
+	variables: HashMap<String, Value>,
+}
+```
+
+Then, instead of taking a reference directly to the context, the various eval functions get a reference to the `Rc`-wrapped context.
+
+```rust
+fn eval_binary_op(left: &Node, op: &BinaryOp, right: &Node, context: &Rc<RefCell<Context>>) -> Value {
+	let left_value = eval_expr(left, context);
+	// ...
+}
+```
+
+The constructors for `Context` also change a bit. There's one that creates a context with no parent, called `root`, in addition to `new`, which takes one.
+
+```rust
+impl Context {
+	fn root() -> Self {
+		Self {
+			parent: None,
+			variables: HashMap::new(),
+		}
+	}
+
+	fn new(parent: Rc<RefCell<Context>>) -> Self {
+		Self {
+			parent: Some(parent),
+			variables: HashMap::new(),
+		}
+	}
+}
+```
+
+Unlike the evaluation functions, `Context::new` takes an owned `Rc`, avoiding the infinite-lifetimes problem from earlier. This means that, when constructing a new context, we just need to clone the existing `Rc`.
+
+```rust
+fn eval_if(condition: &Node, body: &[Statement], context: &Rc<RefCell<Context>>) {
+	let body_context = Context::new(Rc::clone(context));
+	let body_context_ref = Rc::new(RefCell::new(body_context));
+}
+```
+
+After the new context is constructed, it too is wrapped in a `RefCell` and `Rc` for when the condition and body are evaluated. This is a little bit unwieldy, but hey, it works.
+
+Actually evaluating the if is simple enough that I won't bother going through it in detail. It just evaluates the condition expression (confirming that it's a boolean; there shall be no implicit conversions!) and, if it's true, evaluates each of the statements in the body.
+
+```rust
+fn main() {
+	let code = r#"let a = 1; if (a == 1) { dbg(a); }"#;
+	let tokens = tokenize(&code);
+	let statements = parse(&tokens);
+	eval(&statements);
+}
+```
+
+```txt
+$ cargo run
+Integer(1)
+```
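
The snippets in the post above show how contexts are constructed and linked, but not how a variable lookup travels up the parent chain. Below is a minimal, self-contained sketch of that step. It assumes a pared-down `Value` enum and a hypothetical `resolve` method, neither of which appears in the diff, so treat it as one possible shape rather than the author's implementation.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Assumption: a stand-in for the post's `Value` type, which the diff does not show.
#[derive(Debug, Clone, PartialEq)]
enum Value {
	Integer(i64),
}

struct Context {
	parent: Option<Rc<RefCell<Context>>>,
	variables: HashMap<String, Value>,
}

impl Context {
	fn root() -> Self {
		Self { parent: None, variables: HashMap::new() }
	}

	fn new(parent: Rc<RefCell<Context>>) -> Self {
		Self { parent: Some(parent), variables: HashMap::new() }
	}

	fn declare_variable(&mut self, name: &str, value: Value) {
		if self.variables.contains_key(name) {
			panic!("cannot re-declare variable {}", name);
		} else {
			self.variables.insert(name.into(), value);
		}
	}

	// Hypothetical helper (not from the post): look in this scope first,
	// then fall back to each enclosing scope in turn.
	fn resolve(&self, name: &str) -> Option<Value> {
		if let Some(value) = self.variables.get(name) {
			return Some(value.clone());
		}
		// The root context has no parent, so the search ends here with `None`.
		let parent = self.parent.as_ref()?;
		let found = parent.borrow().resolve(name);
		found
	}
}

fn main() {
	// Outer scope: `let a = 1`
	let outer = Rc::new(RefCell::new(Context::root()));
	outer.borrow_mut().declare_variable("a", Value::Integer(1));

	// Inner scope (the if body): `let b = 2`
	let inner = Rc::new(RefCell::new(Context::new(Rc::clone(&outer))));
	inner.borrow_mut().declare_variable("b", Value::Integer(2));

	// The inner scope sees both variables; the outer scope cannot see `b`.
	assert_eq!(inner.borrow().resolve("a"), Some(Value::Integer(1)));
	assert_eq!(inner.borrow().resolve("b"), Some(Value::Integer(2)));
	assert_eq!(outer.borrow().resolve("b"), None);
}
```

Because `resolve` only ever takes shared borrows, walking the chain does not conflict with itself. It would still panic at runtime if called while some context in the chain is mutably borrowed, which is the usual trade-off with `RefCell`.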
Author	SHA1	Message	Date
Shadowfacts	574577c7c4	Add Part 11: Lexical Scope	2021-06-29 19:15:26 -04:00
Shadowfacts	b3502f257f	Correct declare_variable example	2021-05-16 14:59:08 -04:00