Writing a simple LISP interpreter in C#
A few months ago, I decided to develop an interpreter for a subset of the LISP programming language. This was a great exercise, and I’d like to share a few things I learned along the way.
Full code and where we’re going
By the end of this blog you’ll know how to implement a simple interpreter. And your code will be able to interpret code like this.
> (define fib (lambda (n) ;; Fibonacci!
(if (< n 2)
1
(+ (fib (- n 1)) (fib (- n 2))))))
> (fib 10)
55
For reference, the final project can be found here.
Background
An interpreter is a program that reads raw text in one language and translates that text into something executable in another language. It does this by constructing an abstract syntax tree (or “AST” for short), an object representation that captures the execution semantics of the raw text. The tree is then traversed by an evaluation algorithm and the result is returned.
This is different from, although similar to, the process used by a compiler to execute code. A compiler, however, will typically try to optimize your code.1 Because of this optimization step, a representation of the compiled code needs to be saved to the disk for later use. In contrast, interpreters execute the input text in memory.
It should be noted that compilers and interpreters are not a feature of the language but rather a feature of the language implementation. One could, for example, implement the C programming language with an interpreter, and the Python programming language with a compiler. The choice of whether or not to implement a language with an interpreter or a compiler is strictly up to the developer.
Parsing
Unlike infix notation (operators appear in-between operands), LISP is a relatively easy language to parse. Since everything is fully parenthesized, we don’t need to pay special attention to operator precedence and the order of operations while parsing. Rather, we’re more or less given the AST and it’s up to us to translate the tree into a set of syntax classes by looking at the first element in each parenthesized expression.2
When parsing the text, we have two types to consider; atomics, (indivisible values like numbers and bools), and lists which contain multiple expressions. If we see an atomic, there’s nothing more to do — we simply return an object that holds the atomic data (the raw string).
If we see an opening parenthesis, we know that we’re populating a list. We should continue parsing from the next token in the input stream, and stop once we see a closing parenthesis. At that point, we can return the list.
In my implementation, I defined two types, SExpAtom
and SExpList
, both of which derive from SExp
, an abstract class that is used to group the two child classes (more formally, the LISP syntax uses s-expressions). I then apply the recursive algorithm described above to the input text.
public static SExp Parse(string input)
{
var tokens = input
.Replace("(", " ( ")
.Replace(")", " ) ")
.Split().Where(x => x != "").ToList();
return Next(tokens);
}
private static SExp Next(List<string> tokens)
{
var token = tokens.Pop(); // removes first element
if (token == ")") throw new Exception("Unexpected '(' character");
if (token == "(")
{
var list = new SExpList();
while (tokens.First() != ")")
list.Contents.Add(Next(tokens));
tokens.Pop();
return list;
}
return new SExpAtom(token);
}
AST
The preceding step produced a structure that is far easier to process than the input text. With a valid SExp
in hand, we can proceed to the next step, translating the SExp
tree to an abstract syntax tree.
The base type for all syntax classes is Expression
.
abstract class Expression
{
public abstract Value Evaluate(Dictionary<string, Expression> env)
}
This base class requires that subclasses provide an implementation for the Evaluate
method. Evaluate
produces a Value
, a derived type of Expression
that evaluates to itself.
abstract class Value : Expression
{
public Value(object underlying)
{
Underlying = underlying;
}
public T GetValue<T>()
{
return (T)Underlying;
}
public override Value Evaluate(Dictionary<string, Expression> env)
{
return this;
}
protected readonly object Underlying;
}
Don’t worry about the argument of Evaluate
. We’ll discuss that and the overrides of Evaluate
in detail later. I’ll further show how we can improve the design by hiding env
from direct exposure to the syntax classes. We’ll also make Value
a generic class, to avoid having to cast Underlying
.
Our interpreter will support two Value
types: Bool(bool b)
and Number(double x)
. The other expressions are as follows
-
Call(Expression body, List<Expression> arguments)
represents a function call e.g.(myfunc 1 2 3)
-
Define(string name, Expression expression)
represents a variable definition e.g.(define name e1)
-
If(Expression test, Expression trueBranch, Expression elseBranch)
represents conditional execution e.g.(if (e1) e2 e3)
-
Lambda(Expression body, List<string> parameters)
represents an anonymous function e.g.(lambda (n) (* 2 n))
-
ListFunction(string op, List<Expression> arguments)
represents a built-in operation on a list of arguments e.g.(+ 1 2 3)
-
Variable(string name)
represents a variable reference to lookup theExpression
stored atname
e.g.x
Given an SExp
, or goal will be to turn the SExp
into one of the above types. If we’re given an SExpAtom
, we need to parse the string token to a Value
and return the result.
private static Expression From(SExpAtom atom)
{
double x;
if (double.TryParse(atom.Value, out x))
return new Number(x);
if (atom.Value == "#t")
return new Bool(true);
if (atom.Value == "#f")
return new Bool(false);
return new Variable(atom.Value);
}
If we’re given an SExpList
, we need to examine the first element of the list. If it’s an SExpAtom
, its string token should parse to one of the syntax keywords, (for example if
or lambda
) which will tell us how to proceed parsing the rest of the contents of the list. If it doesn’t, we’ll return a Call
as the string should be a variable reference.
private static Expression From(SExpList root)
{
// "Require" throws an exception if the test condition is not satisfied
var head = root.Contents.Pop();
if (head is SExpList)
return new Call(From(head), root.Contents.Select(x => From(x)).ToList());
var cast = head as SExpAtom;
switch (cast.Value)
{
case "+":
// ... similar for -, *, /, =, etc
return new ListFunction(cast.Value, root.Contents.Select(x => From(x)));
case "lambda":
Require(root.Contents.First() is SExpList,
"\"lambda\" statements should be followed by list of parameters");
var parameters = (SExpList)root.Contents.Pop();
Require(parameters.Contents.All(x => x is SExpAtom),
"\"lambda\" statements should be followed by list of string parameters");
var body = From(root.Contents.Pop());
return new Lambda(body, parameters.Contents.Select(x => (x as SExpAtom).Value));
case "if":
Require(root.Contents.Count == 3,
"\"if\" statements should be followed by three expressions");
return new If(From(root.Contents.Pop()),
From(root.Contents.Pop()), From(root.Contents.Pop()));
case "define":
Require(root.Contents.First() is SExpAtom,
"\"define\" statements should be followed by string identifier");
var atom = (SExpAtom)root.Contents.Pop();
return new Define(atom.Value, From(root.Contents.Pop()));
default:
return new Call(From(cast), root.Contents.Select(x => From(x)).ToList());
}
}
One final remark: if the first element is not an SExpAtom
, it’s an SExpList
and must also be parsed to a Call
. In this case, the function call must return a function that will be applied to the rest of the arguments in the list. For example, the inner expression of ((myfunc 1) 2)
returns a function that is then called on 2
.
Evaluation
Great! We have an AST that can be evaluated!
I mentioned earlier that we’d save discussion of the env
argument in the Evaluate
method for later. This is simply the evaluation environment, the place Define
expressions put their definitions. You can think of this dictionary as the “context” that an Expression
in the tree has access to during evaluation.
Evaluation is pretty straightforward in most cases. If an Expression
contains sub-expressions, evaluate those to Value
types first, and then perform the required operations to coalesce the sub-expressions to a Value
. We’d do this for evaluating a ListFunction
, for example.
// Evaluate method for "ListFunction"
public override Value Evaluate(Dictionary<string, Expression> env)
{
switch (Op)
{
case "+":
var nums = Arguments.Select(x =>
x.Evaluate(env).GetValue<double>());
return new Number(nums.Sum());
// cases for other Number/Bool operations
}
throw new NotImplementedException();
}
The final code is a little cleaner. I put the operation methods in an extension class and have a separate function to evaluate Arguments
down to a Number
. At least you have the idea for now.
If
statements evaluate Test
to a Bool
and branch as expected. Define
statements put their definition in the current environment, and return null. Variable
expressions conversely return the Expression
located at Name
.
// Evaluate method for "If"
public override Value Evaluate(Dictionary<string, Expression> env)
{
return Test.Evaluate(env).GetValue<bool>() ?
TrueBranch.Evaluate(env) : ElseBranch.Evaluate(env);
}
// Evaluate method for "Define"
public override Value Evaluate(Dictionary<string, Expression> env)
{
env[Name] = Expression;
return null;
}
// Evaluate method for "Variable"
public override Value Evaluate(Dictionary<string, Expression> env)
{
return env[Name].Evaluate(env);
}
It’s easy enough to check how we’re doing.
public static void Main(string[] args)
{
var env = new Dictionary<string, Expression>();
var expression = Parser.Parse("(+ 2 (+ 1 1))");
double val = expression.Evaluate(env).GetValue<double>();
expression = Parser.Parse("(define x #t)");
expression.Evaluate(env);
expression = Parser.Parse("(if x (+ 1 1) 0)");
val = expression.Evaluate(env).GetValue<double>();
}
The most complicated (and interesting!) evaluation case is that of functions and function calls. Lambda
expressions evaluate to Closure
expressions, a Value
type we haven’t discussed yet. A Closure
simply consists of a Lambda
and a copy of the environment in which the function was defined.
If you think about it, this is no different than the type of nesting that needs to happen whenever we evaluate functions in any other language. Moreover, functions have their own local environment that exists separate from whatever top-level environment in which the function is called. If we didn’t support this type, we’d have no way to store the variables that get defined in a function. A variable n
that is defined at the global level relative to the function definition and defined again inside a function should leave the outer n
untouched.
Our parsing algorithm actually introduced the idea of closures earlier when I discussed the procedure for parsing an SExpList
where the first element is not an SExpAtom
. In the example ((myfunc 1) 2)
the expression (myfunc 1)
must return a function to apply to 2
. But really we must return a function with the context of everything defined inside myfunc
as any definitions should be available when we apply the returned function on 2
. This is exactly what a Closure
is; a function and an environment.
Evaluation of a Call
thus evaluates its Body
to a Closure
, and extends the environment of the Closure
with the evaluation-time arguments of the Call
. The body of the Lambda
in the Closure
is then evaluated in the environment of the Closure
.
// Evaluate method for "Lambda"
public override Value Evaluate(Dictionary<string, Expression> env)
{
return new Closure(this, env);
}
// Evaluate method for "Call"
public override Value Evaluate(Dictionary<string, Expression> env)
{
var closure = Body.Evaluate(env).GetValue<Closure>();
var runtime = Arguments.Select(x => x.Evaluate(env));
foreach (var pair in closure.Lambda
.Parameters.Zip(runtime, (p, arg) => new {p, arg}))
{
closure.Environment[pair.p] = pair.arg;
}
return closure.Lambda.Body.Evaluate(closure.Environment);
}
That’s it! A little complicated, yes, although there’s not a lot of code needed to implement correct evaluation of Call
expressions. Closures
seem conceptually different from any of the other Expressions we’ve discussed. But really they’re just a Value
type; they’re the Value
type that functions must evaluate to in order for the function’s body to be properly scoped.
It doesn’t hurt to throw in a few tests. Looks like I delivered on that promise at the start of this post!
public static void Main(string[] args)
{
expression = Parser.Parse(
@"(define fact (lambda (n) (if (< n 1) 1 (* n (fact (- n 1))))))");
expression.Evaluate(env);
expression = Parser.Parse("(fact 5)");
val = expression.Evaluate(env).GetValue<double>();
expression = Parser.Parse(
@"(define fib (lambda (n) (if (< n 2) 1 (+ (fib (- n 1)) (fib (- n 2))))))");
expression.Evaluate(env);
expression = Parser.Parse("(fib 10)");
val = expression.Evaluate(env).GetValue<double>();
}
Improving our design
Our current design requires each syntax class know how to evaluate itself. While this approach is fairly OOP, it’s not particularly scalable. If we wanted to add any other set of operations, for example type inference or pretty printing, we’d have to change the definition of Expression
and re-compile/retest all the AST classes. Our current code also exposes the evaluation environment to the syntax classes directly. If you think about it, the AST classes should really just hold expressions, and do nothing else. They don’t need to know anything about a environments and shouldn’t be able to modify the evaluation context at will.
What I’m getting at is that it would be better if we put the evaluation algorithm in a separate function, and not have it distributed across the AST classes. In a functional programming language, the approach would be to switch on the Expression
types using pattern matching. We can do that in C# by using the is
keyword and casting to the correct type (not ideal).
Fortunately there’s a design pattern that addresses this problem exactly. It’s called the visitor pattern. There are visitors, which implement a Visit
method for all types in the tree, and visitables, classes that can accept an implementation of the visitor interface through implementations of an Accept
method. When a visitable accepts a visitor, all it does is call the visitor’s Visit
implementation for the visitable’s type. It’s now the visitor’s responsibility to choose what to do next.
Refactored implementation of the evaluation algorithm
The first step in refactoring our code is to provide an interface for all operations on the tree.
interface IExpressionVisitor
{
void Visit(Call exp);
void Visit(Define exp);
void Visit(If exp);
void Visit(Lambda exp);
void Visit(ListFunction exp);
void Visit(Variable exp);
void Visit(Bool exp);
void Visit(Closure exp);
void Visit(Number exp);
void Visit(Literal exp);
}
This of course makes perfect sense; we want to visit every variant of the Expression
type. Our base type correspondingly has to change to accept an IExpressionVisitor
. Easy enough.
abstract class Expression
{
public abstract void Accept(IExpressionVisitor v);
}
Finally, when an Expression
accepts a visitor, all it needs to do is tell the visitor to visit it. The type does this by calling Visit
on itself.
public override void Accept(IExpressionVisitor v)
{
v.Visit(this);
}
The above code shows how the visitable tells the visitor to “visit me.” This design pattern is essentially a switch statement on the types of the visitables. It’s like we’re using the is
keyword to determine how to proceed when given an Expression
. The nice thing about the design pattern, however, is that we don’t then need to cast to the target type. As an example, here’s how we would visit a Call
expression.
// "Call" just accepted me; here's how I visit a "Call"
public void Visit(Call exp)
{
exp.Body.Accept(this);
var closure = (Closure)evaled.Pop(); // evaluation stack
var runtime = Eval<Expression>(exp.Arguments);
var old = env;
env = closure.Environment;
foreach (var pair in closure.Lambda
.Parameters.Zip(runtime, (p, arg) => new { p, arg }))
{
env[pair.x] = pair.y;
}
closure.Lambda.Body.Accept(this);
env = old;
}
There’s one small caveat. Since our syntax classes can no longer recursively evaluate inner expressions, the implementation of the evaluation visitor must store results on a stack. This takes a little getting use to, but we can more or less factor out the overrides of the Evaluate
method and rename them Visit
. The full code for the visitor class is a little too long to post as a snippet; please check here for the complete implementation.
The huge bonus of using this design pattern is that we gain interchangeability. Look at the return types for Visit
and Accept
. They’re void
! The Accept
method is just there to notify the visitor that its type should be visited next. Because the methods are void
, the AST classes can accept any implementation of IExpressionVisitor
; it’s up to the visitor to track the “result state” it wants to expose to clients. In our evaluation visitor, we wanted to visit the tree to a Value
. For a pretty print visitor, we may want to store the result of visiting each node as a string, and expose that to a client for printing. We can leave the syntax classes untouched, no matter what set of new operations we want to define on the tree.
Final thoughts
Oddly enough, the first implementation of Evaluate
is sometimes called the interpreter pattern. To me, interchangeability is what makes the visitor pattern attractive for writing an interpreter: we can provide as many tree traversal algorithms as we want without ever having to change the code in the AST classes. It’s a much more scalable approach and more naturally allows more than one developer to work on the project.
If you’d like to learn more about programming languages, I highly recommend the programming languages course offered online by Coursera. This blog post by Peter Norvig is also a great resource.
Whew – I hope you enjoyed reading! If you have any thoughts, questions, or think that I’ve made a mistake, please leave a comment below. Thanks!
-
Switch statements in C/C++ are often rewritten to use binary search ↩