gjdanis    About    Archive    Feed

Writing a simple LISP interpreter in C#

A few months ago, I decided to develop an interpreter for a subset of the LISP programming language. This was a great exercise, and I’d like to share a few things I learned along the way.

Full code and where we’re going

By the end of this blog you’ll know how to implement a simple interpreter. And your code will be able to interpret code like this.

> (define fib (lambda (n) ;; Fibonacci!
    (if (< n 2) 
      1 
  (+ (fib (- n 1)) (fib (- n 2))))))

> (fib 10)
55

For reference, the final project can be found here.

Background

An interpreter is a program that reads raw text in one language and translates that text into something executable in another language. It does this by constructing an abstract syntax tree (or “AST” for short), an object representation that captures the execution semantics of the raw text. The tree is then traversed by an evaluation algorithm and the result is returned.

This is different from, although similar to, the process used by a compiler to execute code. A compiler, however, will typically try to optimize your code.1 Because of this optimization step, a representation of the compiled code needs to be saved to the disk for later use. In contrast, interpreters execute the input text in memory.

It should be noted that compilers and interpreters are not a feature of the language but rather a feature of the language implementation. One could, for example, implement the C programming language with an interpreter, and the Python programming language with a compiler. The choice of whether or not to implement a language with an interpreter or a compiler is strictly up to the developer.

Parsing

Unlike infix notation (operators appear in-between operands), LISP is a relatively easy language to parse. Since everything is fully parenthesized, we don’t need to pay special attention to operator precedence and the order of operations while parsing. Rather, we’re more or less given the AST and it’s up to us to translate the tree into a set of syntax classes by looking at the first element in each parenthesized expression.2

When parsing the text, we have two types to consider; atomics, (indivisible values like numbers and bools), and lists which contain multiple expressions. If we see an atomic, there’s nothing more to do — we simply return an object that holds the atomic data (the raw string).

If we see an opening parenthesis, we know that we’re populating a list. We should continue parsing from the next token in the input stream, and stop once we see a closing parenthesis. At that point, we can return the list.

In my implementation, I defined two types, SExpAtom and SExpList, both of which derive from SExp, an abstract class that is used to group the two child classes (more formally, the LISP syntax uses s-expressions). I then apply the recursive algorithm described above to the input text.

public static SExp Parse(string input)
{
    var tokens = input
        .Replace("(", " ( ")
        .Replace(")", " ) ")
        .Split().Where(x => x != "").ToList();

    return Next(tokens);
}

private static SExp Next(List<string> tokens)
{
    var token = tokens.Pop(); // removes first element
    if (token == ")") throw new Exception("Unexpected '(' character");

    if (token == "(")
    {
        var list = new SExpList();
        while (tokens.First() != ")")
            list.Contents.Add(Next(tokens));
        tokens.Pop();
        return list;
    }

    return new SExpAtom(token);
}

AST

The preceding step produced a structure that is far easier to process than the input text. With a valid SExp in hand, we can proceed to the next step, translating the SExp tree to an abstract syntax tree.

The base type for all syntax classes is Expression.

abstract class Expression
{
    public abstract Value Evaluate(Dictionary<string, Expression> env)
}

This base class requires that subclasses provide an implementation for the Evaluate method. Evaluate produces a Value, a derived type of Expression that evaluates to itself.

abstract class Value : Expression 
{
    public Value(object underlying)
    {
        Underlying = underlying;
    }

    public T GetValue<T>()
    {
        return (T)Underlying;
    }

    public override Value Evaluate(Dictionary<string, Expression> env)
    {
        return this;
    }

    protected readonly object Underlying; 
}

Don’t worry about the argument of Evaluate. We’ll discuss that and the overrides of Evaluate in detail later. I’ll further show how we can improve the design by hiding env from direct exposure to the syntax classes. We’ll also make Value a generic class, to avoid having to cast Underlying.

Our interpreter will support two Value types: Bool(bool b) and Number(double x). The other expressions are as follows

  • Call(Expression body, List<Expression> arguments) represents a function call e.g. (myfunc 1 2 3)

  • Define(string name, Expression expression) represents a variable definition e.g. (define name e1)

  • If(Expression test, Expression trueBranch, Expression elseBranch) represents conditional execution e.g. (if (e1) e2 e3)

  • Lambda(Expression body, List<string> parameters) represents an anonymous function e.g. (lambda (n) (* 2 n))

  • ListFunction(string op, List<Expression> arguments) represents a built-in operation on a list of arguments e.g. (+ 1 2 3)

  • Variable(string name) represents a variable reference to lookup the Expression stored at name e.g. x

Given an SExp, or goal will be to turn the SExp into one of the above types. If we’re given an SExpAtom, we need to parse the string token to a Value and return the result.

private static Expression From(SExpAtom atom)
{
    double x;
    if (double.TryParse(atom.Value, out x))
        return new Number(x);
    if (atom.Value == "#t")
        return new Bool(true);
    if (atom.Value == "#f")
        return new Bool(false);
    return new Variable(atom.Value);
}

If we’re given an SExpList, we need to examine the first element of the list. If it’s an SExpAtom, its string token should parse to one of the syntax keywords, (for example if or lambda) which will tell us how to proceed parsing the rest of the contents of the list. If it doesn’t, we’ll return a Call as the string should be a variable reference.

private static Expression From(SExpList root)
{
    // "Require" throws an exception if the test condition is not satisfied

    var head = root.Contents.Pop(); 
    if (head is SExpList)
        return new Call(From(head), root.Contents.Select(x => From(x)).ToList());

    var cast = head as SExpAtom;

    switch (cast.Value)
    {
        case "+":
        // ... similar for -, *, /, =, etc
            return new ListFunction(cast.Value, root.Contents.Select(x => From(x)));

        case "lambda":
            Require(root.Contents.First() is SExpList,
                "\"lambda\" statements should be followed by list of parameters");

            var parameters = (SExpList)root.Contents.Pop();
            Require(parameters.Contents.All(x => x is SExpAtom),
                "\"lambda\" statements should be followed by list of string parameters");

            var body = From(root.Contents.Pop());
            return new Lambda(body, parameters.Contents.Select(x => (x as SExpAtom).Value));

        case "if":
            Require(root.Contents.Count == 3,
                "\"if\" statements should be followed by three expressions");
            return new If(From(root.Contents.Pop()),
                From(root.Contents.Pop()), From(root.Contents.Pop()));

        case "define":
            Require(root.Contents.First() is SExpAtom,
                "\"define\" statements should be followed by string identifier");
            var atom = (SExpAtom)root.Contents.Pop();
            return new Define(atom.Value, From(root.Contents.Pop()));

        default:
            return new Call(From(cast), root.Contents.Select(x => From(x)).ToList());
    }
}

One final remark: if the first element is not an SExpAtom, it’s an SExpList and must also be parsed to a Call. In this case, the function call must return a function that will be applied to the rest of the arguments in the list. For example, the inner expression of ((myfunc 1) 2) returns a function that is then called on 2.

Evaluation

Great! We have an AST that can be evaluated!

I mentioned earlier that we’d save discussion of the env argument in the Evaluate method for later. This is simply the evaluation environment, the place Defineexpressions put their definitions. You can think of this dictionary as the “context” that an Expression in the tree has access to during evaluation.

Evaluation is pretty straightforward in most cases. If an Expression contains sub-expressions, evaluate those to Value types first, and then perform the required operations to coalesce the sub-expressions to a Value. We’d do this for evaluating a ListFunction, for example.

// Evaluate method for "ListFunction"
public override Value Evaluate(Dictionary<string, Expression> env)
{
    switch (Op)
    {
        case "+":
            var nums = Arguments.Select(x => 
                x.Evaluate(env).GetValue<double>());
            return new Number(nums.Sum());

        // cases for other Number/Bool operations
    }
    throw new NotImplementedException();
}

The final code is a little cleaner. I put the operation methods in an extension class and have a separate function to evaluate Arguments down to a Number. At least you have the idea for now.

If statements evaluate Test to a Bool and branch as expected. Define statements put their definition in the current environment, and return null. Variable expressions conversely return the Expression located at Name.

// Evaluate method for "If"
public override Value Evaluate(Dictionary<string, Expression> env)
{
    return Test.Evaluate(env).GetValue<bool>() ? 
        TrueBranch.Evaluate(env) : ElseBranch.Evaluate(env);
}

// Evaluate method for "Define"
public override Value Evaluate(Dictionary<string, Expression> env)
{
    env[Name] = Expression;
    return null;
}

// Evaluate method for "Variable"
public override Value Evaluate(Dictionary<string, Expression> env)
{
    return env[Name].Evaluate(env);
}

It’s easy enough to check how we’re doing.

public static void Main(string[] args)
{
    var env = new Dictionary<string, Expression>();
    var expression = Parser.Parse("(+ 2 (+ 1 1))");
    double val = expression.Evaluate(env).GetValue<double>();

    expression = Parser.Parse("(define x #t)");
    expression.Evaluate(env);

    expression = Parser.Parse("(if x (+ 1 1) 0)");
    val = expression.Evaluate(env).GetValue<double>();
}

The most complicated (and interesting!) evaluation case is that of functions and function calls. Lambda expressions evaluate to Closure expressions, a Value type we haven’t discussed yet. A Closure simply consists of a Lambda and a copy of the environment in which the function was defined.

If you think about it, this is no different than the type of nesting that needs to happen whenever we evaluate functions in any other language. Moreover, functions have their own local environment that exists separate from whatever top-level environment in which the function is called. If we didn’t support this type, we’d have no way to store the variables that get defined in a function. A variable n that is defined at the global level relative to the function definition and defined again inside a function should leave the outer n untouched.

Our parsing algorithm actually introduced the idea of closures earlier when I discussed the procedure for parsing an SExpList where the first element is not an SExpAtom. In the example ((myfunc 1) 2) the expression (myfunc 1) must return a function to apply to 2. But really we must return a function with the context of everything defined inside myfunc as any definitions should be available when we apply the returned function on 2. This is exactly what a Closure is; a function and an environment.

Evaluation of a Call thus evaluates its Body to a Closure, and extends the environment of the Closure with the evaluation-time arguments of the Call. The body of the Lambda in the Closureis then evaluated in the environment of the Closure.

// Evaluate method for "Lambda" 
public override Value Evaluate(Dictionary<string, Expression> env)
{
    return new Closure(this, env);
}

// Evaluate method for "Call"
public override Value Evaluate(Dictionary<string, Expression> env)
{
    var closure = Body.Evaluate(env).GetValue<Closure>();
    var runtime = Arguments.Select(x => x.Evaluate(env));

    foreach (var pair in closure.Lambda
        .Parameters.Zip(runtime, (p, arg) => new {p, arg}))
    {
        closure.Environment[pair.p] = pair.arg;
    }
    return closure.Lambda.Body.Evaluate(closure.Environment);
}

That’s it! A little complicated, yes, although there’s not a lot of code needed to implement correct evaluation of Call expressions. Closures seem conceptually different from any of the other Expressions we’ve discussed. But really they’re just a Value type; they’re the Value type that functions must evaluate to in order for the function’s body to be properly scoped.

It doesn’t hurt to throw in a few tests. Looks like I delivered on that promise at the start of this post!

public static void Main(string[] args)
{
    expression = Parser.Parse(
        @"(define fact (lambda (n) (if (< n 1) 1 (* n (fact (- n 1))))))");
    expression.Evaluate(env);

    expression = Parser.Parse("(fact 5)");
    val = expression.Evaluate(env).GetValue<double>();

    expression = Parser.Parse(
        @"(define fib (lambda (n) (if (< n 2) 1 (+ (fib (- n 1)) (fib (- n 2))))))");
    expression.Evaluate(env);

    expression = Parser.Parse("(fib 10)");
    val = expression.Evaluate(env).GetValue<double>();
}

Improving our design

Our current design requires each syntax class know how to evaluate itself. While this approach is fairly OOP, it’s not particularly scalable. If we wanted to add any other set of operations, for example type inference or pretty printing, we’d have to change the definition of Expression and re-compile/retest all the AST classes. Our current code also exposes the evaluation environment to the syntax classes directly. If you think about it, the AST classes should really just hold expressions, and do nothing else. They don’t need to know anything about a environments and shouldn’t be able to modify the evaluation context at will.

What I’m getting at is that it would be better if we put the evaluation algorithm in a separate function, and not have it distributed across the AST classes. In a functional programming language, the approach would be to switch on the Expression types using pattern matching. We can do that in C# by using the is keyword and casting to the correct type (not ideal).

Fortunately there’s a design pattern that addresses this problem exactly. It’s called the visitor pattern. There are visitors, which implement a Visit method for all types in the tree, and visitables, classes that can accept an implementation of the visitor interface through implementations of an Accept method. When a visitable accepts a visitor, all it does is call the visitor’s Visit implementation for the visitable’s type. It’s now the visitor’s responsibility to choose what to do next.

Refactored implementation of the evaluation algorithm

The first step in refactoring our code is to provide an interface for all operations on the tree.

interface IExpressionVisitor
{
    void Visit(Call exp);
    void Visit(Define exp);
    void Visit(If exp);
    void Visit(Lambda exp);
    void Visit(ListFunction exp);
    void Visit(Variable exp);
    void Visit(Bool exp);
    void Visit(Closure exp);
    void Visit(Number exp);
    void Visit(Literal exp);
}

This of course makes perfect sense; we want to visit every variant of the Expression type. Our base type correspondingly has to change to accept an IExpressionVisitor. Easy enough.

abstract class Expression
{
    public abstract void Accept(IExpressionVisitor v);
}

Finally, when an Expression accepts a visitor, all it needs to do is tell the visitor to visit it. The type does this by calling Visit on itself.

public override void Accept(IExpressionVisitor v)
{
    v.Visit(this);
}

The above code shows how the visitable tells the visitor to “visit me.” This design pattern is essentially a switch statement on the types of the visitables. It’s like we’re using the is keyword to determine how to proceed when given an Expression. The nice thing about the design pattern, however, is that we don’t then need to cast to the target type. As an example, here’s how we would visit a Call expression.

// "Call" just accepted me; here's how I visit a "Call"
public void Visit(Call exp)
{
    exp.Body.Accept(this);
    var closure = (Closure)evaled.Pop(); // evaluation stack
    var runtime = Eval<Expression>(exp.Arguments);

    var old = env;
    env = closure.Environment;
    foreach (var pair in closure.Lambda
        .Parameters.Zip(runtime, (p, arg) => new { p, arg }))
    {
        env[pair.x] = pair.y;
    }

    closure.Lambda.Body.Accept(this);
    env = old;
}

There’s one small caveat. Since our syntax classes can no longer recursively evaluate inner expressions, the implementation of the evaluation visitor must store results on a stack. This takes a little getting use to, but we can more or less factor out the overrides of the Evaluate method and rename them Visit. The full code for the visitor class is a little too long to post as a snippet; please check here for the complete implementation.

The huge bonus of using this design pattern is that we gain interchangeability. Look at the return types for Visit and Accept. They’re void! The Accept method is just there to notify the visitor that its type should be visited next. Because the methods are void, the AST classes can accept any implementation of IExpressionVisitor; it’s up to the visitor to track the “result state” it wants to expose to clients. In our evaluation visitor, we wanted to visit the tree to a Value. For a pretty print visitor, we may want to store the result of visiting each node as a string, and expose that to a client for printing. We can leave the syntax classes untouched, no matter what set of new operations we want to define on the tree.

Final thoughts

Oddly enough, the first implementation of Evaluate is sometimes called the interpreter pattern. To me, interchangeability is what makes the visitor pattern attractive for writing an interpreter: we can provide as many tree traversal algorithms as we want without ever having to change the code in the AST classes. It’s a much more scalable approach and more naturally allows more than one developer to work on the project.

If you’d like to learn more about programming languages, I highly recommend the programming languages course offered online by Coursera. This blog post by Peter Norvig is also a great resource.

Whew – I hope you enjoyed reading! If you have any thoughts, questions, or think that I’ve made a mistake, please leave a comment below. Thanks!

  1. Switch statements in C/C++ are often rewritten to use binary search

  2. More on the virtues of this syntax