Scheme in Scheme Part 4 – The Virtual Machine

This is Part 4 of Scheme In Scheme.  Part 1, Part 2 or Part 3 are also good reads.

The next task I set out to complete for Scheme in Scheme was implement lambda.  I can’t really say that I got that far.  After wrestling with it for a while I realized that I didn’t have a good foundation and therefore implementing lambda was going to be difficult.  I had a quasi-virtual machine at that point.  I decided it would be better to convert the the current code into a more fully featured virtual machine instead of trying to retrofit lambda in what was there.  This update can really be split into two pieces — the execution of byte code and the compiler.

Byte Code

I’ve invented a byte code format for my project.  The virtual machine will execute the byte code in the order it’s given to it.  Here are some byte code examples:

#;6> (compile-form '(+ 1 2))
((push 1) (push 2) (add))
#;7> (compile-form '(/ 5 x))
((push 5) (get x) (div))
#;8> (compile-form '(+ 1 2 (- 5 3) (* 2 50)))
((push 1) (push 2) (add) (push 5) (push 3) (sub) (add) (push 2) (push 50) (mul) (add))
#;9> (compile-form '(define x 3))
((push 3) (define x))

The function compile-form takes in a list and transforms it into a list of instructions.  The very simple list (+ 1 2) translates into ((push 1) (push 2) (add)).   Since this is a stack machine the add instruction will pop two arguments off the stack, add them, and push them back onto the stack.  All of the arithmetic operations have implicit parameters which are pushed onto the stack. A lot more logic has been transferred into the compilation step for arithmetic.

(define (instruction/arithmetic env op)
 (let* ((num2   (env/pop! env))
        (num1   (env/pop! env))
        (result (case op
                  [(add) (+ num1 num2)]
                  [(sub) (- num1 num2)]
                  [(mul) (* num1 num2)]
                  [(div) (/ num1 num2)])))
  (env/push! env result))
 env)

All the instructions are pretty simple.  Push objects onto the stack, call the instruction, and then pop off the result.  There are only three instructions currently that take an argument: push, get, and define.  The get instruction pulls a value from a binding in the environment.  So if x was previously bound to 3 via (define x 3) then the byte code (get x) would push 3 onto the stack.  The define instruction operates in the same way except it pops the top off the stack and binds that object to it’s argument.  (define x 3) will produce ((push 3) (define x)) which binds x to 3.

Compiling

Here is the bulk of the code for the compiler:

(define (compile-arg arg)
 (if (self-evaluating? arg)
     `((push ,arg))
     (if (symbol? arg)
         `((get ,arg))
         (compile-form arg))))

;; Compiles the pushes for arithemtic args.  This will, at most, push two
;; arguments onto the stack.  If arguments are lacking, depending on the
;; op, this will push an null-operator (0 for add, 1 for mul).
(define (compile-arithmetic op args)
 (case (length args)
   [(0) `((push ,(default-arithmetic-val op))
          (push ,(default-arithmetic-val op))
          (,op))]
   [(1) `(,@(compile-arg (car args))
          (push ,(default-arithmetic-val op))
          (,op))]
   [(2) `(,@(compile-arg (car args))
          ,@(compile-arg (cadr args))
          (,op))]
 [else (append (compile-arithmetic op (take args 2))
               (append-map (lambda (x) `(,@(compile-arg x) (,op)))
                           (drop args 2)))]))

(define (compile-define binding args)
 (if (symbol? binding)
     ; If the binding is a symbol there should only be one other element
     ; in the list (define binding arg)
     `(,@(compile-arg (car args)) (define ,binding))))

;; Takes in a Scheme form for input and returns a list of instructions.
(define (compile-form input)
  (let ((form-name (car input))
        (args      (cdr input)))
    (case form-name
       [(+) (compile-arithmetic 'add args)]
       [(-) (compile-arithmetic 'sub args)]
       [(*) (compile-arithmetic 'mul args)]
       [(/) (compile-arithmetic 'div args)]
       [(define) (compile-define (car args) (cdr args))]
       [else (list (car input))])))

The first thing to look at is compile-form.  This function is the main entry point into the compiler.  It destructures the list by pulling out the operation (form-name in the code) being performed and the arguments passed to the operation.  It then calls into specific compilation routines based on the operation.

Most of the compilation at this point just deals with a lot of list manipulation.  A list goes into to compile-form and a list comes out.  Each element in the output list is an instruction.  Instruction themselves are lists.  Usually the instructions are only one element long but those that take parameters are two elements long.

The most interesting function in the compiler is probably the arithmetic compiler.  If I open up a scheme interpreter and type (+) it gives me 0(+ 1) gives me 1.  I had to special case the zero, one and two argument cases because of this.  All of the arithmetic instructions will pop two objects off the stack and operate on them.  The simple way of handling this is to always generate dummy pushes for missing arguments.  Zero arguments generates two pushes with a default value (for add it’s 0).  One argument pushes a dummy argument and the parameter argument.  Two arguments just push the two parameters onto the stack.

#;2> (compile-form '(+))
((push 0) (push 0) (add))
#;3> (compile-form '(+ 1))
((push 1) (push 0) (add))
#;4> (compile-form '(+ 1 2))
((push 1) (push 2) (add))

When dealing with 3 or more arguments the compiler will create byte code with two pushes, the arithmetic instruction and then interleaving pushes and instructions.

#;1> (compile-form '(- 5 2 3))
((push 5) (push 2) (sub) (push 3) (sub))
#;2> (compile-form '(- 5 2 3 4 1 2 3 5))
((push 5) (push 2) (sub) (push 3) (sub) (push 4) (sub) (push 1) (sub) (push 2) (sub) (push 3) (sub) (push 5) (sub))

The last little bit is the compile-arg function.  It is used to evaluate arguments that are not self-evaluation (simple).  This recursive compilation allows for nested lists.  If I pass the compiler (+ 5 2 (- 3 2)) it will properly generate code to subtract and then add all the numbers up.

#;1> (compile-form '(+ 5 2 (- 3 2)))
((push 5) (push 2) (add) (push 3) (push 2) (sub) (add))
#;2> (repl)
-: (+ 5 2 (- 3 2))
8
-: quit
quit

As always, you can see the code for this part of Scheme In Scheme on my Git Hub account at Part 4.