Writing a DSL in Clojure

The goal of this article is to explain how we could go about writing our own domain-specific language (DSL) in Clojure. We will use Chipper, a very simple DSL for reasoning about logical gates, to do so.

A domain-specific language is a small language which only cares about problems in a specific domain. It does this by defining primitives and means of combining these primitives in a way that is meaningful in that domain. You can think of it as finding the right nouns and verbs to express what we want to do.

Creating a good DSL is hard and takes time. We have to come up with nouns and verbs that work well together at the right level of abstraction, and we have to find a way to write in our mini-language.

Chipper was my attempt to understand more about the process of designing and making a DSL. As a "real" DSL it leaves a lot to desire, but it's an illustrative example for problems and design patterns that come up in other, more serious DSLs.

Roughly speaking, we can divide the making of a DSL into four parts:

  1. Understanding the domain
  2. Bottom-up representation
  3. Top-down idealization
  4. Bridging the two

We will go through each of these steps one at a time.

Domain: Logical gates

Chipper's domain consists of composing and reasoning about logical gates. A logical gate is a boolean function: it takes a set of ones and zeroes and returns another set of ones and zeroes. For example, an AND gate takes two booleans and returns 1 (true) if both inputs are 1, otherwise it returns 0 (false). We can represent this with a truth table:

Figure 1. Truth tables for some logical gates

NOT GATE          AND GATE           DMUX GATE
========          ========           =========

IN || OUT        A B || OUT        IN SEL || A B
---------        ----------        -------------
0  ||   1        0 0 ||  0          0  0  || 0 0
1  ||   0        0 1 ||  0          0  1  || 0 0
                 1 0 ||  0          1  0  || 1 0
                 1 1 ||  1          1  1  || 0 1

Truth tables allow us to see the behaviour of a gate. Given all the possible inputs, here are the outputs. It turns out that we win a lot by expressing gates in terms of other already defined gates.

A DMUX gate takes a single input IN and selects (SEL) one of the connected lines A or B. Here's how it is implemented in terms of NOT and AND gates.

Figure 2. Inside of a DMUX

IN SEL || NSEL=(NOT SEL) A=(AND IN NSEL) B=(AND IN SEL) || A B
--------------------------------------------------------------
 0   0 ||   1            0               0              || 0 0
 0   1 ||   0            0               0              || 0 0
 1   0 ||   1            1               0              || 1 0
 1   1 ||   0            0               1              || 0 1

We can think of NSEL, A and B as a local scope for a DMUX.

We can express every possible boolean function using only AND and NOT gates, which in itself is remarkable. As it turns out, we don't even need two gates - it's enough with just a NAND gate! A NAND gate is a boolean function that takes two inputs and returns 1 unless both its inputs are 1, essentially a reversed AND. The property of being able to express all boolean functions is called functional completeness. You can convince yourself that it's enough with just a NAND gate by expressing NOT, AND, OR and XOR gates (in that order) using only a NAND gate to begin with. This approach is used in the book The Elements of Computing Systems, where you start off with a NAND gate and essentially build up all the abstractions yourself necessary to make a game of Tetris in a high-level language.

What do we want to express?

Looking at the above truth tables, there are a couple of things that seem to be desirable to have in our mini-language:

  • A logical gate is just a function
  • It takes some inputs and returns some outputs
  • We define logical gates in terms of other logical gates
  • We want to use outputs of one gate as inputs in another

There are many ways of writing a DSL, but one technique that is powerful is to imagine — or write down — the code that is necessary to represent what we want to express, and then try to make the code as isomorphic as possible. By isomorphic we mean that the representation of one gate should be in the same form as the representation of another, regardless of difference in complexity. Doing this will allow us to see patterns and generate code later on. For example we could say that a NOT gate returns 1 if the input is 0 and 0 if the input is 1, but doing so wouldn't allow us to abstract away what is common with say, a DMUX gate.

Figure 3. Representation of logical gates in Clojure

(defn nand* [a b]
  (let [out (if (= 2 (+ a b)) [0] [1])
    out)))

(defn not* [in]
    (let [out (nand* in in)]
      out))

(defn and* [a b]
  (let [[w] (nand* a b)
        out (not* w)]
    out))

(defn dmux* [in sel]
  (let [[nsel] (not* sel)
        [a]    (and* in nsel)
        [b]    (and* in sel)
        out  [a b]]
    out))

We could extend this if we wanted to. For example we could imagine giving the inputs and outputs names, so instead of returning [0 1] we return a map {:foo 0 :bar 1}. These are trade-offs that we make writing a DSL — in the end I decided that the input and output names are arbitrary, as long as we keep track of the order in the vector so that it matches our expectations. Another thing we could specify explicitly is whether the inputs and outputs are single booleans or an array of booleans - this is used to make actual logical chips, for example in the case of a generalized OR gate that operates on 16 bits and returns true if either of those are true.

How do we want to express it?

In an ideal world, how would we like to express the above code? Explaining the problem out loud or writing it out on a piece of paper can help us see what is necessary to include and what would be useful.

Using the previous section as a guideline, here's one way to express what is different about each gate. We imagine a special form defgate which takes a gate name, a vector of inputs, a => to help with the unfamilar syntax, a vector of outputs and a implementation in the form of other gates. Barring the parenthesis and brackets, this looks pretty close to what we would write down on a piece of paper.

(NAND is not expressed in terms of other gates, so we have a special construct for that,open the box to see how that works).

Figure 4. A more natural way to express logical gates

(defgate not* [in] => [out]
  (nand* [in in] => [out]))

(defgate and* [a b] => [out]
  (nand* [a b] => [w])
  (not* [w] => [out]))

(defgate dmux* [in sel] => [a b]
  (not* [sel] => [nsel])
  (and* [in nsel] => [a])
  (and* [in sel] => [b]))

We could imagine other constructs that are more succinct or powerful. For example, we could imagine writing something like this.

Figure 5. Yet another way to express logical gates

(defgate and* a b => out
  nand* a b => w
  not* w => out)

but the only benefit is a slightly shorter syntax at the expense of being harder to parse, and generally less "native" compared to to other Clojure code.

Making a bridge

We can now begin to transform our ideal code into the uniform code we wrote, or just imagined, earlier. Here's where Clojure's code-as-data property makes things easy for us.

Figure 6. Transforming the code

;; from this
(defgate dmux* [in sel] => [a b]
  (not* [sel] => [nsel])
  (and* [in nsel] => [a])
  (and* [in sel] => [b]))

;; into this
  (let [[nsel] (not* sel)
        [a]    (and* in nsel)
        [b]    (and* in sel)
        out  [a b]]
     out))

For code design reasons, we will actually render a slightly different representation of the let expression - it will be recursive. Function-wise it will be the same, expect instead of one scope we will have several nested scopes. Here are the functions and macros that make this transformation possible.

Figure 7. Making a bridge

(defn- arrow?
  "predicate that returns true if x is =>"
  [x]
  (and (symbol? x) (= (name x) "=>")))

(defn- split-body
  "splits expression into two parts, one with input
   pins and the other with => and outputs"
  [body]
  (split-with (complement arrow?) body))

(defn expand-defgate
  "recursively expands a gate with the right scoping"
  [forms outs]
  (if (empty? forms)
      outs
      (let [[[name & body] & rest] forms
            [[args] [_ & [outputs]]] (split-body body)]
        `(let [~(vec outputs) (~name ~@args)]
          ~(expand-defgate rest outs)))))

(defmacro defgate
  "defines a logical gate with a name, input/output
  pins, and a implementation consisting of other gates"
  [gate ins _ outs & forms]
  `(defn ~gate ~ins ~(expand-defgate forms outs)))

(Thanks to Peter Siebel for the recursive let solution).

Depending on how familiar you are with Clojure, this will either delight or scare you.

Techniques used above

  • Recursion: base case and dealing with one form at a time
  • List destructuring: take regular forms apart
  • Reader macros: ` ~ @ to manipulate when things are evaluated

We also want to keep our macro as small as possible, since it's easier to test functions than macros, so we've abstracted away most of the functionality in the expand-defgate function.

Macros especially can be hard to wrap your head around, because they operate in a separate universe from regular functions.

Macros live in the land of the names, not in the land of the things they refer to — Paul Graham

Let's walk through how defgate and expand-defgate works.

Figure 8. Explaining defgate and defmacro

(defgate and* [a b] => [out]
  (nand* [a b] => [w])
  (not* [w] => [out]))

1. defgate
(defn and* [a b]
  ~(expand-defgate ((nand* [a b] => [w])
                    (not* [w] => [out])) ;; forms
                   [out]))               ;; outs

2. expand-defgate
a) base-case: forms is not empty, move on
   and take care of all individual form in order

b) let name=nand*, body=([a b] => [w]),
   rest is the other forms, args=[a b] and outputs=[w]

c) *return a let* where [w]=(nand* a b), inside this let
   call expand-defgate again with ((not*...)) and outs=[out]

That's it!

Summing up

A good DSL lets you focus on the problem at hand, using abstractions that are natural to the problem domain in question.

The most important and often neglected step in writing a DSL is understanding the domain and what we want to express. Without understanding the domain — its primitives and means of combinations — it's hard to write a language that captures everything we want to express in it.

The source code for Chipper is on Github.

Resources

2 responses
Very interesting! But where does this "let-out" pattern come from? As in (defn nand* [a b] (let [out (if (= 2 (+ a b)) [0] [1]) out))) instead of (defn nand* [a b] (if (= 2 (+ a b)) [0] [1]))
Jay: The idea in that section is to make the gates as uniform as possible, so it's easier to see patterns for parsing the gates. That is, what does a nand gate have in common with a dmux gate? ``` (defn nand* [a b] (let [out (if (= 2 (+ a b)) [0] [1]) out))) ``` ``` (defn dmux* [in sel] (let [[nsel] (not* sel) [a] (and* in nsel) [b] (and* in sel) out [a b]] out)) ``` So even though it's not necessary or as straightforward as the shorter version you suggested, it helped me at least to think about the base case in a way that was compatible with more complex gates.