Creating a spec for destructuring

clojure.png

A while back David Nolen had a thoughtful post about using spec as a tool for thought, which included an exploration of creating a spec for clojure.core/let.

The latest Clojure alpha actually includes a spec for let that covers destructuring and I thought it might be interesting to walk through the details of how it is implemented.

I'll pick up approximately where David left off. A typical let looks like this:

(let [a 1
      b 2]
  (+ a b))

We can define an initial spec for clojure.core/let by splitting it into bindings and body:

(require '[clojure.spec :as s]
         '[clojure.spec.gen :as gen])

(s/fdef let
  :args (s/cat :bindings ::bindings
               :body (s/* any?)))

We then need to more fully define bindings as a vector of individual bindings. Each binding is made of a binding-form and an init-expr that computes the value of the local binding:

(s/def ::bindings (s/and vector? (s/* ::binding)))
(s/def ::binding (s/cat :binding ::binding-form 
                        :init-expr any?))

The expressions can be anything so we leave those as any?. The binding-form is where things get interesting. Let's first allow for binding-form to be just simple (no namespace) symbols. That's enough to create something to work with.

;; WORK IN PROGRESS
(s/def ::binding-form simple-symbol?)

Now that we have a full spec, we can actually try a few things. Let's try an example of conforming our bindings.

(s/conform ::bindings '[a 1, b 2])
;;=> [{:binding a, :init-expr 1} {:binding b, :init-expr 2}]

Looks good! We get back a vector of binding maps broken into the binding and the initial expression.

Now we need to expand our spec to include sequential destructuring and map destructuring.

Sequential destructuring

Sequential destructuring binds a series of symbols to the corresponding elements in the bound value. Optionally, the symbols may be followed by a variadic argument (using &) and/or an alias for the overall sequence (using :as).

Some examples:

;; Sequential destructuring examples:
[a b]
[a b :as s]
[a b & r]

To describe a sequential spec we use the spec regex operators:

;; WORK IN PROGRESS
(s/def ::seq-binding-form
  (s/cat :elems (s/* simple-symbol?)
         :rest  (s/? (s/cat :amp #{'&} :form simple-symbol?))
         :as    (s/? (s/cat :as #{:as} :sym simple-symbol?))))

Let's try it out:

(s/conform ::seq-binding-form '[a b])
;;=> {:elems [a b]}
(s/conform ::seq-binding-form '[a b :as s])
;;=> {:elems [a b], :as {:as :as, :sym s}}
(s/conform ::seq-binding-form '[a b & r])
;;=> {:elems [a b & r]}

Hang on a sec, what happened in the last example? The elems snagged & r as well because & is a symbol. We need to redefine our notion of what a binding symbol is to exclude the symbol &, which is special in the language of destructuring:

;; WORK IN PROGRESS
(s/def ::local-name (s/and simple-symbol? #(not= '& %)))
(s/def ::seq-binding-form
  (s/cat :elems (s/* ::local-name)
         :rest  (s/? (s/cat :amp #{'&} :form ::local-name))
         :as    (s/? (s/cat :as #{:as} :sym ::local-name))))

(s/conform ::seq-binding-form '[a b & r :as s])
;;=> {:elems [a b], :rest {:amp &, :form r}, :as {:as :as, :sym s}}

That's better. But it turns out I've not really been spec'ing the full truth of sequential destructuring. Each of the ::elems can itself be sequentially destructured, and even the rest arg can be destructured.

We need to back up to the beginning and reconsider the definition of ::binding-form to add the possibility of either a ::local-name (our improved simple symbol) or a sequential destructuring form. (We'll add map later.)

(s/def ::local-name (s/and simple-symbol? #(not= '& %)))

;; WORK IN PROGRESS (still missing ::map-binding-form)
(s/def ::binding-form
  (s/or :sym ::local-name
        :seq ::seq-binding-form))

(s/def ::seq-binding-form
  (s/cat :elems (s/* ::binding-form)
         :rest  (s/? (s/cat :amp #{'&} :form ::binding-form))
         :as    (s/? (s/cat :as #{:as} :sym ::local-name))))

Now ::binding-form is a recursive specification. Binding-forms are either symbols or sequential forms, which may themselves contain binding-forms. The registry provides naming indirection which makes this possible.

Let's try our prior example again and see things have changed.

(s/conform ::seq-binding-form '[a b & r :as s])
;;=> {:elems [[:sym a] [:sym b]], :rest {:amp &, :form [:sym r]}, :as {:as :as, :sym s}}

Our conformed result is a bit more verbose as it now indicates for each binding form what kind of binding it is. While this is more verbose to read, it's also easier to process. Here's how a recursive binding form example looks:

(s/conform ::seq-binding-form '[a [b & c] [d :as e]])
;;=> {:elems [[:sym a]
;;            [:seq {:elems [[:sym b]], :rest {:amp &, :form [:sym c]}}]
;;            [:seq {:elems [[:sym d]], :as {:as :as, :sym e}}]]}

Finally we are ready to look at map destructuring.

Map destructuring

Map destructuring has a number of entry forms that can be used interchangeably:

  • <binding-form> key - for binding either a local name with (get m key) or recursively destructuring
  • :keys [key ...] - for binding locals with the same name as each key to the value retrieved from the map using the key as a keyword. In addition the specified keys can be either symbols or keywords and simple or qualified. In all cases, the local that gets bound is a short symbol and the value is looked up as a keyword.
  • :<ns>/keys [key ...] - same as :keys, but where ns is used as the namespace for every key
  • :syms [sym ...] - for binding locals with the same name as each sym to the value retrieved from the map using sym, which may be either simple or qualified.
  • :<ns>/syms [sym ...] - same as :syms, but where ns is used as the namespace for every symbol.
  • :strs [str ...] - for binding locals with the same name as each sym to the value retrieved from the map using str as a sym, which must be simple.
  • :or {sym expr} - for providing default values for any missing local that would have been bound based on other entries. The keys should always be simple symbols (the same as the bound locals) and the exprs are any expression.
  • :as sym - binds the entire map to a local named sym.

There is a lot of functionality packed into map binding forms and in fact there are really three different map specs combined into this single map. We call this a "hybrid" map spec.

The first part describes just the fixed well-known attributes in a typical s/keys spec:

(s/def ::keys (s/coll-of ident? :kind vector?))
(s/def ::syms (s/coll-of symbol? :kind vector?))
(s/def ::strs (s/coll-of simple-symbol? :kind vector?))
(s/def ::or (s/map-of simple-symbol? any?))
(s/def ::as ::local-name)

(s/def ::map-special-binding
  (s/keys :opt-un [::as ::or ::keys ::syms ::strs]))

The second part describes the basic binding form specs (examples like {n :name} ), although the left hand side here can further destructure.

(s/def ::map-binding (s/tuple ::binding-form any?))

And finally we need to handle the new functionality for namespaced key or symbol sets (like :<ns>/keys or :<ns>/syms) which we'll describe here as a map entry tuple:

(s/def ::ns-keys
  (s/tuple
    (s/and qualified-keyword? #(-> % name #{"keys" "syms"}))
    (s/coll-of simple-symbol? :kind vector?)))

Then we can put all of these together into the ::map-binding-form by combining them as an s/merge or the well-known attributes and a description of the possible tuple forms:

;; collection of tuple forms
(s/def ::map-bindings
  (s/every (s/or :mb ::map-binding
                 :nsk ::ns-keys
                 :msb (s/tuple #{:as :or :keys :syms :strs} any?)) :into {}))

(s/def ::map-binding-form (s/merge ::map-bindings ::map-special-binding))

And finally we need to go back and define our parent spec to include map bindings:

(s/def ::binding-form
  (s/or :sym ::local-name
        :seq ::seq-binding-form
        :map ::map-binding-form))

And that's it! Here's an example binding form that shows several features of destructuring:

(s/conform ::binding-form
  '[[x1 y1 :as p1]
    [x2 y2 :as p2]
    {:keys [color weight]
     :or {color :black weight :bold}
     :as opts}])
;;=> [:seq {:elems [[:seq {:elems [[:sym x1] [:sym y1]], :as {:as :as, :sym p1}}]
;;                  [:seq {:elems [[:sym x2] [:sym y2]], :as {:as :as, :sym p2}}]
;;                  [:map {:keys [color weight]
;;                         :or {color :black, weight :bold}
;;                         :as opts}]]}]

Now that we have a spec for destructuring, we can reuse it anywhere destructuring is allowed - in fn, defn, for, etc. We could even leverage it to implement destructuring itself. Rather than recursively parsing the binding form, we could simply conform it to receive a more regular structure described in terms of the parts we've defined in the spec.