Separation of Concerns in Datomic Query: Datalog Query and Pull Expressions

One concept that newcomers to Clojure and Datomic hear an awful lot about is homoiconicity: the notion that code is data and data is code. This is one of several simple yet powerful concepts whose applications are so prevalent that it's easy to forget just how powerful they are.

One example of this is the choice of Datalog as Datomic's query language. Datalog queries are expressed as data, not strings, which means we can compose them, validate them, and pass them around much more simply than with strings.
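As a minimal sketch of what "queries as data" buys us, the snippet below builds and checks a query using only core Clojure functions, so it runs without a Datomic connection. The names base-query, add-where, and valid-query? are hypothetical helpers made up for this illustration; they are not part of Datomic's API, and the validity check is deliberately shallow.

```clojure
;; Hypothetical helpers for illustration only; none of these ship with Datomic.

(def base-query
  '[:find ?e
    :in $ ?ssn
    :where
    [?e :person/ssn ?ssn]])

(defn add-where
  "Returns query with clause appended.
   Assumes a vector-form query whose last section is :where."
  [query clause]
  (conj query clause))

(defn valid-query?
  "Shallow structural check: a vector that contains :find and :where."
  [query]
  (boolean (and (vector? query)
                (some #{:find} query)
                (some #{:where} query))))

(def query-with-name
  (add-where base-query '[?e :person/last-name ?lname]))
;; query-with-name is now
;; [:find ?e :in $ ?ssn :where [?e :person/ssn ?ssn] [?e :person/last-name ?lname]]
```

The resulting vector can be handed to d/q exactly like a hand-written literal, which is much harder to do safely when queries are concatenated strings.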

When I first started working with Datomic, I found myself writing queries like:

(d/q '[:find [?lname ?fname]
       :in $ ?ssn
       :where
       [?e :person/ssn ?ssn]
       [?e :person/first-name ?fname]
       [?e :person/last-name ?lname]]
     (d/db conn)
     "123-45-6789")

This returns a result set like this:

["Murray" "William"]

Without seeing the query that generated this result, you might think it's a collection of first names. Even if you understand it to be the first and last name of one person, you might not know that the person's last name is "Murray" and the person's first name is "William," better known as "Bill."

We can clarify the intent by putting the query results in a map:

(->> (d/q '[:find [?lname ?fname]
            :in $ ?ssn
            :where
            [?e :person/ssn ?ssn]
            [?e :person/first-name ?fname]
            [?e :person/last-name ?lname]]
          (d/db conn)
          "123-45-6789")
     (zipmap [:last-name :first-name]))
;; => {:last-name "Murray" :first-name "William"}

That's a nicer outcome, but we'd have some work to do if we decided to fetch :person/middle-name and add it to the map. Not too much work for that one attribute, but eventually we'd find out that we need to include :person/ssn as well. And then the :address/zipcode of the :person/address referenced by this person entity, adding several :where clauses and ever-increasing lists of logic variables and input bindings.

And then, when we want to search for all the person entities that have the last name "Murray", we have quite a bit of code to either duplicate or extract from the function definition.

Enter pull

The pull API can help here: we identify the entity with a lookup ref, which is a data structure, and declare the hierarchical selection of attributes we want in another data structure:

(d/pull (d/db conn)
        [:person/first-name
         :person/last-name
         {:person/address [:address/zipcode]}]
        [:person/ssn "123-45-6789"])

The result is a Clojure map that looks a lot like the pattern we submitted to pull:

{:person/first-name "William"
 :person/last-name "Murray"
 :person/address {:address/zipcode "02134"}}

See how nicely this separates how we search for the person from the details of the person we want to present? Also, who knew that Bill Murray lived where all the Zoom kids live? (Hint: he probably doesn't.)

But what if we want to find all of the persons that live in "02134"? pull requires an entity id or a lookup reference, so we'd have to search for those separately, and then invoke pull-many, resulting in two separate queries.

Pull expressions in queries

Luckily, Datomic supports pull expressions in queries, so we can find all of the persons that live in the "02134" zip code like this:

(d/q '[:find [(pull ?e [:person/first-name
                        :person/last-name
                        {:person/address [:address/zipcode]}]) ...]
       :in $ ?zip
       :where
       [?a :address/zipcode ?zip]
       [?e :person/address ?a]]
     (d/db conn)
     "02134")

The :where clauses in this example are all about search, and the presentation details are represented in the pull expression. This provides the same clean separation of concerns we get from the pull function, and does it in a single query. Nice!

Now, when the requirement comes in to add :person/middle-name to the results of this query, we can just add it to the pull expression:

(d/q '[:find [(pull ?e [:person/first-name
                        :person/middle-name
                        :person/last-name
                        {:person/address [:address/zipcode]}]) ...]
       :in $ ?zip
       :where
       [?a :address/zipcode ?zip]
       [?e :person/address ?a]]
     (d/db conn)
     "02134")

And, because the pull expression is just data, we can pass it in:

(defn find-by-zip [db zip pull-exp]
  (d/q '[:find [(pull ?e pull-exp) ...]
         :in $ ?zip pull-exp
         :where
         [?a :address/zipcode ?zip]
         [?e :person/address ?a]]
       db
       zip
       pull-exp))

(find-by-zip (d/db conn)
             "02134"
             [:person/first-name
              :person/middle-name
              :person/last-name
              {:person/address [:address/zipcode]}])

And compose it:

(def address-pattern [:address/street
                      :address/city
                      :address/state
                      :address/zipcode])

(find-by-zip (d/db conn)
             "02134"
             [:person/first-name
              :person/middle-name
              :person/last-name
              {:person/address address-pattern}])
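Because patterns are ordinary vectors and maps, composition isn't limited to dropping a var into a literal; core functions can assemble them too. Here is a minimal sketch that runs without Datomic. The names person-name-pattern and with-join are hypothetical, introduced only for this example.

```clojure
;; Pull patterns are plain vectors and maps, so core functions can build them.
;; person-name-pattern and with-join are hypothetical names for this sketch.

(def address-pattern [:address/street
                      :address/city
                      :address/state
                      :address/zipcode])

(def person-name-pattern [:person/first-name
                          :person/middle-name
                          :person/last-name])

(defn with-join
  "Returns pattern extended with a join on ref-attr that selects sub-pattern."
  [pattern ref-attr sub-pattern]
  (conj pattern {ref-attr sub-pattern}))

(def person-with-address
  (with-join person-name-pattern :person/address address-pattern))
;; person-with-address is now
;; [:person/first-name :person/middle-name :person/last-name
;;  {:person/address [:address/street :address/city :address/state :address/zipcode]}]
```

The resulting value is equal to the pattern written out by hand, so it can be passed to find-by-zip unchanged.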

Or support a default:

(defn find-by-zip
  ([db zip] (find-by-zip db zip '[*]))
  ([db zip pull-exp]
   (d/q '[:find [(pull ?e pull-exp) ...]
          :in $ ?zip pull-exp
          :where
          [?a :address/zipcode ?zip]
          [?e :person/address ?a]]
        db
        zip
        pull-exp)))

(find-by-zip (d/db conn) "02134")

Now clients can tailor the presentation details to their specific needs in a declarative way, without any knowledge of the query language itself. And thanks to the default pattern, they're not forced to.

Summary

Separation of concerns makes code easier to reason about and refactor. The pull API separates search from attribute selection, but limits search to a known entity identifier. Despite that constraint, it's still a very good fit when you already know the entity id or the value of a unique attribute to use in a lookup ref.

Query supports this same separation of concerns. It's up to you to write your queries this way, but doing so gets you the same benefits: simpler code that is easier to reason about and refactor. Plus you get the full power of Datalog query!