Works On My Machine: Understanding Var Bindings and Roots

Works On My Machine is the place where Cognitects reflect on tech, culture, and the work we do. The views expressed on Works On My Machine are those of the author and don’t necessarily reflect the official position of Cognitect.

When designing applications and systems, it can be important to understand the inner workings of the language being used. One area of Clojure that is traditionally rather opaque and poorly understood is how Vars work and how they interact with the rest of the language. I recently encountered some behavior that seemed puzzling:

(def ^:dynamic answer 42)
 
(println answer)
; => prints "42"
 
(binding [answer 0]
  (println answer))
;=> prints "0"
 
(with-redefs [answer 0]
  (println answer))
;=> prints "0"
 
(binding [answer 0]
 (future
   (Thread/sleep 1000)
   (println answer)))
;=> prints "0"
 
 
(with-redefs [answer 0]
 (future
   (Thread/sleep 1000)
   (println answer)))
;=> prints "42"

 

This behavior likely makes little sense to anyone unfamiliar with the internal structure of vars and how they operate, but hopefully with a little explanation and a few examples we can start to see this "error" in a slightly different light. 

Vars are one of the very first things introduced in almost any Clojure programming tutorial. The way we assign a global name to a piece of data in Clojure is via def:

(def answer 42)

This code snippet creates a var named user/answer and gives it the root value of 42. But what really is happening behind the scenes and where does the value 42 get stored? Vars themselves are stored in namespaces, and we won't dig into that much in this article, but suffice it to say that there is a registry in the Clojure runtime where vars can be looked up by name. Vars themselves have a rather interesting structure that is best described via a diagram:

Vars contain their name and a link to the namespace of which they are a member. They also carry a threadBound flag, which can only become true for a dynamic var (more on that later). And finally, vars contain a root value. In our example code, the root value of user/answer is 42. The rest of this diagram deals with the resolution of dynamically bound vars, and we'll discuss that later.
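The pieces just described can be inspected from the REPL through Java interop. The following is a minimal sketch, assuming a stock Clojure runtime. The names answer-var, var-name, var-ns, var-root and dynamic? are made up for illustration, while sym, ns, getRawRoot and isDynamic are public members of clojure.lang.Var.

```clojure
(def answer 42)

;; #'answer is reader shorthand for (var answer): the Var object itself,
;; not the value it currently resolves to.
(def answer-var #'answer)

(def var-name (.-sym answer-var))        ; the symbol answer
(def var-ns   (.-ns answer-var))         ; the Namespace in which the def was evaluated
(def var-root (.getRawRoot answer-var))  ; 42, read straight from the root field
(def dynamic? (.isDynamic answer-var))   ; false, since there is no ^:dynamic metadata

(prn var-name (ns-name var-ns) var-root dynamic?)
```

These interop calls are reflective and meant for exploration only; ordinary code never needs to reach into a var this way.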

Now let's look at the usage of a var:

(println answer)

When this code is compiled, the compiler will see the reference to answer and search for a var in the current namespace with that name. Since answer exists in our namespace, the compiler will insert a reference to that var into our function, and then also insert an implicit call to answer.deref() to get the value. The job of the deref method is to get the current value of the var. But what does that method look like?

final public Object deref(){
    TBox b = getThreadBinding();
    if(b != null)
        return b.val;
    return root;
}

public final TBox getThreadBinding(){
    if(threadBound.get())
        {
        IMapEntry e = dvals.get().bindings.entryAt(this);
        if(e != null)
            return (TBox) e.val();
        }
    return null;
}

The deref method itself is quite basic: it looks to see if there is a dynamic binding for this var and, if not, returns the root. The call to getThreadBinding() takes a shortcut and returns null when threadBound is false. Vars that simply return their roots are by far the most common kind encountered in Clojure. In our snippet above, println itself names a non-dynamic var whose root is the println function instance. When the Clojure documentation speaks of "var roots", this is what it is describing: the value of the root field on a var.
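One way to convince yourself that compiled code holds a reference to the var, rather than a copy of its value, is to replace the root after a function has been compiled. This is a small sketch under the assumption of default compiler settings; current-answer, before, explicit and after are hypothetical names used only here.

```clojure
(def answer 42)

;; The compiled body of this function refers to the var #'answer;
;; the value is looked up each time the function runs.
(defn current-answer [] answer)

(def before   (current-answer))   ; 42
(def explicit (deref #'answer))   ; 42, the same lookup spelled out (@#'answer also works)

;; Evaluating def again replaces the root of the same var...
(def answer 43)

;; ...so the already-compiled function now sees the new root.
(def after (current-answer))      ; 43

(prn before explicit after)
```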

Now let's dive into the more complex side of vars. Let's update our snippet to make the var dynamic. 

(def ^:dynamic answer 42)

Surprisingly, not a whole lot changes in the var that gets defined. It is now flagged as dynamic, which is what permits binding to rebind it, and that is all. (Strictly speaking, in Clojure's Var.java the `threadBound` flag itself flips to true the first time some thread pushes a binding for the var, and only dynamic vars may be bound.) Once that flag is true, the JVM executes the "true" block of the if inside getThreadBinding. To understand what that code is doing, let's go back to the diagram from the start of this article, starting with clojure.lang.Var/dvals. This field is a static thread-local variable, meaning that every thread in our JVM instance gets its own copy of it. If one thread mutates its copy, the other threads will not see the change. The thread-local field holds a reference to a linked list of Frames. Each Frame contains a hash map of Vars to instances of TBox. The rationale behind TBoxes will have to wait until a later time, but for now it is enough to know that a TBox holds a reference to the actual Object value to which the var should resolve.

In our most recent snippet, we've marked the var as dynamic and set its root to 42, but we still haven't modified the Frame stack. When we call deref on that var, getThreadBinding comes back empty-handed: either `threadBound.get()` is still false because nothing has ever bound the var, or, if some other thread has bound it, the Frame referenced by dvals on our thread has no entry for answer. Either way the root is returned. For an entry to be found in the Frame, we need a call to binding.

(binding [answer nil] ;; Creates and pushes new Frame
  (binding [answer 44] ;; Creates and pushes a 2nd new Frame
    (println answer)
    ;; Frame is popped
  )
  ;; Frame is popped
)

Here we're creating two new frames, one with answer bound to nil and another with answer bound to 44. Each time we enter a binding macro body, a new frame is pushed, and when the binding exits that frame is popped. Since these Frames create a stack, the old values are never overwritten and the parent block's values will still exist. This creates what is known as "dynamic extent", where the binding of a var is considered valid for the lifetime of an expression (the body of the binding), and all functions called by that expression.
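The push and pop behavior described above can be observed directly. The sketch below uses clojure.core's thread-bound? to ask whether the current thread has a Frame entry for the var; describe and observed are hypothetical names introduced for the example.

```clojure
(def ^:dynamic answer 42)

;; Defined outside any binding form, yet it sees whatever binding
;; is in effect when it is called (dynamic extent).
(defn describe [] (str "answer is " answer))

(def observed
  [(thread-bound? #'answer)        ; false: no Frame entry for answer yet
   (binding [answer nil]           ; pushes the first Frame
     [(thread-bound? #'answer)     ; true: an entry exists even though its value is nil
      (binding [answer 44]         ; pushes the second Frame
        (describe))                ; the callee sees the innermost binding
      answer])                     ; nil again: the inner Frame has been popped
   answer])                        ; 42: every Frame popped, back to the root

(prn observed)
;; prints [false [true "answer is 44" nil] 42]
```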

Armed now with what we've learned about vars, let's go back to our first example. Why do these two forms print different data?

(binding [answer 0]
 (future
   (Thread/sleep 1000)
   (println answer)))
;=> prints "0"
 
 
(with-redefs [answer 0]
 (future
   (Thread/sleep 1000)
   (println answer)))
;=> prints "42"

At first glance these two constructs seem similar. They both change the value of a var and then, after one second, print the value of that var in another thread. The first, however, uses the dynamic binding features of Vars. When work is handed to another thread (in this case by the call to future), the initial state of that thread's dynamic bindings is taken from the Frame at the top of the creating thread's stack. This is known as "binding conveyance" and is supported by futures, as well as core.async's thread and go constructs.
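Conveyance is something the construct has to do for you; a thread started by hand begins with no bindings at all. The following sketch contrasts the two cases, assuming nothing beyond clojure.core and java.lang.Thread. The helper in-raw-thread and the name seen are hypothetical.

```clojure
(def ^:dynamic answer 42)

;; Hypothetical helper: run f on a brand-new java.lang.Thread and
;; block until it has delivered its result.
(defn in-raw-thread [f]
  (let [result (promise)
        t      (Thread. (fn [] (deliver result (f))))]
    (.start t)
    @result))

(def seen
  (binding [answer 0]
    {:raw-thread (in-raw-thread (fn [] answer))        ; 42: nothing conveyed, root is used
     :bound-fn   (in-raw-thread (bound-fn [] answer))  ; 0: bound-fn captures the caller's bindings
     :future     @(future answer)}))                   ; 0: future conveys bindings itself

(prn seen)
```

When running this as a script rather than at a REPL, a final call to (shutdown-agents) lets the JVM exit promptly, because future uses a non-daemon thread pool.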

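In the second example a call is made to with-redefs. This macro looks and acts much like binding, but with one key difference: it mutates the var root, and restores the original root when its body exits. The call to future returns immediately, so the body of with-redefs exits long before the future finishes sleeping. Since var roots are global for the entire application, by the time the body of the future gets around to deref-ing the var (after the 1 sec pause) the root value has already been reset back to 42. In other words, the result depends on a race between the two threads.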
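The timing dependence is easy to demonstrate by changing where the future is dereferenced. This is a sketch only: the 100 ms sleep is an arbitrary choice, and the second result depends on thread timing, which is exactly the hazard being illustrated. The names waited-inside and escaped are hypothetical.

```clojure
(def ^:dynamic answer 42)

;; Deref inside the body: with-redefs cannot exit (and restore the root)
;; until the future has produced its value, so the future sees 0.
(def waited-inside
  (with-redefs [answer 0]
    @(future (Thread/sleep 100) answer)))

;; The future escapes the body: with-redefs restores the root to 42
;; right away, well before the sleep ends, so the future sees 42.
(def escaped
  (let [f (with-redefs [answer 0]
            (future (Thread/sleep 100) answer))]
    @f))

(prn waited-inside escaped)
```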

 

So to summarize: Var roots are global. Var dynamic bindings are thread-local, and dynamically scoped. Since altering the root of a var involves the mutation of global data it's recommended that code constructs that require changing the root of a var at runtime be avoided. Many times a simpler solution can be found, either by leveraging dynamic vars, or by parameterizing functions with additional arguments.
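As an illustration of the last suggestion, here is one possible shape for a parameterized function. It is a hypothetical sketch (describe and the result names are invented for the example): the value travels as an ordinary argument, so no var root or binding is involved and a future simply closes over it.

```clojure
;; Hypothetical function: the value is an argument, with 42 as the default.
(defn describe
  ([] (describe 42))
  ([answer] (str "The answer is " answer)))

(def default-result (describe))      ; "The answer is 42"
(def custom-result  (describe 0))    ; "The answer is 0"

;; The argument is captured by the future's closure, so the result does
;; not depend on when the other thread gets around to running.
(def async-result
  @(future (Thread/sleep 100) (describe 0)))   ; "The answer is 0"

(prn default-result custom-result async-result)
```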