Improving on Types: Specing a Java Library

Good application developers use existing libraries rather than reinventing every wheel. This is particularly true on the JVM, where there are a huge number of wheels to choose from. So in addition to writing good code, developers need to be able to assess existing code, both for fitness for purpose and for quality.

Java developers lean heavily on types to understand code. Programmers use types for

  • development time checks, via compilation
  • documentation
  • development assistance, e.g. autocomplete

Clojure improves on the plain Java experience, providing a superior environment for the "hands on" portion of assessing a library:

  • the REPL provides a tight feedback loop
  • it is often easy to simplify OO "ball of mud" mutable interfaces with functional, immutable wrappers
  • printable/readable data makes it easier to reproduce and share results

Clojure spec makes this story even better. Specs are more expressive than Java types, and programmers can leverage specs for

  • development time checks, via instrumentation
  • documentation
  • development assistance, e.g. autocomplete
  • automatic example generation
  • automatic test generation

This article will demonstrate the use of spec in assessing a Java library, XChart.

Assessing Charts

I regularly need to produce charts from data. My requirements for this include the following:

  • chart inputs should be generic data, not code
  • chart outputs should be data and should support a variety of de facto standard formats
  • no kitchen sink – I want only charting, in a small footprint library

After a little bit of internet research, XChart looks promising. The home page immediately covers my requirements for output formats and for small footprint. After reading a little bit of code, I can also see that the input is data-oriented, by Java standards:

double[] xData = new double[] { 0.0, 1.0, 2.0 };
double[] yData = new double[] { 2.0, 1.0, 0.0 };
XYChart chart = QuickChart.getChart("Sample Chart", "X", "Y", 
     "y(x)", xData, yData);
new SwingWrapper(chart).displayChart();

Those double arrays are making me very happy right now–I was afraid I would find something like XAxisCoordinateFactoryBuilder.

Unfortunately, series and style manipulation is less data-oriented, and is done via a mix of builders, getters, and setters:

XYChart chart = new XYChartBuilder().xAxisTitle("X").yAxisTitle("Y")
    .width(600).height(400).build();
chart.getStyler().setYAxisMin(-10);
chart.getStyler().setYAxisMax(10);
XYSeries series = chart.addSeries("" + i, null, getRandomWalk(200));

Builder/Getter/Setter APIs make it far more difficult to share information with others: instead of giving you the data that describe a graph, I have to give you a program that tells a particular piece of software how to make a graph. Happily, this particular API doesn't look very fancy, so it will probably be easy to wrap it in a data-driven Clojure API. Happier still, a quick search reveals that somebody else has already done it: hyPiRion has written clj-xchart.

Clojure Advantage: Data

clj-xchart provides one enormous advantage over raw XChart: in clj-xchart, creating a chart is a pure function of generic data. As a result, the data to make a chart can be

  • manipulated via ordinary collection APIs
  • made durable for sharing

The docs include this nice example:

(c/pie-chart
      [["Not Pacman" 1/4]
       ["Pacman" 3/4]]
      {:start-angle 225.0
       :plot {:background-color :black}
       :series [{:color :black} {:color :yellow}]})
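Both bullet points can be tried without a charting library at all. The sketch below uses a made-up chart description (the map shape is illustrative and is not claimed to be clj-xchart's exact input format), transforms it with ordinary collection functions, and round-trips it through EDN text.

```clojure
(require '[clojure.edn :as edn])

;; A hypothetical chart description: series name -> series data.
(def chart-data
  {"Expected" {:x [1 2 3] :y [2.0 4.0 8.0]}
   "Actual"   {:x [1 2 3] :y [2.5 3.5 9.0]}})

;; Manipulate with ordinary collection functions: scale every :y value by 10.
(def scaled
  (into {}
        (map (fn [[series-name series]]
               [series-name (update series :y #(mapv (partial * 10) %))]))
        chart-data))

;; Make it durable: print to a string (or a file) and read it back.
(def round-tripped (edn/read-string (pr-str scaled)))

(get-in scaled ["Actual" :y])   ;; => [25.0 35.0 90.0]
(= scaled round-tripped)        ;; => true
```

Nothing here depends on the program that will eventually draw the chart, which is what makes the data easy to store, diff, and hand to someone else.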

At this point I am making charts in various formats, as functions of generic data, using lightweight libraries. Looking back, this meets all of my original requirements. Now I am going to add Clojure spec to the mix, to help me write correct programs.

Catching Problems at Development Time

With Clojure spec, I can specify the XChart API as deeply as I want to go. In particular, I can capture

  • structure (somewhat like Java types but richer)
  • arbitrary predicates about the runtime values of data

You can see what this looks like in my spec fork of clj-xchart. To see some of the ways that spec improves on Java types, consider this broken program:

(xchart/xy-chart {"bad-doublings"
                      {:x [3 2 1] :y [4 5 7]
                       :style {:render-style :area}}})


=> #object[org.knowm.xchart.XYChart blah blah blah]

The chart above is broken, because the XChart API requires that :area charts have their :x values in ascending order. Because XChart's Java types do not catch this problem, the broken call to xy-chart goes undetected, returning a broken value that will cause a failure later, when somebody attempts to render the chart.
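A rule like this is out of reach for a Java type, but to spec it is an ordinary predicate. The following is a minimal sketch and not the code from the fork: the spec names, the set of render styles, and the `ascending?` helper are assumptions made for illustration, and the body of `data-compatible-with-render-style?` is a guess at what a predicate of that name might check.

```clojure
(require '[clojure.spec.alpha :as s])

;; Structure: which keys a series has and what their values look like.
(s/def ::x (s/coll-of number? :kind vector? :min-count 1))
(s/def ::y (s/coll-of number? :kind vector? :min-count 1))
(s/def ::render-style #{:line :area :scatter})   ; illustrative subset
(s/def ::style (s/keys :opt-un [::render-style]))

;; Hypothetical helper: true when xs never decreases.
(defn ascending? [xs]
  (apply <= xs))

;; Runtime-value predicate: :area series need ascending :x values.
(defn data-compatible-with-render-style? [{:keys [x style]}]
  (or (not= :area (:render-style style))
      (ascending? x)))

;; Structure and predicate combined into one spec.
(s/def ::series
  (s/and (s/keys :req-un [::x ::y] :opt-un [::style])
         data-compatible-with-render-style?))

(s/valid? ::series {:x [1 2 3] :y [4 5 7] :style {:render-style :area}}) ;; => true
(s/valid? ::series {:x [3 2 1] :y [4 5 7] :style {:render-style :area}}) ;; => false
(s/valid? ::series {:x [3 2 1] :y [4 5 7]})                              ;; => true
```

The `s/keys` part plays the role a type would play; the predicate adds a constraint that relates two fields of the same value.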

With spec, I can instrument my program at development time to catch the problem early. test/instrument will enforce a function's argument specifications at development time:

(test/instrument [`xchart/xy-chart])

Now I can detect the problematic args as soon as they happen:

(xchart/xy-chart {"bad-doublings"
                      {:x [3 2 1] :y [4 5 7]
                       :style {:render-style :area}}})

=> ExceptionInfo Call to #'com.hypirion.clj-xchart/xy-chart did not conform to spec:
    In: [0 "bad-doublings" 1] 
    val: {:x [:numbers [3 2 1]], :y [4 5 7], :style {:render-style :area}} 
    fails spec: :com.hypirion.clj-xchart.specs.series.xy/series-elem 
    at: [:args :series 1] predicate: data-compatible-with-render-style?
    :clojure.spec.alpha/args  ({"bad-doublings" {:x [3 2 1], :y [4 5 7], 
                                                 :style {:render-style :area}}})
    :clojure.spec.alpha/failure  :instrument
    :clojure.spec.test.alpha/caller  {:file ..., :line 35, :var-scope user/eval934}

The ExceptionInfo provides a ton of information pinpointing the error:

  • The In and at sections of the error are a navigable data representation of where the precise error occurred. An IDE might use these to draw red squiggly underlines in the problem data.
  • The val is the value that failed, shown in conformed form (note the :numbers tag on :x). The complete argument list appears under :clojure.spec.alpha/args, which is useful when the failure is in a nested call.
  • The spec names the spec that failed.
  • The predicate names the specific predicate within the spec that failed, so we know that our data is not compatible with the :area render style.
  • The clojure.spec.test.alpha/caller tells us precisely where the bad call occurred. This is invaluable when the problem is in a nested call and not immediately visible at the REPL.
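The same workflow can be tried on a toy function, without XChart on the classpath. In this sketch `plot-area` and its `::xs` spec are hypothetical stand-ins; the `ex-data` keys it reads are the ones visible in the error output above.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

;; Hypothetical argument spec: a non-empty vector of ascending numbers.
(s/def ::xs (s/and (s/coll-of number? :kind vector? :min-count 1)
                   #(apply <= %)))

;; Hypothetical stand-in for a charting function.
(defn plot-area [xs]
  {:chart :area :x xs})

(s/fdef plot-area
  :args (s/cat :xs ::xs))

;; Turn on argument checking for this one function.
(stest/instrument `plot-area)

;; Call with bad arguments and keep the exception data as a value.
(def failure
  (try
    (plot-area [3 2 1])
    nil
    (catch clojure.lang.ExceptionInfo e
      (ex-data e))))

(::s/failure failure)                  ;; => :instrument
(::s/args failure)                     ;; => ([3 2 1])
(-> failure ::s/problems first :val)   ;; => [3 2 1]

;; Instrumentation is a development-time aid; switch it off when done.
(stest/unstrument `plot-area)
```

Because the failure is a map rather than a formatted string, a tool can pick out exactly the parts it needs.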

Problems as Data

test/instrument is one of many ways to leverage a spec. With s/valid?, I can ask if the arguments would work, without actually even making a chart:

(s/valid? ::cs/xy-chart-args
              [{"bad-doublings"
                {:x [3 2 1] :y [6 5 4]
                 :style {:render-style :area}}}])
=> false

Or I can use s/explain to print out why the arguments are invalid:

(s/explain ::cs/xy-chart-args
               [{"bad-doublings"
                 {:x [3 2 1] :y [6 5 4]
                  :style {:render-style :area}}}])

    In: [0 "bad-doublings" 1]
    val: ...
    fails spec: :com.hypirion.clj-xchart.specs.series.xy/series-elem
    at: [:series 1]
    predicate: data-compatible-with-render-style?
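s/explain prints a report. Its sibling s/explain-data returns the same problems as a plain map, which a program can filter or navigate. A small sketch, using a hypothetical `::series` spec rather than the one in the fork:

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical spec: :x and :y must be vectors of numbers.
(s/def ::x (s/coll-of number? :kind vector?))
(s/def ::y (s/coll-of number? :kind vector?))
(s/def ::series (s/keys :req-un [::x ::y]))

(def problems
  (::s/problems (s/explain-data ::series {:x [1 2 :three] :y [4 5 6]})))

;; Each problem is a map; :in is a path into the offending data.
(map (juxt :in :val) problems)   ;; => ([[:x 2] :three])

;; Valid data yields nil rather than an empty report.
(s/explain-data ::series {:x [1 2 3] :y [4 5 6]})   ;; => nil
```

The `:in` path can be handed straight to `get-in` to retrieve the offending value from the original data.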

Generating Test Data

With a good spec, I don't even have to write example programs. Specs can also be used to generate example data, via the exercise fn. The examples below show exercising a couple of clj-xchart argument types, ::series/line-width and ::series/series-name:

(s/exercise ::series/line-width)
=> ([2.0 2.0] [23 23] [16 16] [1.0 1.0] [0.5 0.5] [0.9375 0.9375] [74 74] [1 1] [1 1] [39 39])


(s/exercise ::series/series-name)
=> (["2E" "2E"] ["K" "K"] ["6" "6"] ["ur7" "ur7"] ["al" "al"] ["T" "T"] ["8fsxY" "8fsxY"] ["Jr" "Jr"] ["I2MDz2H3Q" "I2MDz2H3Q"] ["S" "S"])

exercise returns N (default 10) tuples. The first value in each tuple is a valid example value, and the second is a conformed data structure, explaining how the value matches its spec. For simple specs such as these, the raw and conformed values are identical.
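Raw and conformed values diverge as soon as a spec has branches, which is why :x appeared as [:numbers [3 2 1]] in the instrumentation error earlier. A sketch with a hypothetical `::x-values` spec: the :numbers tag mirrors that output, while the :dates branch is an assumption added only to give s/or a second alternative.

```clojure
(require '[clojure.spec.alpha :as s])

;; s/or tags each branch, and the tag shows up in the conformed value.
(s/def ::x-values
  (s/or :numbers (s/coll-of number? :kind vector?)
        :dates   (s/coll-of inst? :kind vector?)))

(s/conform ::x-values [3 2 1])     ;; => [:numbers [3 2 1]]
(s/conform ::x-values ["a" "b"])   ;; => :clojure.spec.alpha/invalid

;; s/unform goes back from the conformed form to the raw value.
(s/unform ::x-values [:numbers [3 2 1]])   ;; => [3 2 1]
```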

While the line-width examples look reasonable, the random gibberish strings are not likely values for a series-name in an actual program. We can improve on this with examples.

Example-Assisted Generation

You can pass generator overrides to exercise. One very common trick is to generate from a small fixed set:

(def example-series-names
      #{"Grommets" "Hit Points" "Expected" "Actual"
        "Emacs Users" "Vim Users" "Pirates" "Global Warming"})

(def generators {::series/series-name #(s/gen example-series-names)})

(s/exercise ::series/series-name 5 generators)
=> (["Grommets" "Grommets"] ["Emacs Users" "Emacs Users"] ["Grommets" "Grommets"] ["Vim Users" "Vim Users"] ["Expected" "Expected"])

In the code above, example-series-names is a plain Clojure set. The call to s/gen makes a generator from that set, and the generators map stores overrides from spec names to custom generators.

With a few hours effort, I was able to create specs for styles, series, and argument lists, plus a small set of generator overrides. Putting that all together, I can make random XY charts:

(s/exercise ::specs/xy-chart-args 1 ex/generators)

=> ([({"Pirates" {:x [-3.0 -2.0 -0.75 -0.0 -1.0 0.5 -2.0], :y [-0.75 -1.0 3.0 1.0 -0.5 3.0 -1.0], :error-bars [0.75 1.0 0.0 -3.0 -1.0 2.0 -1.0], :bubble nil}, "Expected" {:x [2.0 -0.5 -0.5 0.5 -2.0 1.0 -0.5 -1.0 2.0 1.0 -1.0 2.0 0.5], :y [2.0 -0.5 -1.0 -0.5 -2.0 2.0 1.0 1.0 2.0 0.5 -0.5 0.5 -0.5], :error-bars [0.5 2.0 0.0 1.0 -1.0 1.5 1.0 -0.5 0.5 -1.0 1.0 2.0 0.5], :bubble nil}, "Emacs Users" {:x [0.5 -0.5 -1.25 2.0 3.0 0.75], :y [1.0 -2.0 -1.5 2.0 -3.5 -0.875], :error-bars [2.0 -2.0 0.5 0.5 2.0 -1.0], :bubble nil}, "Grommets" {:x [-1.0 0.5 3.0 0.75 1.0 -3.0 0.5 1.0 1.5 -1.5 0.5 1.0 -0.5 -2.0 1.0], :y [0.5 2.0 1.0 2.0 -0.5 -2.0 0.5 0.5 -1.0 1.0 2.0 1.0 -0.75 -0.5 -1.5], :style {:show-in-legend? false, :line-width 2.0, :line-color :red, :marker-color :red, :fill-color :dark-gray}, :error-bars [2.0 -0.5 1.0 -0.0 -2.0 -2.0 3.0 -1.0 2.0 1.0 -1.0 -2.0 -2.0 1.0 -0.5], :bubble nil}, "Global Warming" {:x [-0.0 2.0 -2.0 -0.5], :y [-2.0 -1.5 -3.0 -0.5], :error-bars [-0.5 -0.5 0.5 3.0], :bubble nil}, "Actual" {:x [-3.625 -5.5 2.828125 -2.4375], :y [-4.2109375 -2.0 -2.25 2.5], :error-bars [4.0 -0.5 -3.65625 -5.0], :bubble nil}, "Hit Points" {:x [-2.375 0.5 -0.5 0.75 2.0 -1.5 1.0 0.59375 0.96875 1.0 -0.625 -0.84375 -1.0], :y [1.5 3.25 -1.5 -0.625 -2.0 -2.0 -0.625 -1.875 2.0 1.5 -1.0 0.625 -2.5], :style {:line-style :dash-dot, :line-color :gray, :line-width 1.5625, :marker-type :square, :marker-color :yellow, :show-in-legend? 
true, :fill-color :dark-gray}, :error-bars [-3.0 -0.6875 -0.625 -1.125 -3.0 -0.90625 1.25 1.5 3.0 0.875 -2.0 -3.0 -1.0], :bubble nil}, "Vim Users" {:x [-2.0 3.0 0.75 -2.0], :y [2.0 0.0 1.0 -1.0], :error-bars [1.0 -0.875 2.5 -0.75], :bubble nil}}) {:series {"Pirates" {:x [:numbers [-3.0 -2.0 -0.75 -0.0 -1.0 0.5 -2.0]], :y [-0.75 -1.0 3.0 1.0 -0.5 3.0 -1.0], :error-bars [0.75 1.0 0.0 -3.0 -1.0 2.0 -1.0], :bubble nil}, "Expected" {:x [:numbers [2.0 -0.5 -0.5 0.5 -2.0 1.0 -0.5 -1.0 2.0 1.0 -1.0 2.0 0.5]], :y [2.0 -0.5 -1.0 -0.5 -2.0 2.0 1.0 1.0 2.0 0.5 -0.5 0.5 -0.5], :error-bars [0.5 2.0 0.0 1.0 -1.0 1.5 1.0 -0.5 0.5 -1.0 1.0 2.0 0.5], :bubble nil}, "Emacs Users" {:x [:numbers [0.5 -0.5 -1.25 2.0 3.0 0.75]], :y [1.0 -2.0 -1.5 2.0 -3.5 -0.875], :error-bars [2.0 -2.0 0.5 0.5 2.0 -1.0], :bubble nil}, "Grommets" {:x [:numbers [-1.0 0.5 3.0 0.75 1.0 -3.0 0.5 1.0 1.5 -1.5 0.5 1.0 -0.5 -2.0 1.0]], :y [0.5 2.0 1.0 2.0 -0.5 -2.0 0.5 0.5 -1.0 1.0 2.0 1.0 -0.75 -0.5 -1.5], :style {:show-in-legend? false, :line-width 2.0, :line-color :red, :marker-color :red, :fill-color :dark-gray}, :error-bars [2.0 -0.5 1.0 -0.0 -2.0 -2.0 3.0 -1.0 2.0 1.0 -1.0 -2.0 -2.0 1.0 -0.5], :bubble nil}, "Global Warming" {:x [:numbers [-0.0 2.0 -2.0 -0.5]], :y [-2.0 -1.5 -3.0 -0.5], :error-bars [-0.5 -0.5 0.5 3.0], :bubble nil}, "Actual" {:x [:numbers [-3.625 -5.5 2.828125 -2.4375]], :y [-4.2109375 -2.0 -2.25 2.5], :error-bars [4.0 -0.5 -3.65625 -5.0], :bubble nil}, "Hit Points" {:x [:numbers [-2.375 0.5 -0.5 0.75 2.0 -1.5 1.0 0.59375 0.96875 1.0 -0.625 -0.84375 -1.0]], :y [1.5 3.25 -1.5 -0.625 -2.0 -2.0 -0.625 -1.875 2.0 1.5 -1.0 0.625 -2.5], :style {:line-style :dash-dot, :line-color :gray, :line-width 1.5625, :marker-type :square, :marker-color :yellow, :show-in-legend? 
true, :fill-color :dark-gray}, :error-bars [-3.0 -0.6875 -0.625 -1.125 -3.0 -0.90625 1.25 1.5 3.0 0.875 -2.0 -3.0 -1.0], :bubble nil}, "Vim Users" {:x [:numbers [-2.0 3.0 0.75 -2.0]], :y [2.0 0.0 1.0 -1.0], :error-bars [1.0 -0.875 2.5 -0.75], :bubble nil}}}])

I am not going to bother to pretty-print or explain that output, since we are exploring a program (XChart) whose purpose is to render such data:

(->> *1 ffirst (apply xchart/xy-chart) xchart/view)

Generative Testing

Once you can automatically generate program inputs, it is straightforward to wire those inputs into a test suite. And instead of writing more tests, you can increase test coverage by simply asking the generator for more inputs. Once I had the basic generators working, I started generating hundreds of random charts, eyeballing them here and there to make sure that I understood both the XChart API and my generators.

I expected that this exercise might uncover bugs in my generators, or in the clj-xchart wrapper, or in XChart. I was completely surprised by what happened next: Within ten minutes of generating random data for clj-xchart, I generated a test case that revealed a JVM bug (repro code, data) that crashed the JVM on my laptop. This underscores both the power of generative testing, and the futility of relying solely on handwritten tests.
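Wiring generated inputs into a test can be sketched with check from clojure.spec.test.alpha, which generates arguments from a function's :args spec and verifies its :ret and :fn specs. The function `sort-by-x` and its specs below are hypothetical and unrelated to clj-xchart, and the sketch assumes org.clojure/test.check is on the classpath.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

;; Hypothetical data: a vector of [x y] points with small integer coordinates.
(s/def ::point (s/tuple (s/int-in -100 100) (s/int-in -100 100)))
(s/def ::points (s/coll-of ::point :kind vector? :max-count 20))

;; Function under test: order points by their x coordinate.
(defn sort-by-x [points]
  (vec (sort-by first points)))

(s/fdef sort-by-x
  :args (s/cat :points ::points)
  :ret  ::points
  ;; Property relating input to output: same size, x values ascending.
  :fn   (fn [{:keys [args ret]}]
          (and (= (count ret) (count (:points args)))
               (or (empty? ret)
                   (apply <= (map first ret))))))

;; Run 50 generated test cases; raise :num-tests to get more coverage.
(def results
  (doall (stest/check `sort-by-x
                      {:clojure.spec.test.check/opts {:num-tests 50}})))

;; A result map carries a :failure key only when a check failed.
(every? #(nil? (:failure %)) results)   ;; => true
```

Asking for more coverage is a matter of changing one number, not of writing more test cases by hand.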

Conclusions

Specs repay your effort many times. With a set of specs, you can

  • instrument programs
  • perform a la carte validation and conformance
  • generate test data
  • generate and run tests
  • enhance documentation

I believe that spec is a significant step toward Clojure becoming the language for expressivity on the JVM, much as Python is the language for expressivity with C-linkage libraries. Even without spec, Clojure's data orientation is a great first step in this direction. Now we need to create and curate many more tasteful libraries like clj-xchart. With spec, we can address the fears (both real and imagined) of accumulating large systems in a dynamically-typed language. Let's spec a world where Clojure programs are easy to develop, fast to execute, and a joy to consume and maintain.