The Cognicast Transcripts
In this episode, we talk to Nathan Marz about Storm, Specter and flying.
The complete transcript of this episode is below.
The audio of this episode of The Cognicast is available here.
OUR GUEST, NATHAN MARZ
- Stream Processing
- Fault tolerance
- Stream vs. Batch processing
- Strange Loop talk
SUBSCRIBING TO THE COGNICAST
You can send feedback about the show to email@example.com, or leave a comment here on the blog. Thanks for listening!
EPISODE COVER ART
CRAIG: Hello, and welcome to Episode 95 of The Cognicast, a podcast by Cognitect, Inc. about software and the people who create it. I'm your host, Craig Andera.
We're going to start with a few announcements to let you know about things going on in the Clojure community - the ones we're aware of anyway.
I'll start by mentioning ClojureBridge. There's one going on in London on February 19th and 20th. This is, of course, in 2016. You can find out more about that event at ClojureBridge.org. I'll remind you, as always, that there is also a way for you to donate on the ClojureBridge website. You've heard us talk about ClojureBridge before. I'd definitely encourage you to go by. Look for the London event February 19th and 20th, and donate.
A few conferences to mention: There is Clojure Remote coming up pretty soon. That's happening February 11th and 12th. This is a completely remote conference. It'll take place over the Internet, so you can enjoy it from the comfort of your living room, office, or whatever you've got going on there, a comfortable place. Check it out at clojureremote.com. That's put on by our good friend Ryan Neufeld, so it's in good hands. It looks to be a very interesting idea, and I think you'll enjoy it, so check it out.
I also want to mention ClojureD, which is a conference that's taking place in Berlin February 20th, 2016. You can find out more about that at www.clojured.de. I just want to mention that conference is in English. Even though it's being held in Berlin, the conference will be held in English, so another good one to check out on the continent.
Of course, Clojure/west is coming up. You can find out more about that at clojurewest.org. It's significant to note that the call for proposals ends January 29th, so we'd love to hear your proposal for a talk for Clojure/west. I think it's a great conference to speak at. If you haven't spoken at a Clojure conference before, Clojure/west is a good venue for that. Again, you can find more info about that at clojurewest.org.
I also want to be sure to mention that Clojure/west will be offering opportunity grants. The link for that should be live shortly, probably by the time you hear this. Opportunity Grants is a program that Clojure/west uses to help people attend who would not otherwise be able to, typically from underrepresented groups. There'll be more information about that on the Opportunity Grants page, again available shortly.
We're certainly looking for participants for that program. We would love to have you there and, hopefully, for some people, this will make it possible to go. But we're also looking for sponsors. So, please, everybody, check that out. There are lots of ways for you to participate in the Opportunity Grant program.
Finally, I want to mention--you may already be aware, but it's very exciting news--that there is a brand new clojure.org website. Brand new in the sense it has a new look and a few new sections. Of course, all the content that was there before is still there: all the reference documentation, Rich's extremely well written explanations of how things work, and all that good stuff.
A few notable sections: There is now a News section, where there will be posts about releases and other notable things. There's an Events page where you'll be able to find information about upcoming Clojure events. And there's a Guides section, which is intended to help build out community guides, tutorials, and things like that. That's all very exciting.
Especially exciting, I think, is the fact that all of clojure.org is stored in a repo, which is open for pull requests once you sign a contributor agreement, so this is something that the community is going to be able to pitch in and help make better. Definitely go by clojure.org. Check out the new look and see how you can get involved and make it better.
I think we will leave it there. That's quite a lot of good news for one day. We'll go ahead and go on to Episode 95 of the Cognicast.
[Music: "Thumbs Up (for Rock N' Roll)" by Kill the Noise and Feed Me]
CRAIG: All right. I think that's it, then. I'm ready to go if you are.
NATHAN: Yeah, let's do it.
CRAIG: Awesome. Yeah, let's do it. All right. Welcome, everybody. Today is Monday, December 14th, and this is the Cognicast.
Today, we have as our guest, a person I spent some time talking with at Strange Loop, and I was like, "Oh, you've been on our list for a while. Would love to have you on. Do you think it would work out?" He said yes, and here he is. I'm referring to our guest today, Nathan Marz. Welcome to the show.
NATHAN: Thank you for having me.
CRAIG: It's our pleasure. I really did genuinely enjoy talking to you at Strange Loop, but it was the first time we really had a chance to sit down and chat, and we're looking forward to doing more of that today. But, before we get into it, we have our opening question that we always run with, which is to ask our guest to relate some experience of art, whatever that means to them: a movie that you like, a book that you read, or--I don't know--a cloud you saw the other day that looked like your favorite food. I don't know. Something like that, but maybe you have something in mind you'd like to share with us.
NATHAN: Sure. Yeah. I guess, with respect to art, a big thing in my life has always been playing the piano. I've been playing since I was about seven years old. But, I don't think it was really an important thing in my life until high school when I had this really amazing teacher who was this woman from Russia. The level of passion she had for piano and teaching piano has really stuck with me my entire life.
Yeah, it's just something that I really enjoy. When you're doing programming, your brain can just get so into a problem, and sometimes it's good to step away. Sitting down for a bit and playing the piano is a really good way to get your brain focused on something else and to relax.
CRAIG: Now, do you perform at all, publicly, or is it just something you do for your own enjoyment? How do you do that?
NATHAN: Yeah. It's just something for my own enjoyment. I play for friends sometimes, and I do a little bit of composition. But, yeah, I'm not that good at composing. It's just a fun thing to do every now and then.
CRAIG: Very cool. That's interesting. We may have to dive into that a little bit later. We could talk about the links in creativity between performance and stuff like that.
There are a couple of other things that I think it would make more sense for us to talk about first. I think people probably know you as a fairly prominent Clojurist. You've done some really interesting work. I'm trying to think whether we should start with stuff you did earlier or stuff you've done more recently. I'm referring, of course, to the two things I know you best for: Storm and, more recently, Specter.
Do you have a preference? Should we go chronologically or reverse chronologically?
NATHAN: We can just start with Storm.
CRAIG: Okay, yeah.
Storm is a really interesting technology that has seen wide adoption, and I think it's been around for a while, but maybe not everybody knows about it, so I wonder if you'd give us the précis, maybe a little history, or whatever you think is important or interesting for our listeners to hear.
NATHAN: Sure. Storm is a project that I started when I was working at a startup called Backtype. It is a distributed, scalable, and fault-tolerant stream processing system. It's all about processing massive streams of data at very, very high throughput with very, very low latency.
Storm kind of initiated this new wave of stream processing. Before Storm, in the big data world, it was all about batch processing, and then you had all these scalable big data databases. There was really nothing there, at least in the open source world, for doing scalable and fault-tolerant stream processing. Storm kind of kicked that movement off.
CRAIG: I wonder if you could explain to me stream processing. I think that's a key term in the problem space that Storm is trying to address.
NATHAN: Yeah, so stream processing, I guess, to summarize it most generally, it's just about processing very large amounts of data with low latency. The most basic thing you would do with stream processing is you would look at these events that are coming in.
Let me give a concrete example. At Backtype, we were doing real-time social media analytics. We might be consuming a stream of tweets. We'll consume it and update our databases with stats like how many retweets a URL has gotten over time, and a variety of other analytics like that.
The key to stream processing is, first of all, you need to have something that scales. So, as your stream gets bigger and the state you're storing gets bigger, you can just add nodes to scale it.
The second big thing is fault tolerance. You're doing distributed processing, and things fail in distributed processing. You might have a node go down or any number of errors can happen. You want to make sure that, regardless of the error, you're always able to recover and have very strong guarantees on your data being processed.
Now, there's a variety of ways to do stream processing. One of them is one-at-a-time stream processing, where you process events as they come in, independently, not really connected to the other events you're getting. With that form of processing, the best guarantee you can have is at-least-once processing: you're guaranteed to process each event at least once, but in the case of failures, some events may be processed more than once.
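To make the at-least-once guarantee concrete, here is a minimal Python simulation. It assumes a hypothetical source that redelivers any event whose acknowledgement is lost; the names are illustrative and this is not Storm's API. Because the counter is updated before the acknowledgement, a replayed event is counted twice.

```python
from collections import Counter


def process_stream(events, lose_first_ack_for):
    """Process events one at a time, replaying any event whose ack is lost.

    `lose_first_ack_for` is a hypothetical set of event indices whose first
    acknowledgement never reaches the source, so the source redelivers them.
    """
    retweet_counts = Counter()
    pending = list(enumerate(events))
    already_replayed = set()
    while pending:
        idx, url = pending.pop(0)
        retweet_counts[url] += 1  # the side effect happens before the ack
        if idx in lose_first_ack_for and idx not in already_replayed:
            already_replayed.add(idx)
            pending.append((idx, url))  # redelivered, so processed again
    return retweet_counts


counts = process_stream(["a.com", "b.com", "a.com"], lose_first_ack_for={1})
print(counts)  # b.com appeared once in the stream but is counted twice
```

Every event is processed, none is lost, but the replayed one inflates the count. That is exactly the trade-off of at-least-once semantics.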
There's another form of stream processing called micro-batch stream processing, where you divide your incoming stream into a series of batches. You process each batch, and you don't move on to the next batch until the current batch has been completely and successfully processed. It turns out that, with micro-batch processing, you can achieve exactly-once processing semantics. These are the main concepts involved in stream processing.
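The micro-batch idea can be sketched the same way. This Python sketch assumes each batch carries an increasing id and that a replayed batch always has the same id and contents. Storing the last applied id alongside the state lets a replay be detected and skipped, which is roughly the idea behind Storm's Trident transactional state; the class and method names here are hypothetical.

```python
from collections import Counter


class TransactionalCounts:
    """Counts plus the id of the last batch applied (hypothetical names)."""

    def __init__(self):
        self.counts = Counter()
        self.last_batch_id = None

    def apply_batch(self, batch_id, urls):
        # Batches are processed strictly in order, so comparing with the
        # last applied id is enough to recognize a replay.
        if batch_id == self.last_batch_id:
            return False  # already applied: skip the replayed batch
        # A real system would write counts and batch id atomically.
        self.counts.update(urls)
        self.last_batch_id = batch_id
        return True


state = TransactionalCounts()
state.apply_batch(1, ["a.com", "b.com"])
state.apply_batch(2, ["a.com"])
state.apply_batch(2, ["a.com"])  # batch 2 replayed after a failure
print(state.counts)  # each event counted exactly once
```

The replay of batch 2 is a no-op, so the final counts match the input stream even though a batch was processed twice.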
CRAIG: Mm-hmm. It's interesting you say that because I've heard people contrast stream processing with batch processing. But, given this idea of micro-batches, I'm now confused as to whether there's a dichotomy there at all. Is it the case that those two things are somehow in opposition, or could you maybe define them for us and help us understand?
NATHAN: You mean batch processing and micro-batch processing?
CRAIG: Well, actually, I was referring to batch processing and stream processing.
NATHAN: Oh, yeah.
CRAIG: Are those two completely different things? Are they somehow related? What does the Venn diagram look like, I guess?
NATHAN: Well, I think probably the best way to categorize it is into three categories. One is what I'll call one-at-a-time stream processing. Another is micro-batch stream processing. The third is big-batch processing. The huge, fundamental difference between any sort of stream processing and big-batch processing is your latency constraints.
With big-batch processing, you're processing so much data that there's no expectation of a hard latency constraint where, oh, this needs to finish fast, like within 100 milliseconds or within a few seconds. Typically, big-batch processing takes a few hours, and that's totally reasonable. That actually changes the way in which the system can be architected.
A big-batch processor can actually be much simpler, because if there's a failure, it can just redo all that work and recompute from scratch. With stream processing, by contrast, you have tight latency constraints, so you can't just recompute everything from scratch over everything you've ever seen. I know it's hard to get into the nitty-gritty details in a podcast, but that greatly affects the design and operation of these systems.
CRAIG: It totally makes sense to me, actually.
I completely understand how that would drive things, as you say. You just have more options open when you can say, "Oh, that work I just did for three hours, I can just do that again," right?
NATHAN: That's right.
CRAIG: Cool. Well, obviously one of the interesting things about Storm, to our audience at least, aside from its capabilities, is that it's written in Clojure.
One of the really interesting things to me about Storm, and you can correct me if I'm wrong, because I haven't actually used Storm in production and so haven't spent a lot of time reading the docs, is that, from what I've seen, you made a really interesting and, I think, really good choice to downplay the fact that it's written in Clojure a bit.
The way I see it positioned, it's really more of a JVM thing than a Clojure thing, which of course opens it up to a wider audience. Even though you obviously leveraged Clojure when you built it, you did it in such a way that Java programmers can still take advantage of it. Again, correct me if I'm wrong, but it feels like you made an intentional choice to position it that way in the docs and in the way you talk about it. Am I way off base, or is that what you decided to do?
NATHAN: You pretty much have it. I wouldn't say I ever hid the fact that it's written in Clojure. I was always very upfront about that. I would always mention that in my presentations. But, it was a very, very intentional choice in the design of Storm.
The way I designed it is all the interfaces of Storm are Java. They're literally written in Java. But, most of Storm's implementation was in Clojure just because I love Clojure and I'm just much more productive in it.
I think the qualities of Clojure allow you to make more robust programs, but the interfaces were always in Java. Using Storm, to actually use it and build applications on it, you're always implementing straight Java interfaces. A user of Storm, the fact that it's written in Clojure is completely irrelevant to you because that's all hidden behind the Storm interfaces.
Basically, very early on, when I designed Storm and realized it could be such a generically useful thing and could be a huge project, I knew that in order to really reach a wide audience, I would have to do it that way, with the interfaces in Java. If I did Storm completely in Clojure, well, obviously that unfortunately limits your possible users a great deal. That choice turned out to be very good, and it's one of the key reasons why Storm became a very, very big, international project.
CRAIG: Do you think there was anything about the nature of Storm that made that easier to do? What I mean is, you can have interfaces to Clojure libraries that are very data-centric, and there are a lot of really good reasons to make heavy use of the Clojure data types in those interfaces. We all know and love the Clojure collection classes, for instance.
You could imagine that being difficult to map across a boundary where Java is on the other side, because you have choices about how you interact with those things: do I have to reinvent a bunch of interfaces, or have a whole bunch of interfaces where Object is the type of every parameter?
Was there something about Storm that made that any easier, do you think, than might be the case for some other problem?
NATHAN: It's hard to say because, essentially, the question there is: How would the design of Storm be different if it were completely in Clojure?
Man, I built Storm four years ago.
The decisions I made have become so ingrained. I'm just trying to think now about what I would have done differently.
CRAIG: It's maybe not a question so much about if you had done it all in Clojure. It's more: do you think there was something about the problem you were solving that made that decision easier? I can't remember off the top of my head what the interfaces looked like.
If you have an interface that's just about invoking actions with no arguments, then that's really easy, because you have go, stop, pause, and resume, things like that that don't necessarily take parameters. Compare that with something like a rich query interface, where you want to express the query as data or express a functional metaphor with higher-order functions. I feel like that would be harder to do, and I don't know where Storm falls on that, or whether you think there was anything about Storm's problem space that helped.
I guess what I'm really after is this: I really think that was a good decision on your part. Clearly it paid off for you, right? Backtype got... I'm sorry, I don't know this history well. Was Storm at all a part of the story around...?
NATHAN: The acquisition?
NATHAN: It's actually funny, because when we went into Twitter to do our technical due diligence... You know, when a company is interested in acquiring another company, they do due diligence to make sure you are what they expect you to be.
Right before that technical due diligence meeting, I blogged about Storm. That was the first blog post ever about Storm, and there wasn't really any detail in there. It was just to build hype for this new product that was coming in the future. I guess I wrote the post well, because it got a bunch of attention on Hacker News.
When we went in for the technical due diligence, all Twitter wanted to see was a demo of Storm. They didn't want to see any of Backtype's products, which is what they were acquiring us for. They didn't need to see that stuff. They were so interested in Storm because obviously Twitter, as a heavily real-time company dealing with lots of data, that was very relevant to them.
CRAIG: Sure. Sure.
Okay. That's kind of why I'm interested. I'm sitting here thinking, well, okay, I have an idea. It's a good idea. Maybe I want to see a lot of adoption for whatever reason: ego. I'm sorry. I'm not trying to attribute that reason to you at all. I didn't mean it to come out that way.
But, just for whatever reason, right? I really want to see wide adoption, and I'm looking at the decisions you made around the way you architected the project, but also the way you positioned it. That seems like a really good idea if that's my goal.
And so, I guess I'm coming back to the question: if I had some great idea, do you think I would be able to make the same choices, or was there something inherent in the shape of the Storm problem that made it easier than "average" for you?
NATHAN: Yeah, well, I mean, as long as whatever you're solving is a problem in its own right, one that's not tied at all to the fact that you want to use Clojure for it. The problem Storm solves is a very generic data processing problem, which is something a lot of companies have to deal with.
By contrast, take my library Specter. It doesn't make sense for me to build that with Java interfaces and a Clojure implementation, because Specter is entirely about improving Clojure programming. So, as long as your problem is generic enough, I think it's definitely a design choice to consider.
CRAIG: Yeah. This is tough to talk about, right? As with anything architecture- or design-related, it's very easy to rapidly wind up at "it depends."
NATHAN: Yeah, yeah.
CRAIG: Right? For sure. Okay. That's good, though. I really like the perspective, and it's definitely one of those things that I've always looked at and said, "I think that was a really smart move on your part," so it's cool.
You mentioned Specter. We could definitely talk more about Storm, and maybe we'll loop back to it, but I've actually been using Specter recently, and I've found it quite beneficial.
I'm going to put words in your mouth here. I think you've described it as a missing piece from the Clojure API.
I think that's a supportable statement on your part, so I'm kind of excited to talk about it. I wonder if we could go over there for a little while. You said it's been four years since you wrote Storm. I guess we're fast-forwarding about three years to when you wrote Specter.
Maybe we'll just start there, for those who haven't seen your talks about Specter, which I would highly encourage people to go and check out. You do a great job of explaining and motivating it, but maybe you wouldn't mind restating for us what it is and how you came to develop it.
NATHAN: Yeah. Specter is basically the library that I wish I had when I started using Clojure. Since I developed the ideas for Specter and implemented it, I use it every day. I use it very, very heavily.
What Specter is, is a library for manipulating arbitrarily complex immutable data. If you look at Clojure's built-in functions for manipulating immutable data structures, you have something like, let's say, map, which takes in a function and a sequence and gives you back a new sequence. Now, the problem, and it's not really a problem, just the nature of Clojure's base functions, is that they don't compose with more sophisticated data structures.
If you have something like a map whose values are lists, you can't just say, "Okay, I want to use map on all the lists in this map." You would need to iterate over the map to get the keys and values. Then you transform the values using the map function. Then you have to put the new keys and values back into a new map. And so, the nested transformations you want to do require you not only to figure out how to even get to the sub-values you want to transform, but also how to reconstruct the original data structure you were transforming.
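For readers who don't write Clojure, here is the manual pattern being described, sketched in Python with dicts and lists standing in for Clojure's immutable maps and vectors (the data and function names are hypothetical). Nothing is mutated; each function builds new containers, and every extra level of nesting adds another layer of taking apart and rebuilding.

```python
def inc_all_values(m):
    """Increment every number in a dict whose values are lists.

    We have to walk the dict, rebuild each inner list, and rebuild the dict.
    """
    return {k: [x + 1 for x in xs] for k, xs in m.items()}


def inc_nested(records, field):
    """One more level: a list of dicts, each holding a dict of lists.

    Each extra layer of nesting needs its own take-apart-and-rebuild step.
    """
    return [
        {**rec, field: {k: [x + 1 for x in xs] for k, xs in rec[field].items()}}
        for rec in records
    ]


data = {"a": [1, 2], "b": [3]}
print(inc_all_values(data))  # {'a': [2, 3], 'b': [4]}
print(data)                  # unchanged: {'a': [1, 2], 'b': [3]}

records = [{"id": 1, "scores": {"x": [1, 2]}}, {"id": 2, "scores": {"y": [5]}}]
print(inc_nested(records, "scores"))
```

The increment itself is trivial; almost all of the code is about navigating in and reconstructing the surrounding structure.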
This is something I had to do over and over, my entire career writing Clojure, and I've been writing Clojure for a very long time now. I was writing it for more than a year, probably more than a year and a half, before I wrote Storm. So, it's a very, very common thing you run into in Clojure: you have something more complicated than a simple map or list, and you want to be able to manipulate it concisely and with high performance.
There's this saying in the Clojure community that it's better to have 100 functions operate on one data structure than 10 functions on 10 data structures. With Specter, I would go beyond that and say that better than any of that is to have, let's say, 100 generic navigators that each work on 10 data structures, because navigators, which are the core idea behind Specter, compose as much as you want, so you can manipulate data structures of arbitrary complexity without any issue.
CRAIG: Now, a navigator is the equivalent of an XPath expression or, probably more familiar to a lot of people, a CSS selector, right? It's a way of picking out a subset of the data with a sort of path specification. Is that accurate?
NATHAN: Correct, and it could be anything from: Let me navigate to this key in this map. It could also be something like: Let me navigate to every element of a sequence. But, it could also be something like: Let me navigate to this subsequence of the sequence, where it's not really a nested data structure, but you're still able to navigate to a portion of some data structure.
CRAIG: Yeah. This is super useful. Maybe you can comment on whether you think there are two pieces, but I look at Specter and I see a couple of pieces.
I see that I could use it for navigation, for query, if you will, like, go find. There are 100 nodes in this notional, arbitrarily deep tree, and I want to find a certain 20 of them. I can do that. Then there's also the update part of it, where you can say, okay, given some subset of nodes, here's a function that makes them into new nodes, and return me the transformed value. Is that fair? Do you think it splits along those lines?
NATHAN: Yeah. Yeah, those are the two use cases for Specter. In both cases, it's all about navigation. In the querying case, you navigate to the values you want and get back just those values. In the update case, you navigate to the values you want to update, and what you get back is the original data structure, with everything you navigated to changed according to your update function.
CRAIG: Mm-hmm. I forget where I was going with that question. Yeah, anyway, so we've definitely found it useful, for sure. I remember what it was.
One of the things I think is interesting and, again, a choice that is maybe slightly unusual but that I agree with, is that there's not really a DSL here, right? Specter is not a data specification where you say, well, if you have a vector, and it has a keyword as the first element and a thing as the second element... It's really more code-oriented. I've been thinking a lot over the last few years, based on something Alan Dipert said to me on this show quite a while ago now, about a maxim we've repeated time and time again in the Clojure community: data, then functions, then macros, in decreasing order of preference.
I think that's actually not quite right. We talked about this with Matthew Flatt as well. First of all, the Racket people would probably invert that because their macro system is central for them.
But I think, and again, I'm a bit of a newb with the Specter API, that it really is a function API. I think that's a slightly unusual choice. I think a lot of us would reach for: How do I make an XPath equivalent, where it's a bunch of data that tells me how to get to...?
NATHAN: Yeah, like you have a string or something with slashes.
Maybe not everyone would go straight to that design, but they might do something like a vector where different symbols represent navigations, like a star means "go to everything," and things like that.
CRAIG: Right. Right, exactly.
Exactly, and so you didn't do that.
Now, was that something you really thought about, or was it just obvious to you that that wasn't the right way? What was the thinking that went into that, where you said, "No. Functions are the right way to go here"?
NATHAN: Yeah. Well, actually, the original version, well before I open sourced it, was about 50 lines of code, and it was basically something like that, where you just have symbols that represent how you want to navigate through the data. It was fine. It worked for me for a little bit. Basically, in the way I approach designing software, and the more experienced and the better at programming I've gotten, the more this has been drilled into me, I always think in terms of: What are my use cases? What are the specific problems I need to be able to solve with the tools and abstractions I build?
At first, my needs for Specter were pretty straightforward. It was things like navigating into an associative data structure, or navigating to every element of a sequence. And so, my original design, basically what we described, just a vector of symbols that represent specific navigations, worked totally fine. It was the fastest thing for me to implement.
Then, as I kept working on things, and I realized I needed more and more and more out of Specter, like more and more ways to navigate, and I was also doing a bunch of stuff with graphs and graphs have all sorts of ways that you can navigate them and manipulate them, it didn't really make sense to just keep on making these special symbols to represent what kind of navigation you want. It became very clear to me that this needs to be a completely extensible API.
Then I started thinking about how to abstract the concept of a navigator: something that can step into some sort of substructure and can also compose with other navigators to make more sophisticated navigators. I had to give that some thought, but that was the key to turning Specter into what it is now, a fundamental, generic abstraction. The navigator interface turned out to be fairly simple. It uses continuation-passing style, and it was perfect. When you look at the interface and understand it, it's very clear that any navigation you'd ever want to do can be captured via that interface. From there it's just about taking specific use cases and implementing them in terms of that interface.
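The navigator abstraction can be illustrated with a small Python sketch of the continuation-passing idea. Each hypothetical navigator knows how to select and transform one step and hands the rest of the work to a continuation, `next_fn`; `select` and `transform` thread those continuations through a path. This is an assumption-laden toy, not Specter's actual protocol or API, but it shows how querying and updating share one navigation mechanism, and how key, every-element, and subsequence navigation all fit the same interface.

```python
class Key:
    """Navigate to the value under one key of a dict (hypothetical)."""

    def __init__(self, k):
        self.k = k

    def select(self, structure, next_fn):
        return next_fn(structure[self.k])

    def transform(self, structure, next_fn):
        new = dict(structure)  # copy, never mutate the input
        new[self.k] = next_fn(structure[self.k])
        return new


class All:
    """Navigate to every element of a list (hypothetical)."""

    def select(self, structure, next_fn):
        out = []
        for x in structure:
            out.extend(next_fn(x))
        return out

    def transform(self, structure, next_fn):
        return [next_fn(x) for x in structure]


class SubRange:
    """Navigate to the sublist structure[start:end] as one value (hypothetical)."""

    def __init__(self, start, end):
        self.start, self.end = start, end

    def select(self, structure, next_fn):
        return next_fn(structure[self.start:self.end])

    def transform(self, structure, next_fn):
        middle = next_fn(structure[self.start:self.end])
        return structure[:self.start] + list(middle) + structure[self.end:]


ALL = All()


def select(path, structure):
    """Collect every value the path navigates to."""
    def step(i, s):
        if i == len(path):
            return [s]  # end of the path: emit the value
        return path[i].select(s, lambda sub: step(i + 1, sub))
    return step(0, structure)


def transform(path, fn, structure):
    """Return a new structure with fn applied at every navigated value."""
    def step(i, s):
        if i == len(path):
            return fn(s)
        return path[i].transform(s, lambda sub: step(i + 1, sub))
    return step(0, structure)


data = {"a": [1, 2, 3], "b": [4]}
print(select([Key("a"), ALL], data))  # [1, 2, 3]
print(transform([Key("a"), ALL], lambda x: x * 10, data))
# {'a': [10, 20, 30], 'b': [4]}
print(transform([Key("a"), SubRange(0, 2)], lambda xs: xs[::-1], data))
# {'a': [2, 1, 3], 'b': [4]}
```

Adding a new kind of navigation means writing one more class with `select` and `transform`; neither the path runner nor existing navigators change, which is the extensibility argument made above.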
CRAIG: Right. I think it's interesting to think about what the trade-offs are, right? You started out with data: the straightforward, obvious representation, the one I've probably written a couple of times, the vector of symbols, as you say.
Then the obvious trade-off is that you quickly get to a point where... well, what's the problem with data, right? The problem with data is that you have to have an interpreter, so there's got to be code somewhere that does something with it.
If you always have to modify that interpreter, then you get to a point where it's like, "Well, if what I'm doing is expressing behavior, code is a pretty good way to do that," right?
NATHAN: That's right. When you want extensible behavior, that is exactly what protocols are for.
CRAIG: Right, but the drawback is that once you move away from data, you don't necessarily have as rich a way to manipulate the expressions, the artifacts that you're working with, right? A Specter path is not something that's as easily, maybe... Actually, this is really the question: Is it a limitation that a Specter path isn't data, except insofar as code is data, because it's a Lisp?
NATHAN: Right. Yeah. Well, that's the interesting thing about Specter. An individual navigator is, essentially, an instance of this protocol. Then a path is just a sequence of these navigators. And so, with Specter's API, if you want to pass a path to the transform function or the select function to do whatever manipulation you want, you can just pass it a vector. I do sometimes need to build those paths dynamically, and then I'm just manipulating vectors.
Okay, it's like I need to do this, so let me concatenate that path with this other path over here to make my overall path, which I can then use dynamically at runtime. In that respect, it still is sort of like data if you want to use it that way.
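Since a path in this style is just a sequence of navigator objects, building one at runtime is ordinary list manipulation. The following self-contained Python sketch declares two minimal hypothetical navigators (transform only, not Specter's API) and composes paths with `+`; `users_path` is an invented helper standing in for "decide the path at runtime."

```python
class Key:
    """Navigate into one key of a dict (hypothetical, transform only)."""

    def __init__(self, k):
        self.k = k

    def transform(self, structure, next_fn):
        new = dict(structure)
        new[self.k] = next_fn(structure[self.k])
        return new


class All:
    """Navigate to every element of a list (hypothetical, transform only)."""

    def transform(self, structure, next_fn):
        return [next_fn(x) for x in structure]


def transform(path, fn, structure):
    """Apply fn at every location the path reaches, rebuilding on the way out."""
    def step(i, s):
        if i == len(path):
            return fn(s)
        return path[i].transform(s, lambda sub: step(i + 1, sub))
    return step(0, structure)


def users_path(field):
    # Hypothetical: choose which per-user field to navigate into at runtime.
    return [Key("users"), All(), Key(field)]


base = users_path("tags")
full = base + [All()]  # concatenate paths like any other list
data = {"users": [{"tags": ["x", "y"]}, {"tags": ["z"]}]}
print(transform(full, str.upper, data))
# {'users': [{'tags': ['X', 'Y']}, {'tags': ['Z']}]}
```

The path itself is a first-class value: it can be stored, passed around, sliced, or extended before being handed to `transform`.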
CRAIG: Right. Right, right, right.
NATHAN: You did make a really important point before, though. Back when Specter had this data-oriented interface, it was an interpreter, so it had to read the sequence and interpret what the symbols meant in terms of what to do. It turns out that that interpretation process adds a ton of overhead to your transformation. Another nice thing about moving to this protocol-oriented design is that I was able to add a feature to Specter called pre-compilation, where you can completely strip out any need to interpret the path when you're manipulating your data. It makes things go much, much faster. In fact, when you use pre-compilation with Specter, the performance rivals hand-optimized code.
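The overhead being described comes from dispatching on each path element on every call. Here is a toy Python contrast, assuming a made-up path syntax where a string is a key lookup and `"*"` means every element: `interpret_transform` re-reads the path each time it runs, while the hypothetical `compile_path` dispatches once and returns nested closures that do no parsing. It illustrates the idea only; it is not how Specter implements pre-compilation, and it makes no performance claim.

```python
def interpret_transform(path, fn, structure):
    """Walk the path element by element on every call (interpreted)."""
    if not path:
        return fn(structure)
    head, rest = path[0], path[1:]
    if head == "*":
        return [interpret_transform(rest, fn, x) for x in structure]
    new = dict(structure)
    new[head] = interpret_transform(rest, fn, structure[head])
    return new


def compile_path(path):
    """Dispatch on the path once; the returned closure never inspects it again."""
    def build(i):
        if i == len(path):
            return lambda fn, s: fn(s)
        nxt = build(i + 1)
        if path[i] == "*":
            return lambda fn, s: [nxt(fn, x) for x in s]
        key = path[i]

        def at_key(fn, s):
            new = dict(s)
            new[key] = nxt(fn, s[key])
            return new
        return at_key
    return build(0)


path = ["a", "*"]
data = {"a": [1, 2, 3]}
compiled = compile_path(path)  # done once, then reused for many calls
print(interpret_transform(path, lambda x: x + 1, data))  # {'a': [2, 3, 4]}
print(compiled(lambda x: x + 1, data))                   # {'a': [2, 3, 4]}
```

Both produce the same result; the difference is only where the work of understanding the path happens, once up front or on every call.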
CRAIG: Yeah. Actually, the reason we chose it is, A, that we did have a reasonable need for performance, but also that we didn't want to write hand-optimized code. I'm working on a project where we have a fairly deep data structure. I don't want to go into it too much, but basically we have a typical super-nested Clojure data structure: a sequence of maps of maps, of sequences of maps, of sequences of maps. You know that type of thing.
We actually do transformations on it. We go in and replace pieces of it way down in the guts with other things. In fact, what we're replacing values with is assertions about those values, so we're taking data and saying this piece was valid, this piece was not valid, this piece was not valid for this reason. The hand-optimized expressions for those transformations were nasty, I mean really nasty.
NATHAN: Oh, yeah.
CRAIG: Right. Really.
NATHAN: It's just a ton of nested function calls.
CRAIG: Yeah, and it's non-uniform, right?
You're mixing map, reduce, filter, mapcat, and all these things. Of course, the argument is that those things are functions, and the functions themselves may call map, and so they didn't really... You couldn't thread-macro your way into readability.
NATHAN: Yeah. Yep. Yeah, that is something I suffered with a lot before I wrote Specter.
CRAIG: It's been working out well for us. We really just got started with it in the last couple weeks, so we're still putting it through its paces, but so far so good.
NATHAN: Yeah. There's a lot. I haven't really had much time to write documentation for it, but there is a lot of stuff in that API. Half the time, when someone opens an issue on GitHub asking for some functionality, it's already in there. I use it so heavily that I've really explored that problem domain, so there's a lot of cool stuff there.
CRAIG: Yeah, yeah, absolutely, but I think what you're saying, and I believe this to be correct, is that you have a rich API. But the important thing is that it's not rich in the sense of having a lot of concepts in it, right? There's a small number of them; the navigator concept is pretty universal. Once you understand that, it's just a matter of finding the particular flavor that you want, the particular implementation. I think that's a big deal, because it's a lot harder if you've got a library that has 100 concepts in it.
NATHAN: That's right.
CRAIG: Each of which has one or two functions, versus one or two concepts with 100 functions that are all basically that one concept. It becomes way easier to discover, to navigate, to internalize, and to leverage.
NATHAN: Yeah, that's exactly right. That's one of the reasons why I'm just so happy with that project because it's just really simple at its core.
CRAIG: Yeah, yeah.
NATHAN: But just incredibly useful.
CRAIG: Yeah, we certainly have found that. What's uptake been like? Have you been hearing from a lot of people using it?
NATHAN: Oh, man. People definitely seem to be using it. One of the frustrating things about open source, and this has always been the case, including with Storm and with Cascalog before that, is that you never really know when people are using your projects. People don't really tell you.
With Storm, I remember I got an email one day, just out of the blue, from someone at Alibaba telling me, "Oh, hey. Just wanted to let you know that Storm is a core part of our infrastructure and we've been using it heavily for a year and a half." I was like, "Wow! That's incredible." They're one of the biggest companies in the world. That's a really big deal for this project.
Specter, obviously being a much smaller project than Storm, I don't know. I actually could not tell you how many people are using it.
CRAIG: Yeah, we have the same. One of the things we care about here at Cognitect is how many people are using Clojure. The answer is that we don't know. I think we're better positioned to hear about it than a lot of people, and the State of Clojure survey is going on right now. Of course, this will come out after that, so we'll have announced the results, and people will get to see all that. We're pretty public about, very public about all that. But, of course, that's not the whole story.
We've said things before like: N of the top M -- I shouldn't say top. N of the Fortune M companies are using it, where N and M are numbers that are pretty close together, right?
A lot of people are using it, but you'd never necessarily know that by just poking your head up and looking around. Yeah, we've had the same experience, where it's like, how do you know when somebody is using your technology, especially if they're using it successfully, because they didn't have to ask you any questions?
NATHAN: That's right, yeah.
CRAIG: Or run into any problems. It's a good sign.
NATHAN: Yeah, that's kind of a paradox. The better your software, the less likely you'll know because they won't come to you with issues.
CRAIG: Yeah, especially if you're not charging anything for it, right?
NATHAN: That's right, yeah. Maybe I should charge for it.
CRAIG: Maybe you should. Yeah. You mentioned Cascalog, too, and I had completely forgotten that was you as well. That's another cool project.
CRAIG: Yeah. So do you have anything in mind for Specter, or are you like, "Hey, man. This thing works, and it's done"? I love it when software gets to that point. I have a little library called Causatum, and every once in a while someone will ask me, "What are you going to do with it?" I'm like, "Nothing. It's done." There are things that I could do, but there's nothing I have to do. So I don't know whether you're in that place with Specter or whether you have other thoughts about where you want to take it.
NATHAN: Specter is really well developed at this point. Just yesterday, I actually released a new version of it. I added this thing called Protocol Paths, where your path can change dynamically based on the type of whatever you're currently navigated to. It just happened that last week I was like, "Oh, wow. This would really help clean up this code I have right here. Let me implement that in Specter." Actually, it was a few weeks ago that I first ran into it.
There are very minor things that could be improved, and I have those listed in the GitHub issues. Certainly, for most people, Specter will likely satisfy their needs for a very long time, until they maybe start doing very, very sophisticated stuff. I think the one area that could be improved, though this would most likely be done as a separate library, would be expanding the collection of navigators that are out there for you to just download and use as a library.
A good example of this: a lot of the work I do involves graphs. I have extended the Specter protocol internally, in my own code, to do all sorts of really cool graph manipulations. It's actually crazy just how sophisticated these graph transformations can be in so few lines of code, but it's a little bit tied in with my own stuff, so I haven't open sourced it yet. But one way to improve the Specter ecosystem would be to have a Specter graph library, or various libraries specific to particular data structures, that provide a bunch of navigators you can use and then compose with other available navigators.
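The Protocol Paths feature Nathan mentions lets the next leg of a path depend on the type of the value currently navigated to. The Python sketch below approximates that idea with an `isinstance` lookup table. `TypeDispatch` and the other names are hypothetical, and Specter's own feature is defined in Clojure; its API is not reproduced here.

```python
from typing import Any, Dict, List


class Keypath:
    """Hypothetical navigator: the value under one dict key."""

    def __init__(self, key: Any) -> None:
        self.key = key

    def transform(self, structure, next_fn):
        updated = dict(structure)  # copy so the input is left untouched
        updated[self.key] = next_fn(structure[self.key])
        return updated


class All:
    """Hypothetical navigator: every element of a list."""

    def transform(self, structure, next_fn):
        return [next_fn(item) for item in structure]


class TypeDispatch:
    """Hypothetical navigator: picks a sub-path based on the runtime type of the value."""

    def __init__(self, paths_by_type: Dict[type, List[Any]]) -> None:
        self.paths_by_type = paths_by_type

    def transform(self, structure, next_fn):
        for cls, sub_path in self.paths_by_type.items():
            if isinstance(structure, cls):
                # Follow the type-specific sub-path, then continue with the rest of the outer path.
                return transform(sub_path, next_fn, structure)
        raise TypeError(f"no path registered for {type(structure).__name__}")


def transform(path: List[Any], fn, structure: Any) -> Any:
    """Apply fn at every value the path reaches, rebuilding the structure around it."""
    if not path:
        return fn(structure)
    head, rest = path[0], path[1:]
    return head.transform(structure, lambda sub: transform(rest, fn, sub))


# A "payment" is either a single dict or a list of dicts; both expose "amount".
amounts = TypeDispatch({dict: [Keypath("amount")], list: [All(), Keypath("amount")]})
payments = [{"amount": 5}, [{"amount": 1}, {"amount": 2}]]
doubled = transform([All(), amounts], lambda x: x * 2, payments)
print(doubled)  # [{'amount': 10}, [{'amount': 2}, {'amount': 4}]]
```

The calling code uses one path for every payment; only the dispatching navigator knows that the two shapes need different routes to the same field.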
CRAIG: I'm actually kind of curious. You've mentioned graphs. The query languages I've used that I think of as most similar to Specter are things like CSS selectors and XPath navigation. Those are inherently hierarchical, whereas you're talking about graphs, and graphs can have things like cycles. How does that map to Specter?
NATHAN: Well, I'll get into one of the specific navigators for it. Let's say you have a graph, some data structure that references a graph. Something I can do in Specter is I have a navigator to navigate you to a sub-graph of a graph. There are a variety of ways to choose the sub-graph. I could say, "All nodes reachable within two hops from this particular node," or I can just say, "Just give me the sub-graph, which is literally just this list of nodes."
Then you navigate to the sub-graph. The sub-graph obviously contains the nodes that you chose, and it only contains the edges that are completely internal to the sub-graph, that is, edges between two nodes that are both in the sub-graph. But, of course, that sub-graph exists within this parent graph, and there are edges from the sub-graph to other nodes in the parent graph.
I can do things like: okay, let me navigate to this sub-graph. Then I'm going to have a function that processes it and returns a brand new sub-graph, which could have a completely different shape and structure from the original. Then, when Specter actually transforms that sub-graph, all of the original nodes disappear from the parent graph and are replaced with the new sub-graph. Then there's this question: What about all the edges that used to connect to the original sub-graph?
The way the library works is that you can annotate the nodes of the new sub-graph with metadata and say, "Okay, this node should absorb whatever edges nodes A, B, C, and D in the original sub-graph had to the parent graph," or, "This node should just absorb all the incoming edges that have not been absorbed by other nodes yet." By annotating with metadata, you specify how the new sub-graph should be reconnected into the original parent graph.
Again, these navigators are all composable, so I could navigate to a sub-graph of a sub-graph of a sub-graph and, when the reconstruction happens, it'll just naturally work because of the composability of navigators.
Yeah. Yeah, it's kind of mind-boggling just how generic that is. But when I've applied that in my own work, it's just like, "Wow! That is the power of composable abstractions." Everything just works beautifully.
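Nathan's graph navigators are internal and unpublished, so the following is only a guess at the reconnection step he describes, written as a Python sketch. A sub-graph is taken as the chosen nodes plus only their internal edges; a caller-supplied function returns a replacement sub-graph, and two hypothetical fields, `absorbs` and `absorb_rest`, stand in for the metadata annotations that say which new node takes over which outside edges. It handles a single level and does not model navigator composition.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set, Tuple

Edge = Tuple[str, str]


@dataclass
class Graph:
    nodes: Set[str]
    edges: Set[Edge]
    # Stand-in for metadata: new node -> old nodes whose outside edges it takes over.
    absorbs: Dict[str, Set[str]] = field(default_factory=dict)
    # Optional new node that takes any outside edge not claimed via `absorbs`.
    absorb_rest: Optional[str] = None


def induced_subgraph(graph: Graph, chosen: Set[str]) -> Graph:
    """Chosen nodes plus only the edges whose endpoints are both chosen."""
    return Graph(set(chosen), {(u, v) for u, v in graph.edges if u in chosen and v in chosen})


def replace_subgraph(graph: Graph, chosen: Set[str], fn: Callable[[Graph], Graph]) -> Graph:
    """Replace the chosen sub-graph with fn's result and reconnect outside edges."""
    new_sub = fn(induced_subgraph(graph, chosen))
    # Map each old sub-graph node to the new node that absorbs its outside edges.
    owner = {old: new for new, olds in new_sub.absorbs.items() for old in olds}

    def relink(node: str) -> Optional[str]:
        if node not in chosen:
            return node  # endpoint outside the sub-graph stays as-is
        return owner.get(node, new_sub.absorb_rest)

    edges = set(new_sub.edges)
    for u, v in graph.edges:
        if u in chosen and v in chosen:
            continue  # internal edges were handed to fn and replaced
        nu, nv = relink(u), relink(v)
        if nu is not None and nv is not None:
            edges.add((nu, nv))  # unclaimed edges (no owner, no absorb_rest) are dropped
    nodes = (graph.nodes - chosen) | new_sub.nodes
    return Graph(nodes, edges)


# Collapse b and c into one node "bc" that takes over every outside edge.
g = Graph({"a", "b", "c", "d"}, {("a", "b"), ("b", "c"), ("c", "d")})
merged = replace_subgraph(g, {"b", "c"}, lambda sub: Graph({"bc"}, set(), absorb_rest="bc"))
print(sorted(merged.nodes))  # ['a', 'bc', 'd']
print(sorted(merged.edges))  # [('a', 'bc'), ('bc', 'd')]
```

A replacement that keeps two nodes would instead set `absorbs`, for example `{"b2": {"b"}, "c2": {"c"}}`, so each new node inherits its predecessor's outside edges.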
CRAIG: Yeah. Well, when you think about it, it's like all good ideas: it's kind of obvious. When we talk about these things with each other, we say, "Well, what I want to do is get all of the odd-numbered nodes in this vector. Those things are pointers into a graph. Then take the graph, pull it out, and replace anything that has an associated value of three with two copies of itself." When you say it out loud that way, we don't really get lost expressing this idea of following a path, even down multiple branches, and doing things like referring back to previous values the way you described with metadata. It's kind of funny, but from my newbie standpoint, what you've basically done is make it easier for us to write that down in code rather than in English. We already have pretty clear ways of thinking about it, or at least I find it easy to think about what I'm trying to do. It just becomes really, really messy to write it down, even in Clojure, if you stick to the core libraries.
NATHAN: Yeah, that's right. Yeah, with Specter, basically the code for these transformations looks a lot like how you think about them, which is always a good thing.
CRAIG: Yeah, yeah.
NATHAN: It's always a good thing as long as there are no crazy trade-offs you're making, like, "Oh, yeah, it looks like how I think of it, but it's 100 times slower." Right. With Specter, you actually get the same performance.
CRAIG: Mm-hmm. There is one more thing that I have to talk to you about today. We can come back to Specter, but I don't want to leave it as long as we're talking about it. Is there anything else that you think is worth mentioning about Specter before I ask you about something completely non-code related? You can probably guess what it is.
NATHAN: No, I think we covered Specter pretty well. I would just encourage anyone listening who wants to learn more about it to read the blog post I wrote about it. It was right after Strange Loop, so I guess that was a few months ago. The blog post uses a specific example to really ground all these concepts. I think that's the best way to get started with it.
CRAIG: Mm-hmm. Yeah, I found it really easy to get going based on the documentation and based on your presentation - presentations, I should say. The Strange Loop one was one of them. Anyway, yeah, I'd say the same thing. Okay.
The thing I wanted to talk to you about on the show, because we had such a great conversation about it at Union Station during Strange Loop, is this: in addition to writing some truly impressive Clojure libraries, being a successful entrepreneur, and being a pianist, you're also a private pilot.
NATHAN: That's right. I am a private pilot.
CRAIG: I am interested in aviation. I'm not myself a pilot of actual aircraft. I fly pretend ones in games, but we had a really fun conversation. I just thought it was such an interesting thing that so few people do, although I think a lot of people are interested, that I was wondering if you could talk about how you got into it, what motivated you, how long you've been doing it, and, if you have a favorite thing about flying, what it is, because I'm interested in the topic.
NATHAN: Oh, yeah. I love - I absolutely love flying. The way I got into it is through this thing I've been doing for a few years now where, every year, I do something new: something that should challenge me and require significant effort to do.
Last year, I had a friend who had just started flight training, and he was telling me about it. Then I realized that, like, that is perfect for me. That is something I greatly enjoy because flying or learning to fly is just this great combination of science and technology.
To become a pilot, you have to learn a ton about physics, so things like aerodynamics, and you have to learn a lot about engineering. You learn a lot about the systems of the plane: how does the engine work, how do the fuel and the air mix, how does the carburetor work, how does fuel injection work, all this stuff. It's things that you really do need to know to safely fly a plane.
Then it's just obviously a thrilling thing to do. It feels very adventurous when you're doing it. For those reasons, it was perfect for me. That's why I got into it.
One of the things I love about flying, actually, is this: when I'm programming and doing my work, it's very intense, and it can be hard to get your brain away from that. One of the ways I do that, as I mentioned, is playing piano and things like that. But one of the nice things about flying a plane is that, while you're flying, there is no possible way for you to think about programming, because you're thinking about staying safe in the air, seeing and avoiding traffic, and just flying the plane. That is something I really enjoy about it.
I forget what your other questions were about it.
CRAIG: No, that's cool. That's great, actually. As you know from our conversation, this is something I'm very interested in, and I think it's a very interesting thing about you.
NATHAN: Well, you still have an open invitation to come up to New York, and I'll take you flying.
CRAIG: Oh, that'd be fantastic. I would love that. I will definitely do that one of these days. I appreciate that, especially you committing to it here in front of a few thousand of your closest friends, right?
NATHAN: Oh, yeah.
CRAIG: That's cool.
NATHAN: Happy to do it.
CRAIG: Yeah, that's great.
NATHAN: I love taking people for their first flights in small planes because it's very different than flying in a commercial airliner. The plane is much more responsive to the atmosphere, and you can just do a lot of cool things like stalls, steep turns, and things that you've never experienced before in a big plane.
CRAIG: Sure. You're flying in the New York area where there's a lot of air traffic.
NATHAN: Yeah. I've done most of my flying in California, but I have done some flying in New York. My most memorable flight was when I flew down the Hudson River at 1,500 feet, which was definitely one of the most thrilling experiences of my life and also one of the most terrifying, because that airspace is crazy. There are helicopters and aircraft all over the place. But it was really quite an experience.
CRAIG: Hmm, that's very cool. All right, well, I know you and I are both a bit of an aviation nut. Maybe we shouldn't drag everyone else into it much longer, although I found your experience fascinating, and I think other people will as well.
We are starting to wind up here, though. I always make sure that we give our guests time to talk about anything else on their mind that they'd like to share with me or with the rest of our listeners. We have plenty of time to do that. Is there anything else on your mind? If it's a longer thing and you want to save it for another day, I would love to have you back on. Obviously, looking at your résumé, it's pretty clear that you do cool things on a regular basis, and I have no doubt that there will be more for us to talk about at a future date. Today, is there anything else you'd like to spend time on?
NATHAN: No, I think we've covered a lot of stuff. I hope that was interesting to all the listeners.
CRAIG: I'm sure it was, actually. It definitely was interesting to me. Well, this is great. This is a good conversation. We have our one question at the end. I still have one more bit of information or illumination I'd like to get from you, and that's our final question, which is a bit of advice. We always ask our guests to share a bit of advice with our listeners, whether that's advice they've received or advice they like to give, or just anything, really. What would you like to share with us in terms of advice?
NATHAN: Well, I did mention -- I mentioned it a little bit already, but I do this thing where I do one new thing every year, something to challenge me that will require a lot of effort. That has been one of the best things. Just that, I guess, yearly tradition for me has definitely been one of the best things I've done.
I find that there are kind of multiple aspects to it. First of all, I do think it's really important to always challenge yourself. It's through challenging yourself and putting yourself into situations that you're not comfortable with, that's how you grow, and that's how you improve as a person.
It also humbles you. When I was learning to fly, I was doing really well. I was learning really fast. Then, when it came time to learn how to land, I was just stuck for a whole month. I just could not land the damn plane. As someone who's been very, very successful in my career, with companies and with building these open source projects, it's good to have a humbling experience like that, where you're trying to do something and you just have no idea why you can't get it.
Then another aspect of it is that the more you diversify yourself in terms of the things you do and the kind of knowledge you have, the more you get this amazing bleed-over effect between areas of knowledge. There are things I learned from learning how to fly a plane that have bled over into my understanding of software engineering. That's been true of everything I've done.
I remember I was on this big history bent at one point, with colonial American history. I was reading biographies of all the Founding Fathers and things like that. You learn things by exploring subjects that are so different from your career, and, surprisingly, they turn out to be useful. I learned things like how George Washington managed his Cabinet, which I think is very relevant and very useful for dealing with people in general and also for managing a software team.
There's just so much to gain by forcing yourself to explore new things and challenge yourself. That would be my one piece of advice.
CRAIG: Wow. That's awesome. I hope you don't mind. That's usually our final thing, but I've got to ask you about some of the other things you've done for your yearly challenge. I don't know if you pick one at the beginning of the year; if so, maybe it's coming time for you to pick another one. But I'd just love to hear a few. We don't have to go into them too much, but what does the list look like? Obviously flying a plane is one of them, but what other crazy things have you made yourself do?
NATHAN: Let's see. Two years ago or three years ago, I forced myself to do standup comedy.
CRAIG: Oh, wow!
NATHAN: Yeah. That is challenging. Yeah. I forced myself not just to do it, but to get decent at it. I'm no Louis C.K. or anything like that, but I was able to go on stage in front of a bunch of strangers and get a lot of good laughs. That was quite the experience. I don't know if I would recommend that to everyone, because standup is brutal.
CRAIG: Yeah, I've heard - I've heard.
NATHAN: Yeah. Yeah. Yeah, on some level, having that experience of just bombing on stage for five minutes straight would be a good experience for a lot of people to have. But, yeah, you've got to develop thick skin to be a standup comedian, no matter how good you are.
But that's one of the ways in which it helps you grow. As for what I'm planning next year: you know, I've spent all my life doing software and working with abstract ideas, so I thought it would be fun to enter the real world of making things. Next year I'm going to learn how to do woodworking.
CRAIG: Oh, great.
NATHAN: Yeah, and I do expect, again, there to be a lot of that bleed-over effect from learning about building real things, which will then help me understand software or other things even more.
CRAIG: Well, this is great. I'm glad you mention that, because it gives me a chance to reciprocate. I'm no expert, but I have some experience with woodworking. It sounds like maybe you're just getting started. My friend Tim Ewald, another Cognitect, is also a great woodworker. If you ever have any questions or just want to talk about it some time, please hit me up. It's no ride in a plane, but I'd be happy to offer what little perspective I have, so give me a shout.
NATHAN: Yeah, that'd be great. I've just started looking into it and planning for it, but, yeah, that'd be great.
CRAIG: Cool. Maybe we'll convince you to use hand tools. That's been our thing, so anyway. All right, well, that is amazing. I'm so glad. I know it kind of messed up our usual order, but I don't care because standup comedy, woodworking, that's fantastic. I suspect we could probably just have you on every year and say, "Hey, man. How was your new project?" and get some great stories from you. Then, maybe, just as a bonus, talk about whatever awesome software you've created, so I'm really glad we had you on today.
NATHAN: Sounds good to me.
CRAIG: Oh, well, let's do it, then. It'd be fun. It's been great having you on. I super appreciate you taking the time. Obviously you're a very busy person, but I do appreciate you coming on. I was absolutely fascinated by the conversation. I knew, after the conversation we had at Strange Loop, that you'd make a great guest, and I was not disappointed, so thanks a ton for coming on the show.
NATHAN: Yeah, that was a lot of fun. Thanks for having me.
CRAIG: Likewise. We're glad you could make it. But, we do have to wrap up there. We will call it a day. This has been the Cognicast.
[Music: "Thumbs Up (for Rock N' Roll)" by Kill the Noise and Feed Me]
CRAIG: You have been listening to the Cognicast. The Cognicast is a production of Cognitect, Inc., whom you can find on the Web at cognitect.com and on Twitter, @Cognitect. You can subscribe to the Cognicast, listen to past episodes, and view cover art and show notes at our home on the Web, cognitect.com/cognicast. You can contact us by tweeting @Cognicast or by emailing us at firstname.lastname@example.org.
Our guest today was Nathan Marz, on Twitter @nathanmarz. Episode cover art is by Michael Parenteau. Audio production by Russ Olsen. The Cognicast is produced by Kim Foster. Our theme music is Thumbs Up (for Rock N' Roll) by Kill the Noise with Feed Me. I'm your host, Craig Andera. Thanks for listening.