Journal 2019.7 - jdk regression, spec, sets, tools.deps

JDK regression

There was a rash of reports this week about significant startup time slowdowns after switching to the latest version of Java 8 and 11. I tried to help on the first couple but by the time I saw the fourth report, it seemed like this needed a bit more focus. While I did a lot of looking, I can’t say much of that actually helped. But big thanks to Kevin Downey for best distilling the initial reports and Shantanu Kumar for finding the salient description buried in another mailing list.

I’ve summarized the issue in CLJ-2484 for future tracking and made a simpler repro here.

In short, Java 8u201 and 11.0.2 include a fix for a 0-day security issue that can occur during static initialization. The fix prevents optimizations during the static initializer so you’re running more at interpreter speeds. Clojure loads user.clj during the static initializer for clojure.lang.RT. Thus, anyone using the reloaded pattern and loading your app state via user.clj is likely to see substantially slower startup times.

In general, Clojure gains a tremendous amount of stability, performance, and value from sitting on hosted platforms. Sometimes that relationship is complicated. It is too early yet to say what, if anything, we’ll do in Clojure about this. I’ve thought for a long time that we load user.clj awfully early in bootstrapping and this is yet another hint that maybe it should happen a little later.

I really don’t want to suggest not using the latest Java versions - they contain critical security fixes and I’d still recommend them. However, you may want to reconsider your use of user.clj and how you set up your environment in dev.

spec

I’ve been working on the new select functionality for spec 2 this week, although due to other work and time offline, and other things, did not finish it. An initial version should be done relatively soon though.

Sorted sets

I had a couple of extended Twitter conversations this week which were frustrating in being unable to actually express what I want to express in the medium. Given that, I’m just going to expound in long form here.

Nikita Prokopov has been hard at work improving the performance of DataScript and it looks like he’s been making some great progress. He mentioned on Twitter that perhaps his sorted set implementation could replace Clojure’s. I can’t answer that, but I can elaborate a bit on what we would need to consider to do so.

Clojure’s persistent collection implementations are at the heart of the language. They were literally the first part of Clojure that Rich worked on to determine whether his ideas for Clojure were viable. Every part of Clojure relies on these persistent immutable data structures. Given the risk involved with modification or replacement of these data structures, it should be no surprise that we would set a very high bar in considering a change.

Some of the things we would need to evaluate include:

quality of the code, which would need to be closely reviewed
hashing / equality
performance complexity of all operations
thread safety and concurrency
transient use - the current sorted colls don’t support transients where I believe Nikita’s do. There are several places (like into) that have conditional code paths for colls that can be transients. These would need to be evaluated and tested as we would be using transients automatically in many places we do not currently.
performance. I know Nikita has done a lot of perf testing, but I’ve burned myself enough times to be skeptical of all test results until the benchmark code has also been thoroughly reviewed.
memory footprint and GC

While the persistent collections are central to the Clojure story, they are also surrounded by well-defined collection interfaces (see this post) and the core libraries, compiler, etc deal almost entirely with the abstractions, not the concrete collections. This means it is relatively easy to instantiate and use your own collection implementation in Clojure without changing anything in Clojure itself. Indeed, there are a bunch of collections in Clojure contrib and elsewhere with a variety of performance characteristics and special features.

To consider this contribution, it would be necessary for us to methodically go through the points above ensuring that this is the best possible solution. Inevitably, we would ideally also want to identify potential problem areas with the current impl (no transients, for example!) and do some research on what some alternatives are and the pros and cons of each. Even starting with a patch in hand, this is weeks of work to fully evaluate everything. Things that would make it go faster are having really good descriptions or answers on the items above.

The meta-game is always, where do we spend our time? Of the many big things we might work on, how do we pick the one with the highest impact? What percentage of the user base might be affected by working on it? Does it bring new people in or make existing users more productive? Is there a way for people to get the use of something in a lib instead of core (in this case, yes!)?

tools.deps / clj

There was also some discussion about why we’ve invested time in tools.deps and the clj tool. There are really a set of related things that led us to this as a thing to work on.

Clojure was designed and always has been a source-focused language where code is compiled on load. In general, Clojure projects don’t need a build or compile step - if you can get to the source file, you can load and run it in Clojure. Leiningen was a transformative tool for Clojure because it opened up the Maven repository ecosystem, making it easy to leverage Java libraries. But Java code requires a compile and build step and is inherently an artifact-based world, not a source-based world. We gained a lot, but we lost something important.
Code evolution over time is something we continue to think about a lot in many areas, including deps. Version selection is inherently involved in that process and it’s difficult to be in the loop on that if you’re using the Maven resolver.
And finally, we wanted a library and a tool focused only on obtaining deps (specified as data), building classpaths, and launching programs. Leiningen and Maven both have a notion of a pre-determined build lifecycle. That’s great for getting started quickly with a standard project layout and again, Leiningen was a game-changer at a critical point in the Clojure community. But over time projects tend to grow custom steps and unique needs. Plugins and profiles help with this but can get out of hand over time. Eventually, you just want to imperatively state what the build should do. Boot provided some interesting answers to this but also made the dependency definition part of the build program, rather than data the program is based on. clj is a different path - deps as data and build tasks as small Clojure programs.

It’s been gratifying to see many people finding clj to be useful, particularly those just getting started with Clojure. There are a lot of great tools that can be composed with clj now. If the clj model isn’t a match for your project, please use Leiningen, or Boot, or whatever tool is good for you. For example, tools.deps and clj are still built and deployed primarily with Maven.

We have a bunch of tools.deps issues that need work and once things cool off with spec I expect we’ll work through some of those.

Other stuff I enjoyed this week…

I’m kind of a sucker for climbing movies for some reason and finally had a chance to see Free Solo this week. Free Solo chronicles Alex Honnold’s successful attempt to free solo El Capitan in Yosemite, which continues to boggle my mind. I would also recommend The Dawn Wall for another epic Yosemite climbing movie.

And check out the new tune by the Budos Band - Old Engine Oil, which may or may not be the music that should play when I walk into a room.

Written on February 16, 2019