I originally wrote this in a Google Groups thread, but I figured it’s worth repeating here.
Somebody posted a Java snippet and a Clojure snippet to the Clojure Google Group, noted that the Java code was vastly faster, and asked whether Clojure could get within reach of Java’s speed.
In my own clj-starcraft project, I faced (and, actually, still face) performance problems vis-à-vis Java. Specifically, at the time of this writing, my Clojure code is roughly 6 times slower than Java: Clojure takes around 70 seconds to parse 1,050 files, and Java takes 12.
The 70-second figure used to be much worse, however. At the beginning of the project, it took over ten minutes to analyze my 1,050 files. That was even slower than my Python implementation (which, I must confess, pretty much ignored performance).
Thanks to Java’s nice profilers and the friendly Clojure folks, I was able to improve the performance of my program. Here are some of the things I did.
(set! *warn-on-reflection* true)
This is likely the most important thing you can do to improve performance. With this setting turned on, the compiler warns you about every place where it has to use the Java reflection API to resolve a method or field at run time. As you may imagine, a direct call is a lot faster than going through all that reflection machinery. Wherever Clojure tells you that it can’t resolve a method, go there and add a type hint so the call no longer needs reflection. The type hints section of the Clojure website shows how to use type hints and the speed-up they give.
Fixing all the places where *warn-on-reflection* complained brought the running time of clj-starcraft down from ten minutes to about three and a half.
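A minimal sketch of what a reflection warning and its fix look like. It assumes a current Clojure release, where a type hint is written ^String (older releases used #^String); the function names are hypothetical. When this file is loaded, the compiler should print a reflection warning for len-reflective but not for len-hinted.

```clojure
(set! *warn-on-reflection* true)

;; No type information for s: the compiler cannot tell which class
;; .length belongs to, so it emits a reflective call (and a warning).
(defn len-reflective [s]
  (.length s))

;; The ^String hint tells the compiler that s is a java.lang.String,
;; so .length compiles to a direct method call with no reflection.
(defn len-hinted [^String s]
  (.length s))

(println (len-reflective "starcraft")) ; prints 9
(println (len-hinted "starcraft"))     ; prints 9
```

Both functions return the same result; only the hinted one avoids the reflection lookup on every call.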
Coerce your numbers
Clojure can use Java’s primitive types. Whenever you are in a tight loop, strongly consider coercing your values to primitive types to get the most speed out of your code. The primitive types section of the Clojure website shows an example of the speed-up and how to do the coercion.
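As a rough sketch of the idea, the loop below coerces its argument once, outside the loop, so that the comparison and the arithmetic inside the loop work on primitive longs rather than boxed numbers. It assumes a current Clojure release, where long is the default primitive integer type; the coercion functions recommended have changed between releases, and sum-squares is a hypothetical example function.

```clojure
(defn sum-squares
  "Sums i*i for i from 0 below n, using primitive longs in the loop."
  [n]
  ;; Coerce the (possibly boxed) argument once, outside the loop.
  (let [n (long n)]
    ;; Explicit (long ...) coercions make the loop locals primitive.
    (loop [i (long 0)
           acc (long 0)]
      (if (< i n)
        ;; (< i n), (inc i), (* i i) and (+ acc ...) all operate on longs.
        (recur (inc i) (+ acc (* i i)))
        acc))))

(println (sum-squares 10)) ; prints 285
```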
Use binary arithmetic operators
For a while now, Clojure has supported inlining of certain expressions. For arithmetic operations, only the calls with exactly two arguments will be inlined. If you find yourself doing arithmetic in a tight loop with more than two operands, you may want to consider rewriting the code so that every operator has exactly two operands. The following micro-benchmark shows the effect of inlining.
user> (time (dotimes [_ 1e7] (+ 2 4 5)))
"Elapsed time: 1200.703487 msecs"
user> (time (dotimes [_ 1e7] (+ 2 (+ 4 5))))
"Elapsed time: 241.716554 msecs"
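Applied to a more realistic expression, the rewrite just nests the calls so that no + or * has more than two operands. The two functions below are hypothetical examples (packing three byte values into one integer, the kind of arithmetic a binary parser does); they compute the same result, and only the second uses strictly binary operators.

```clojure
;; Variadic form: one + call with three operands.
(defn pack-bytes-variadic [a b c]
  (+ (* a 65536) (* b 256) c))

;; Binary form: every + and * has exactly two operands,
;; so each call is a candidate for inlining.
(defn pack-bytes-binary [a b c]
  (+ (* a 65536) (+ (* b 256) c)))

(println (pack-bytes-variadic 1 2 3)) ; prints 66051
(println (pack-bytes-binary 1 2 3))   ; prints 66051
```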
Use == instead of =
Using == to compare numbers instead of = can have an appreciable performance impact:
user> (time (dotimes [i 1e7] (= i i)))
"Elapsed time: 230.797482 msecs"
user> (time (dotimes [i 1e7] (== i i)))
"Elapsed time: 5.143681 msecs"
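Keep in mind that the two operators are not interchangeable: == is numeric equality and only accepts numbers, while = is general equality and works on any value. The sketch below shows the difference; the results for mixed long/double comparisons are those of current Clojure releases and may differ in the older release this post was written against.

```clojure
;; == compares numeric values, across numeric types.
(println (== 3 3))        ; prints true
(println (== 1 1.0))      ; prints true

;; = is general equality: it also works on collections, but in current
;; Clojure releases it treats a long and a double as different.
(println (= 1 1.0))       ; prints false
(println (= [1 2] [1 2])) ; prints true
```

So == is a safe replacement only where both sides are known to be numbers.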
Avoid using destructuring binding for vectors
In a tight loop, if you want to bind the values inside a vector to names to improve readability, consider using direct indexing instead of destructuring binding. Although the latter yields clearer code, it is also slower.
user> (let [v [1 2 3]] (time (dotimes [_ 1e7] (let [[a b c] v] a b c))))
"Elapsed time: 537.239895 msecs"
user> (let [v [1 2 3]] (time (dotimes [_ 1e7] (let [a (v 0) b (v 1) c (v 2)] a b c))))
"Elapsed time: 12.072122 msecs"
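Here is the same trade-off in a small function, a sketch with hypothetical names that computes the dot product of two 3-element vectors. The first version destructures (the let macro expands sequential destructuring into calls to nth); the second indexes the vectors directly, since a Clojure vector can be called as a function of its index.

```clojure
;; Readable: sequential destructuring binds the components.
(defn dot-destructured [v w]
  (let [[a b c] v
        [x y z] w]
    (+ (* a x) (+ (* b y) (* c z)))))

;; Faster in a tight loop: index into the vectors directly.
(defn dot-indexed [v w]
  (let [a (v 0) b (v 1) c (v 2)
        x (w 0) y (w 1) z (w 2)]
    (+ (* a x) (+ (* b y) (* c z)))))

(println (dot-destructured [1 2 3] [4 5 6])) ; prints 32
(println (dot-indexed [1 2 3] [4 5 6]))      ; prints 32
```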
Prefer locals to vars
If you need to look up a value inside a tight loop, you may want to consider using a local variable (defined with let) instead of a Var. Let’s look at another timing comparison:
user> (time (do (def x 1) (dotimes [_ 1e8] x)))
"Elapsed time: 372.373304 msecs"
user> (time (let [x 1] (dotimes [_ 1e8] x)))
"Elapsed time: 3.479041 msecs"
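If you think that using a local variable would help performance, consider the following idiom, which leaves the Var in place for any other code that depends on it. Note that my-fn captures the value x had when the let was evaluated, so later redefinitions of x will not be seen inside my-fn: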
(let [local-x x] (defn my-fn [a b c] ...))
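A filled-in sketch of that idiom, using a hypothetical Var named scale and hypothetical function names. The first function dereferences the Var each time the reducing function reads scale; the second reads the Var once, when the let is evaluated, and the reducing function only touches the local. The last lines show the snapshot behavior: after scale is redefined, only the Var-based function sees the new value.

```clojure
(def scale 3) ; hypothetical Var that other code may depend on

;; Each evaluation of scale inside the reducing function derefs the Var.
(defn scaled-sum-var [xs]
  (reduce (fn [acc x] (+ acc (* x scale))) 0 xs))

;; The Var is read once, when this let is evaluated; the reducing
;; function only reads the local local-scale.
(let [local-scale scale]
  (defn scaled-sum-local [xs]
    (reduce (fn [acc x] (+ acc (* x local-scale))) 0 xs)))

(println (scaled-sum-var (range 5)))   ; prints 30
(println (scaled-sum-local (range 5))) ; prints 30

;; The local is a snapshot: redefining the Var does not affect it.
(def scale 10)
(println (scaled-sum-var (range 5)))   ; prints 100
(println (scaled-sum-local (range 5))) ; prints 30
```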
Use the profilers
Sun’s JVM has two profilers, -Xprof and -Xrunhprof. Use them to find your bottlenecks instead of guessing blindly.
Do note that many of these tricks save only a few hundred milliseconds over millions of executions. Unless they give you a significant performance boost, you may want to avoid them if they make your code less clear.