Clojure performance tips

May 11, 2009

I originally wrote this in a Google Groups thread, but I figured it’s worth repeating here.

Somebody posted a Java and Clojure snippet to the Clojure Google group and mentioned that the Java code was vastly faster than the Clojure code and he wondered if Clojure could get within reach of Java’s speed.

In my own clj-starcraft project, I faced — and actually, still face — performance problems vis-à-vis Java. Specifically, at the time of this writing, my Clojure code is roughly 6 times slower than Java (Clojure takes around 70 seconds to parse 1,050 files, Java takes 12.)

The 70 seconds figure used to be much worse however. At the beginning of the project, it took over ten minutes to analyze my 1,050 files. That was even slower than my Python implementation (which, I must confess, pretty much ignored performance.)

Thanks to Java’s nice profilers and the friendly Clojure folks, I was able to improve the performance of my program. Here are some of the things I did.

(set! *warn-on-reflection* true)

This is likely the most important thing you can do to improve performance: turning on this setting will warn you of every place where the Java reflection API is used to resolve methods and attributes. As you may imagine, a direct call is a lot faster than going through all that reflection machinery. Wherever Clojure tells you that it can’t resolve a method, you need to go there and put a type hint to avoid the reflection call. The type hints section of the Clojure website shows an example of the speed-up and how to use type hints.

Fixing all the instances where *warn-on-reflection* complained improved the performance of clj-starcraft from ten minutes down to about three and half.

Coerce your numbers

Clojure can use Java’s primitive types. Whenever you are in a tight loop, strongly consider coercing your values to primitive types to get the most speed out of your code. The primitive types section of the Clojure website shows an example of the speed-up and how to do the coercion.

Use binary arithmetic operators

For a while now, Clojure has supported inlining of certain expressions. For arithmetic operations, only the calls with exactly two arguments will be inlined. If you find yourself doing arithmetic in a tight loop with more than two operands, you may want to consider rewriting the code so that every operator has exactly two operands. The following micro-benchmark shows the effect of inlining.

user> (time (dotimes [_ 1e7] (+ 2 4 5)))
"Elapsed time: 1200.703487 msecs"

user> (time (dotimes [_ 1e7] (+ 2 (+ 4 5))))
"Elapsed time: 241.716554 msecs"

Use == instead of =

Using == to compare numbers instead of = can have an appreciable performance impact:

user> (time (dotimes [i 1e7] (= i i)))
"Elapsed time: 230.797482 msecs"

user> (time (dotimes [i 1e7] (== i i)))
"Elapsed time: 5.143681 msecs"

Avoid using destructuring binding for vectors

In a tight loop, if you want to assign the values inside a vector to names to improve readability, consider using direct indexing instead of destructuring binding. Although the later will yield clearer code, it will also be slower.

user> (let [v [1 2 3]]
        (time
         (dotimes [_ 1e7]
           (let [[a b c] v]
             a b c))))
"Elapsed time: 537.239895 msecs"

user> (let [v [1 2 3]]
        (time
         (dotimes [_ 1e7]
           (let [a (v 0)
                 b (v 1)
                 c (v 2)]
             a b c))))
"Elapsed time: 12.072122 msecs"

Prefer locals to vars

If you need to lookup a value inside a tight loop, you may want to consider using a local variable (defined with let) instead of a Var. Let’s look at another timing comparison:

user> (time
       (do
         (def x 1)
         (dotimes [_ 1e8]
           x)))
"Elapsed time: 372.373304 msecs"

user> (time
       (let [x 1]
         (dotimes [_ 1e8]
           x)))
"Elapsed time: 3.479041 msecs"

If you think that using a local variable would help performance, consider the following idiom to avoid breaking other code that may depend on that Var:

(let [local-x x]
  (defn my-fn [a b c]
    ...))

Use the profilers

Sun’s JVM has two profilers, -Xprof and -Xrunhprof. Use them to find your bottlenecks instead of blindly guessing.

Caveat

Do note that many of these performance tricks improve the performance by a few hundred milliseconds over millions of executions. Unless they give you a significant performance boost, you may want to avoid using them if they make your code less clear.


irssi softignore script

May 8, 2009

A few weeks ago, my friend Steven wrote about a softignore script he wrote for X-Chat. The idea is that instead of completely blocking what a user says, the script captures their text and colors it in a way that the text “blends” with the background, so you can more easily ignore what these people say.

I’m an irssi fan myself, but I was a bit jealous of his script and quite surprised not to find anything like it in irssi’s script repository. So I decided to go ahead and write my own. Irssi is scripted in Perl, of which I have a little bit of knowledge, but definitely no expertise. The documentation on how to script Irssi is a bit thin; you can find a little on their web site, a little on the site of a guy who wrote a mini-tutorial and a little on the #irssi channel.

After a couple hours of hacking, I had a script that did what I wanted. There are three commands, /softignore_add, /softignore_remove and /softignore_list. The parameter to the add and remove commands is a regular expression to match the nickname. The list is kept in memory as well as in a plain text file with one regex string per line. I imagine that the more advanced Perl programmers will suggest better ways to accomplish this, but for the time being, this works well enough.

I have not made the color configurable yet, so if you need a color other than dark grey, just edit the script file and change the value of the $COLOR variable.

I implemented a few functions (slurp, spit and trim) that are most likely available in CPAN modules; however I did not want my simple script to have external dependencies, so I knowingly violated the DRY principle.

Here is the code for the script. Just copy it in ~/.irssi/scripts/ and use the /script load softignore command to load it.

use strict;
use warnings;

use Irssi qw(command_bind signal_add signal_continue command);

# Irssi globals
our $VERSION = "0.00";
our %IRSSI = (
    authors     => "Vincent Foley",
    contact     => "vfoleybourgon at yahoo dot ca",
    name        => "Soft Ignore",
    description => "Ignore users by putting their messages in a dim color.",
    license     => "MIT",
);

# Soft ignore globals
our $FILENAME = "$ENV{HOME}/.irssi/saved_softignore";
our $COLOR    = 14;
our @ignores  = slurp($FILENAME);

sub slurp {
    my ($filename) = @_;

    my $lines;
    {
        local $/;
        open(my $fd, '<', $filename) or return ();
        $lines = <$fd>;
    }
    return split(/\n/, $lines);
}

sub spit {
    my ($filename, @lines) = @_;

    open(my $fd, '>', $filename) or return;
    for my $line (@lines) {
        print {$fd} $line."\n";
    }
    close($fd);
}

sub trim {
    my ($str) = @_;

    $str =~ s/^\s*(\S+)\s*$/$1/;
    return $str;
}

sub softignore_add {
    my ($data, $server, $witem) = @_;

    push(@ignores, trim($data));
    spit($FILENAME, @ignores);
}

sub softignore_remove {
    my ($data, $server, $witem) = @_;

    @ignores = grep { $_ ne trim($data) } @ignores;
    spit($FILENAME, @ignores);
}

sub softignore_list {
    my ($data, $server, $witem) = @_;

    return unless $witem;

    if (@ignores == 0) {
        $witem->print("No soft ignore entries.");
    }
    else {
        $witem->print("Soft ignore list:");
        for my $ignore (@ignores) {
            $witem->print("* " . $ignore);
        }
    }
}

sub softignore_message {
    my ($server, $msg, $nick, $address, $target) = @_;

    for my $ignore (@ignores) {
        if ($nick =~ /$ignore/i) {
            signal_continue($server, "03$COLOR$msg",
                            $nick, $address, $target);
        }
    }
}

command_bind softignore_list   => \&softignore_list;
command_bind softignore_add    => \&softignore_add;
command_bind softignore_remove => \&softignore_remove;

signal_add "message public" => \&softignore_message;

Naruto Fetcher Jutsu!

May 6, 2009

A few months ago, a friend told me about “Naruto”, a japanese manga from which two TV series were made. I now hate this friend, because I was completely hooked by the story and wasted many hours that could’ve been spent more productively :) I watched nearly all of the dubbed episodes of Naruto (I got tired of the later filler episodes) as well as the Shippuden episodes.

The current story arc in Shippuden is also a filler arc that seems to be going nowhere, so I decided to get the mangas to read the canon story. There are, as of this writing, 445 mangas. That’s a lot of mangas to download by hand! (I could read them online, but I found the picture quality to be severly lacking.) Being a lazy programmer and all, instead of clicking individually on every link, I decided to write a script to fetch them all. I could’ve simply used a for loop, but I wanted to download many archives in parallel to make the process go faster, so I wrote the script in Python and I used the new multiprocessing module to make the parallelism easy (trivial, even!)

This is an extremely simple script, nothing fancy, but I give it to you anyway:

from multiprocessing import Pool
import os

def get(n):
    os.system('wget -q "http://www.narutochuushin.com/downloads/script/downloads.php?title=manga_chapter%03d"' % n)

pool = Pool(10)
pool.map(get, range(1, 446))
pool.close()

Hope you enjoy!


Explaining the null debate

May 5, 2009

A few weeks ago there was a discussion on an IRC channel I hang in about null references. That was at the time of QConLondon when Sir Tony Hoare was giving a keynote presentation calling null references his billion dollar mistake. We talked about null for about an hour before we all gave up after realizing that it was never going to get anywhere.

I like IRC, but for long discussions, especially those involving many parties, it’s not the best medium out there. This blog post is an attempt to concisely explain the problem with null.

First, what’s null? Null is a way in many languages to convey the idea of “nothingness”. The concept of nothingness is an important one in computer science, and hardly anyone is suggesting that we do without it. This is not what people are arguing about.

The debate is over whether nothingness should be explicit or implicit.

In a language like Java, nothingness is implicit. A method that expects a String parameter can also be passed null. A method that expects an array of doubles can also be passed null. We can think of null as a member of every reference type in Java. However, its semantics are different from every other value of that type; calling a method or looking up an attribute on null results in a NullPointerException. This means that special care has to be taken when dealing with null. And because the compiler can’t know whether we are calling .length() on null or a valid string, it cannot prevent us from doing bad things. The responsibility of safe code rests entirely on the programmer’s shoulders.

On the other hand, in a language like Haskell, nothingness is explicit. Haskell has a type called Maybe a (this could be written Maybe<t> in Java) which has exactly two values: Nothing, which is the value you use to describe the absence of a value and Just x, which we can think of as a box containing the actual value. (The value needs to be “pulled out” of the box to be used normally.) The type Maybe String (a concrete instance of Maybe a) cannot be used where a String is expected, and vice-versa; you cannot call the function length on Nothing, the type checker will not allow it. This eliminates the NullPointerException problem. Furthermore, because Haskell knows that Maybe a has two values, it can warn you when it detects that you forgot to handle either case in a function. Haskell can actually help us avoid making errors.