Reading binary data in Clojure

January 29, 2009

A few months ago, I wrote a Python library to parse Starcraft replay files. For the past few weeks, I’ve been meaning to try writing a Clojure implementation and I finally got the time to start working on that this week. In this post, I’d like to show how I tackled the problem of reading binary data in what I find to be an elegant manner.

Because Clojure can use all the Java libraries, I figured that it would probably be a good idea to use the java.nio package. A problem that quickly arose was that calling the methods manually was both ugly and unnecessarily verbose. Here’s an example of what it looked like:

(let [game-engine (.get buf)
      game-frames (.getInt buf)
      _ (.get buf (make-array Byte/TYPE 3))
      save-time (Date. (long (* 1000 (.getInt buf))))
      _ (.get buf (make-array Byte/TYPE 3))
      ; etc
      ])
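For context, here is a minimal sketch of where a `buf` like the one above might come from. It assumes the data is little-endian and uses a small hand-built byte array in place of a real replay file; the byte values and the names `sample-bytes`, `engine`, `frames`, `seconds` and `saved` are made up for illustration.

```clojure
(import '(java.nio ByteBuffer ByteOrder)
        '(java.util Date))

;; Hypothetical 9-byte sample, not taken from a real replay:
;; 1 byte, then two 4-byte little-endian integers.
(def sample-bytes
  (byte-array (map unchecked-byte [1  0x5D 0xB0 0 0  0x10 0x27 0 0])))

;; ByteBuffer defaults to big-endian, so the order is set explicitly.
(def ^ByteBuffer buf
  (doto (ByteBuffer/wrap sample-bytes)
    (.order ByteOrder/LITTLE_ENDIAN)))

;; Each relative get reads at the current position and advances it.
(def engine  (.get buf))      ; 1 byte
(def frames  (.getInt buf))   ; 4 bytes
(def seconds (.getInt buf))   ; 4 bytes

;; Seconds since the epoch, converted to milliseconds for Date.
(def saved (Date. (* 1000 (long seconds))))
```

Every field needs its own hand-written call, and the order of the calls is the only thing tying a value to its position in the file.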

The code wasn’t “symmetric” enough for my taste between different fields, and dealing with fields that needed more than one byte/word/dword of data was just plain ugly (not shown in the code above). It was clear that I needed something more declarative. What I wanted was to list the different fields, giving each one a name, a length, a data type and an optional function to apply to the value read (for example, to convert a long into a Date). The first step I took was to write the code I wanted to have; only then would I worry about making it work. Here’s how I can read the header data of a replay file:

(defn parse-headers
  [buf]
  (parse-buffer buf
    [:game-engine         1 :byte]
    [:game-frames         1 :dword]
    [nil                  3 :byte]
    [:save-time           1 :dword #(Date. (long (* 1000 %)))]
    [nil                 12 :byte]
    [:game-name          28 :string]
    [:map-width           1 :word]
    [:map-height          1 :word]
    [nil                 16 :byte]
    [:creator-name       24 :string]
    [nil                  1 :byte]
    [:map-name           26 :string]
    [nil                 38 :byte]
    [:players-data      432 :byte parse-players-data]
    [:player-spot-color   8 :dword]
    [:player-spot-index   8 :byte]))

With this “specification” in hand, I wrote the code to make it work.

(defn- null-string
  "Read n bytes from buf as a NUL-terminated string. The string stops at
   the first NUL byte or at length n, whichever comes first; all n bytes
   are consumed either way."
  [buf n]
  ;; Mask with 0xff so that bytes above 127 don't make char throw.
  (let [cs (doall (for [_ (range n)] (char (bit-and (.get buf) 0xff))))]
    (apply str (take-while #(not= % \u0000) cs))))

(defn- read-field-aux
  "Read n values of the given type. Return a vector if n is greater
   than 1, a single value otherwise."
  [buf n type]
  (let [f ({:byte  (memfn get)
            :word  (memfn getShort)
            :dword (memfn getInt)} type)
        values (into [] (for [_ (range n)] (f buf)))]
    (if (= n 1)
      (first values)
      values)))

(defn read-field
  "Read n items of the given type (:byte, :word, :dword or :string)."
  [buf n type]
  (cond (= type :string) (null-string buf n)
        (some #{type} [:byte :word :dword]) (read-field-aux buf n type)))

(defn parse-buffer
  "A v-form is a vector of the form: [:field-name length :type func?]
   Each v-form is read from buf and the whole data is returned as a map.
   If a field-name is nil, the data is not returned (but the field is
   still read, to move forward in the buffer)."
  [buf & v-forms]
  (apply
   hash-map
   (mapcat (fn [[field-name size type func]]
             (let [data (read-field buf size type)]
               (if (nil? field-name)
                 nil
                 [field-name (if func
                               (func data)
                               data)])))
           v-forms)))
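To see the approach run end to end without a replay file, here is a condensed, self-contained sketch of the same idea applied to a hand-built buffer. The names `read-one`, `read-spec` and `parse-spec`, the `:size` field and the sample bytes are illustrative and not part of the code above. It assumes little-endian data and uses `reduce` so that the reads visibly happen in the order of the spec.

```clojure
(import '(java.nio ByteBuffer ByteOrder))

(defn read-one
  "Read a single value of the given type from buf (hypothetical helper)."
  [^ByteBuffer buf type]
  (condp = type
    :byte  (.get buf)
    :word  (.getShort buf)
    :dword (.getInt buf)))

(defn read-spec
  "Read n values of type; :string reads n bytes and stops at the first NUL."
  [^ByteBuffer buf n type]
  (if (= type :string)
    (let [cs (mapv (fn [_] (char (bit-and (.get buf) 0xff))) (range n))]
      (apply str (take-while #(not= % \u0000) cs)))
    (let [vs (mapv (fn [_] (read-one buf type)) (range n))]
      (if (= n 1) (first vs) vs))))

(defn parse-spec
  "Fold over [name n type f?] vectors, reading each field in order.
   Fields whose name is nil are read but left out of the result."
  [buf specs]
  (reduce (fn [m [field-name n type f]]
            (let [data (read-spec buf n type)]
              (if field-name
                (assoc m field-name ((or f identity) data))
                m)))
          {}
          specs))

;; Made-up sample: 1 byte, 2 padding bytes, an 8-byte string field, 2 words.
(def sample
  (doto (ByteBuffer/wrap
         (byte-array (map unchecked-byte
                          [1  0 0  77 97 112 0 0 0 0 0  128 0  64 0])))
    (.order ByteOrder/LITTLE_ENDIAN)))

(def parsed
  (parse-spec sample
              [[:game-engine 1 :byte]
               [nil          2 :byte]
               [:map-name    8 :string]
               [:size        2 :word]]))
```

The spec stays plain data, so adding or reordering fields means editing a vector rather than a chain of method calls.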

And here’s the data from a Starcraft replay file after it’s been read:

{:game-name "MBC_Sea[Shield]",
 :game-engine 1,
 :map-width 128,
 :players-data ({:name "MBC_Sea[Shield]", :race "Terran", :player-number 0, :type :human, :slot-number 0}
                {:name "", :race nil, :player-number 1, :type nil, :slot-number -1}
                {:name "CJ sAviOr", :race "Zerg", :player-number 2, :type :human, :slot-number 1}
                {:name "", :race nil, :player-number 3, :type nil, :slot-number -1}
                {:name "", :race "Zerg", :player-number 4, :type nil, :slot-number -1}
                {:name "", :race "Protoss", :player-number 5, :type nil, :slot-number -1}
                {:name "", :race "Terran", :player-number 6, :type nil, :slot-number -1}
                {:name "", :race "Zerg", :player-number 7, :type nil, :slot-number -1}
                {:name "", :race "Zerg", :player-number 8, :type nil, :slot-number -1}
                {:name "", :race "Zerg", :player-number 9, :type nil, :slot-number -1}
                {:name "", :race "Zerg", :player-number 10, :type nil, :slot-number -1}
                {:name "", :race "Zerg", :player-number 11, :type nil, :slot-number -1}),
 :save-time #<Date Sat Jun 28 15:05:16 EDT 2008>,
 :player-spot-index [1 1 1 1 0 0 0 0],
 :game-frames 45149,
 :player-spot-color [4 1 7 0 2 6 3 5],
 :creator-name "MBC_Sea[Shield]",
 :map-height 128,
 :map-name "Andromeda 1.0"}

Response to: Problems with Lisp

January 25, 2009

Jonathan Rockway wrote in a recent blog post that he does not understand why some people are unhappy with Common Lisp’s hash tables, and he demonstrates that using them is not that different from using lists or vectors. In my opinion his post misses the point, because he looks at individual operations and not at how they fit into the language. I’ll try to explain some of the problems with CL’s hash tables as I see them.

Syntax

One of the first complaints people have about CL’s hash tables is that they have no reader syntax. While this may not seem like a big problem, people are more likely to use a concept often if it has an accessible syntax. Few people would dispute that passing a custom comparison function to sort is easier in a language that has function literals than in a language that does not, such as C.

The same is true with hash tables; a programmer is much more likely to use them if the “ceremony” for using them is kept to a minimum.

# Would you rather type this?
oldest({"Vincent": 25, "Lincoln": 200})

# Or this?
(let ((ages (make-hash-table :test 'equal)))
  (setf (gethash "Vincent" ages) 25)
  (setf (gethash "Lincoln" ages) 200)
  (oldest ages))
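For comparison, Clojure gives maps a literal reader syntax, so the call is about as short as the first version. The `oldest` function only appears by name in the snippet above; the definition below is an assumed implementation for illustration.

```clojure
(defn oldest
  "Return the name with the largest age in a map of name -> age.
   Assumed behavior for this example; returns nil for an empty map."
  [ages]
  (when (seq ages)
    ;; max-key compares the map entries by their value.
    (key (apply max-key val ages))))

(oldest {"Vincent" 25, "Lincoln" 200})
;; => "Lincoln"
```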

Structure

When you learn Lisp, you are quickly introduced to cons cells, which are used to create lists. The tutorials show you how elegantly you can create functions by recursively calling cons to build up a new list. This nice, recursive structure does not exist with hash tables, however. Instead of writing pure, side-effect-free functions like you can with lists, you find yourself mutating the hash tables to accomplish your task. This is similar to what you find in most programming languages, which might be why Mr. Rockway does not see this as a problem.

However, in languages such as Clojure or Haskell, hash tables are immutable and the programmer can work with them much as with linked lists: adding an item to the table does not modify it; instead, it returns a new table with the new pair in it. The idioms of recursion that you learned with lists can be transposed easily.

; Clojure code
(defn map-hash [f table]
  (when (seq table)
    (assoc (map-hash f (rest table))
           (key (first table))
           (f (val (first table))))))

(map-hash #(+ % 3) {:a 1, :b 2, :c 3})
  => {:a 4, :b 5, :c 6}
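The immutability this relies on is easy to check at the REPL. As a small sketch: `assoc` returns a new map and leaves the original alone, and the same transformation can also be written as a fold with the core function `reduce-kv`. The names `ages`, `more-ages`, `map-vals` and `bumped` are made up for this example.

```clojure
(def ages {:a 1, :b 2})

;; assoc returns a new map; `ages` still has two entries afterwards.
(def more-ages (assoc ages :c 3))

(defn map-vals
  "Apply f to every value of m, returning a new map (hypothetical name)."
  [f m]
  (reduce-kv (fn [acc k v] (assoc acc k (f v))) {} m))

(def bumped (map-vals #(+ % 3) more-ages))
```

Here `ages` is still `{:a 1, :b 2}` after both calls, which is what makes the recursive style safe.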

I don’t believe people consider hash tables “bad” in Common Lisp; I think most people simply find them inconvenient to use.


Emacs function for Clojure users

January 24, 2009

Today, I wrote an Elisp function to easily add paths to the -classpath command line parameter of java when launching the Clojure SLIME REPL. The command is created by the function swank-clojure-cmd, which is usually run when Emacs starts; its result is stored in the slime-lisp-implementations variable.

Adding paths with the -classpath flag is the way Clojure recommends including extra paths or jars, rather than using the add-classpath function. The following function takes a path argument and creates a fresh Clojure entry in slime-lisp-implementations. To use it, run M-x clojure-add-classpath.

(require 'cl-lib) ; provides cl-remove-if

(defun clojure-add-classpath (path)
  "Add PATH to the Clojure classpath and refresh `slime-lisp-implementations'."
  (interactive "GPath: ")
  (push path swank-clojure-extra-classpaths)
  (setq slime-lisp-implementations
        ;; Put a freshly built clojure entry in front, dropping any old one.
        (cons `(clojure ,(swank-clojure-cmd) :init swank-clojure-init)
              (cl-remove-if (lambda (x) (eq (car x) 'clojure))
                            slime-lisp-implementations))))