Reading binary data in Clojure

A few months ago, I wrote a Python library to parse Starcraft replay files. For the past few weeks, I’ve been meaning to try writing a Clojure implementation and I finally got the time to start working on that this week. In this post, I’d like to show how I tackled the problem of reading binary data in what I find to be an elegant manner.

Because Clojure can use all the Java libraries, I figured that it would probably be a good idea to use the java.nio package. A problem that quickly arose was that calling the methods manually was both ugly and unnecessarily verbose. Here’s an example of what it looked like:

(let [game-engine (.get buf)
      game-frames (.getInt buf)
      _ (.get buf (make-array Byte/TYPE 3))
      save-time (Date. (long (* 1000 (.getInt buf))))
      _ (.get buf (make-array Byte/TYPE 3))
      ; etc
      ])

The code wasn’t “symmetric” enough between fields for my taste, and dealing with fields that needed more than one byte/word/dword of data was just plain ugly (not shown in the code above). It was clear that I needed something more declarative. What I wanted was to list the different fields, giving each a name, a length, a data type and an optional function to apply to the value read (for example, to convert a long into a Date). The first step was to write the code the way I wanted it to look; only then would I worry about making it work. Here’s how I can read the header data of a replay file:

(defn parse-headers
  [buf]
  (parse-buffer buf
    [:game-engine         1 :byte]
    [:game-frames         1 :dword]
    [nil                  3 :byte]
    [:save-time           1 :dword #(Date. (long (* 1000 %)))]
    [nil                 12 :byte]
    [:game-name          28 :string]
    [:map-width           1 :word]
    [:map-height          1 :word]
    [nil                 16 :byte]
    [:creator-name       24 :string]
    [nil                  1 :byte]
    [:map-name           26 :string]
    [nil                 38 :byte]
    [:players-data      432 :byte parse-players-data]
    [:player-spot-color   8 :dword]
    [:player-spot-index   8 :byte]))

With this “specification” in hand, I wrote the code to make it work.

(defn read-field
  [buf n type]

  (defn null-string
    "Read a NUL-terminated string. Stop at the first NUL character or at
     length n, whichever comes first."
    [buf n]
    (let [bytes (doall (for [_ (range n)] (char (.get buf))))]
      (apply str (take-while #(not= % \u0000) bytes))))

  (defn read-field-aux
    "Read n values and return them as a vector if n is greater than 1,
     as a single value otherwise"
    [n type]
    (let [f ({:byte  (memfn get)
              :word  (memfn getShort)
              :dword (memfn getInt)} type)
          vec (into [] (for [_ (range n)] (f buf)))]
      (if (= n 1)
        (first vec)
        vec)))
  
  (cond (= type :string) (null-string buf n)
        (some #{type} [:byte :word :dword]) (read-field-aux n type)))


(defn parse-buffer
  "A v-form is a vector of the form: [:field-name length :type func?]
   Each v-form is read from buf and the whole data is returned as a map.
   If a field-name is nil, the data is not returned (but the field is
   read nonetheless to move forward in the buffer)."
  [buf & v-forms]
  (apply
   hash-map
   (mapcat (fn [[field-name size type func]]
             (let [data (read-field buf size type)]
               (if (nil? field-name)
                 nil
                 [field-name (if func
                               (func data)
                               data)])))
           v-forms)))
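
Both functions depend on one property of java.nio.ByteBuffer: the relative get methods (get, getShort, getInt) read at the buffer's current position and then advance it, so each call to read-field picks up where the previous one stopped. Here is a minimal sketch of that behaviour. The six bytes and the little-endian byte order are made up for illustration, not taken from a replay file.

```clojure
(import '(java.nio ByteBuffer ByteOrder))

;; Six made-up bytes: one byte, one little-endian dword (258), one byte.
;; The byte order is an assumption for this sketch.
(def ^ByteBuffer demo-buf
  (doto (ByteBuffer/wrap (byte-array [1 2 1 0 0 7]))
    (.order ByteOrder/LITTLE_ENDIAN)))

(def first-byte (.get demo-buf))     ; reads byte 0, position is now 1
(def a-dword    (.getInt demo-buf))  ; reads bytes 1-4, position is now 5
(def last-byte  (.get demo-buf))     ; reads byte 5, position is now 6

(println first-byte a-dword last-byte (.position demo-buf))
;; prints: 1 258 7 6
```

This is also why fields with a nil name still have to be read: skipping the read would leave the position too far back for every field after it.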

And here’s the data from a Starcraft replay file after it’s been read:

{:game-name "MBC_Sea[Shield]",
 :game-engine 1,
 :map-width 128,
 :players-data ({:name "MBC_Sea[Shield]", :race "Terran", :player-number 0, :type :human, :slot-number 0}
                {:name "", :race nil, :player-number 1, :type nil, :slot-number -1}
                {:name "CJ sAviOr", :race "Zerg", :player-number 2, :type :human, :slot-number 1}
                {:name "", :race nil, :player-number 3, :type nil, :slot-number -1}
                {:name "", :race "Zerg", :player-number 4, :type nil, :slot-number -1}
                {:name "", :race "Protoss", :player-number 5, :type nil, :slot-number -1}
                {:name "", :race "Terran", :player-number 6, :type nil, :slot-number -1}
                {:name "", :race "Zerg", :player-number 7, :type nil, :slot-number -1}
                {:name "", :race "Zerg", :player-number 8, :type nil, :slot-number -1}
                {:name "", :race "Zerg", :player-number 9, :type nil, :slot-number -1}
                {:name "", :race "Zerg", :player-number 10, :type nil, :slot-number -1}
                {:name "", :race "Zerg", :player-number 11, :type nil, :slot-number -1}),
 :save-time #<Date Sat Jun 28 15:05:16 EDT 2008>,
 :player-spot-index [1 1 1 1 0 0 0 0],
 :game-frames 45149,
 :player-spot-color [4 1 7 0 2 6 3 5],
 :creator-name "MBC_Sea[Shield]",
 :map-height 128,
 :map-name "Andromeda 1.0"}
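
To try the approach without a replay file, here is a self-contained sketch that restates the two functions in condensed form and runs them on a hand-built buffer. It is not the code from the post: the helpers are ordinary top-level definitions, the names read-string-field and readers are hypothetical, and the field names, bytes and little-endian byte order are invented for the example.

```clojure
(import '(java.nio ByteBuffer ByteOrder))

(defn- read-string-field
  "Consume n bytes and return the characters before the first NUL."
  [^ByteBuffer buf n]
  (let [chars (mapv (fn [_] (char (.get buf))) (range n))]
    (apply str (take-while #(not= % \u0000) chars))))

(def readers
  "Hypothetical lookup table from type keyword to a one-value reader."
  {:byte  (fn [^ByteBuffer b] (.get b))
   :word  (fn [^ByteBuffer b] (.getShort b))
   :dword (fn [^ByteBuffer b] (.getInt b))})

(defn read-field
  "Read n values of the given type; a single value is returned unwrapped."
  [buf n type]
  (if (= type :string)
    (read-string-field buf n)
    (let [f      (readers type)
          values (mapv (fn [_] (f buf)) (range n))]
      (if (= n 1) (first values) values))))

(defn parse-buffer
  "Read each [field-name length type func?] form in order into a map."
  [buf & v-forms]
  (reduce (fn [m [field-name size type func]]
            ;; Always read, even for nil names, so the position advances.
            (let [data (read-field buf size type)]
              (if field-name
                (assoc m field-name (if func (func data) data))
                m)))
          {}
          v-forms))

;; Made-up layout: 1 byte, 2 padding bytes, 1 dword, 4-byte string.
(def sample
  (doto (ByteBuffer/wrap (byte-array [3  9 9  2 1 0 0  72 105 0 0]))
    (.order ByteOrder/LITTLE_ENDIAN)))

(def parsed
  (parse-buffer sample
    [:version 1 :byte]
    [nil      2 :byte]
    [:count   1 :dword inc]
    [:label   4 :string]))

(println (:version parsed) (:count parsed) (:label parsed))
;; prints: 3 259 Hi
```

The optional fourth element (inc here) is applied to the raw value, the same way the Date conversion is applied to :save-time in the header specification.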

15 thoughts on “Reading binary data in Clojure”

  1. Is there a rationale behind nesting defns in read-field? That looks like bad style. It seems that null-string and read-field-aux would work just as well if defined at the top level, or as local fns, i.e. (let [null-string (fn [buf n] …) …] …).

    read-field looks like a candidate for multimethods — have you considered expressing it that way?

    :-)

  2. > Why would a defn be worse style than using fn?

    That’s a good question. It irks me, too, but it’s hard to put a finger on exactly *why*. Combining multiple expressions like that just isn’t how I learned Lisp or functional programming generally. (It’s like wrapping your function definition in a begin block in Scheme.)

  3. I had the same idea, that defn is similar to define in Scheme and establishes the same lexical binding behaviour.

    Then I came across a thread on the Clojure Google Group (can’t seem to find it right now) that changed this assumption.

    All def and defn declarations end up being top level, even when declared inside of other defn statements:

    (defn fa []
      (defn fb [] "function b")
      "function a")

    (defn fc []
      (fb))

    user=> (fa)
    "function a"

    user=> (fc)
    "function b"

    Best regards,

    Telman

  4. As abhishek pointed out, a defn within another function just doesn’t seem right.

    My reasons for this:
    – It’s not being used as a closure so it does no harm to be defined top level – the let form works as well.
    – A defn modifies global state in the Clojure world (it extends the scope of those functions, so the code no longer has nice functional properties).
    – Before read-field is called, these functions aren’t defined at the top level; afterwards they are.
    – To make matters worse, if these functions are defined elsewhere, their definitions will be overwritten once read-field is called.

  5. Ignoring, for a moment, the tactical mistake of nested defn’s, I think the overall design is really good.

    I like the pattern of writing the code to *look* exactly the way you want it to look and then figuring out macros or anything else you need to make it work correctly.

    This is sort of a customer-centric approach (the customer being another developer) and I find it leads to very concise and elegant solutions that are very maintainable.

  6. Nice work.

    Suggestion: a writer that uses the same specs to encode binary data.

    Are you going to release this on github or submit it for clojure.contrib?

  7. Vincent: for now, I’m making this work only for my little application, but afterwards, I’ll probably look into making it more useful for other purposes. I’ll keep you posted.

  8. I have been staring at your code for a while, and this is probably a basic question, but how does your code consume the buffer? It seems like each call to read-field would start at the beginning of the buffer rather than from the end location of the prior read. Thanks in advance.
