PyPy’s performances impress me

October 19, 2011

PyPy has been a project that I’ve heard about for a few years; a new implementation of the Python language. Because it was a young project and CPython worked for me, I never bothered to look at PyPy very closely. A couple weeks ago, I read a message that claimed that the performance of PyPy for CPU-bounded tasks was impressive compared to CPython’s.

Last week in my algorithms class, we learned about greedy algorithm. The first one we looked at was a program that made change. You give it an amount and a set of coins and it returns the smallest number of coins for amount. This is something we do every day without thinking about it. For fun, I wrote an implementation of the algorithm in Python:

from collections import defaultdict

def make_change1(amount, coins):
    total = 0
    solution = defaultdict(lambda: 0)
    current = 0

    while total != amount and current < len(coins):
        if total + coins[current] > amount:
            current += 1
        else:
            total += coins[current]
            solution[coins[current]] += 1

    return solution

Doing the change operation by successive subtraction is not terribly efficient, so I wrote another version that uses integer division instead:

def make_change2(amount, coins):
    solution = {}
    for coin in coins:
        count = amount / coin
        solution[coin] = count
        amount -= coin*count
    return solution

And then I created a small script to test them with a lot of amounts, using Canadian coins:

import sys

if sys.argv[1] == "1":
    make_change = make_change1
    n = 100000
else:
    make_change = make_change2
    n = 10000000

for i in xrange(n):
    make_change(i, [200, 100, 25, 10, 5, 1])

Here are the times of running this script on my Linux laptop:

$ python2 --version
Python 2.7.2

$ time python2 make_change.py 1
real    0m16.236s
user    0m16.236s
sys     0m0.000s

$ time python2 make_change.py 2
real    0m28.638s
user    0m28.491s
sys     0m0.057s

And now with PyPy:

$ pypy --version
Python 2.7.1 (?, Sep 04 2011, 13:10:48)
[PyPy 1.6.0 with GCC 4.6.1]

$ time pypy make_change.py 1
real    0m2.185s
user    0m2.157s
sys     0m0.023s

$ time pypy make_change.py 2
real    0m8.799s
user    0m8.769s
sys     0m0.023s

Clearly this is just one silly, unscientific micro-benchmark, but I plan on trying to use PyPy more in the future to see if it generally performs better than CPython.


Text Untwist

December 26, 2010

I recently saw a video of Day[9] playing a game called Text Twist. The objective of the game is to find as many words as possible with 6 or 7 given letters. You lose if you cannot find the word (or words) that use all the letters. Being a nerd and a hacker, I decided that I was going to make a little program to play in my stead.

That program is called TextUntwist and is available under the ISC License on GitHub. Usage is very simple:

$ java net.gnuvince.textuntwist.TextUntwist <dict> <letters>

You’ll then have 3 seconds to switch to your game of TextTwist and the words will start being written in. An English and French dictionary are available right now.; they differ from the dictionaries used by Text Twist on Yahoo, so they’ll not find every word. You can add them to the dictionary yourself if you wish. I’ve made a small video for your viewing pleasure.


Cycling with closures

November 28, 2010

There was a discussion on StackOverflow about useful alternative control structures.. One member mentioned that he would like to have an alternate control structure that would cycle through a list of possibilities.

This structure is available already in some languages: Python has it in itertools.cycle and Haskell has the function cycle in its Prelude.

But if you don’t use Python or Haskell, but your language does have closures, you can implement it yourself. Here’s how in JavaScript.

function cycle(choices) {
    var i = 0;
    return function () {
        var current = i;
        i = (i + 1) % choices.length;
        return choices[current];
    };
}

Simple, isn’t it? Here’s a brief explanation of how it works. Inside the body of cycle, there are two local variables, i and choices. They are also available to the anonymous function that’s returned by the function, but the interesting thing is that although i and choices fall out of scope when cycle returns, the anonymous function still has a reference to them and can access and modify them as it needs to. When the anonymous function is called, it returns the current element from choices and moves i to “point” to the next element.

Here are some sample usages:

var colors = cycle(["blue", "white", "red"]); // colors is a function

for (var i = 0; i < 5; ++i) {
    console.log(colors());
}

// Output:
// bleu
// white
// red
// blue
// white

var salutations = cycles([
    function (s) { return "Hello " + s; },
    function (s) { return "Good morning " + s; },
    function (s) { return "Hey " + s; }
]);

for (var i = 0; i < 3; ++i) {
    console.log(salutations()("world"));
}

// Output:
// Hello world
// Good morning world
// Hey world

If you are not familiar with closures and your favorite language supports them, I recommend that you find a tutorial (many are available online using JavaScript) to learn and appreciate their power.


Turing Machine in Scala

September 29, 2010

I am currently enrolled in a theoretical computer science class. Last week, we started learning about Turing machines. The professor wrote and made available a small Turing machine simulator that we can use to write and run simple programs. Unfortunately for the students using Linux or MacOS X, the program is written in C# and runs only on Windows.

I decided to write my own little Turing machine simulator in Scala. The code is short (~40 lines of code) and I felt like sharing it.

First a small introduction to Turing machines (as I understand them):

  • The controller stores the current state that the machine is in. It can hold only one state at a time.
  • The tape is an infinite array onto which we can write characters.
  • The head points to a character on the tape.
  • The head can move one position backward or forward (left or right).
  • When the program is loaded, the initial input is copied at the beginning of the tape. The rest of the tape consists of blanks.
  • The transition function takes the current state and current character under the head and sets a new state, writes a new character on the tape and moves left or right.
  • The programs runs until the state is ACCEPTED or REJECTED.

I may have forgotten some details, but that’s the gist of it. Now let’s see how we implement a Turing Machine.

import scala.collection.mutable.Map

Right off the bat, we’re going to make the Haskell people cry in agony and use a mutable Map in our program. We’ll see later what it’s used for.

object Direction extends Enumeration {
  val left = Value(-1)
  val right = Value(1)
}

We define an enumeration type that represents the two directions where the head can go. We used the values -1 and 1 to make the actual move a simple addition.

class Tape(input: String) {
  private val tape: Array[Char] = Array.fill(65536)('_')
  private var head: Int = 0
  for (i <- 0 until input.length) {
    tape(i) = input(i)
  }

  def getValue() = tape(head)
  def setValue(c: Char) = tape(head) = c
  def move(d: Direction.Value) = head += d.id
}

This is the class that represents the tape. It has a private member called tape which is an array of 65,536 characters initialized to the underscore character (which will be our blank character). It also has a private member called head which emulates the Turing machine’s head. It’s just an integer value that indicates which is the “active cell” on the tape. The constructor of Tape takes a string that is copied onto the tape.

The methods getValue() and setValue() respectively read the current character and write a new value onto the tape. move() changes the position of head. d.id gives us the -1 or 1 value that we defined earlier in Direction.

class TuringMachine(private var input: String, private var state: String) {
  private val tape = new Tape(input)
  private val transition: Map[(String, Char), (String, Char, Direction.Value)] = Map()

  def addRule(tuple: ((String, Char), (String, Char, Direction.Value))) {
    transition(tuple._1) = tuple._2
  }

  def execute() {
    while (state != "ACCEPTED" && state != "REJECTED") {
      val value = tape.getValue
      val (newState, newValue, direction) = transition((state, value))
      println(newState, newValue, direction)
      state = newState
      tape.setValue(newValue)
      tape.move(direction)
    }
    println(state)
  }
}

This is the main class that brings everything together. A TuringMachine’s constructor receives an input to be written on the tape and an initial state. A TuringMachine has a Map (dictionnary) from (String, Char) — a 2-tuple representing the current state and character — to (String, Char, Direction.Value), a 3-tuple that contains the new state, new character and direction for the head.

addRule() is a simple function that adds an entry to the transition map.

execute() loops until the state is either “ACCEPTED” or “REJECTED”. Inside the loop, we take the value on the tape and the state, find the transition rule associated with that pair, and update the state, the tape and the head. We print every intermediary step. When the loop is done, we print the final state.

And that’s it. That’s our Turing machine. Now, let’s write a small program for it:

object TuringMachine {
  def usage() {
    println("Usage: scala TuringMachine  ")
    exit(1)
  }

  def main(args: Array[String]) {
    if (args.length != 2)
      usage()

    val tm = new TuringMachine(args(0), args(1))
    tm.addRule(("even", '0') -> ("even",     '0', Direction.right))
    tm.addRule(("even", '1') -> ("odd",      '1', Direction.right))
    tm.addRule(("even", '_') -> ("ACCEPTED", '_', Direction.right))
    tm.addRule(("odd",  '0') -> ("odd",      '0', Direction.right))
    tm.addRule(("odd",  '1') -> ("even",     '1', Direction.right))
    tm.addRule(("odd",  '_') -> ("REJECTED", '_', Direction.right))

    tm.execute()
  }
}

This is a program that, given an input of ones and zeros, accepts the program if the number of ones is even and rejects the program otherwise. Here’s a sample execution with the string “110101″ and given an initial state of “even” (since 0 ones is even):

$ scala -cp . TuringMachine 110101 even
(odd,1,right)
(even,1,right)
(even,0,right)
(odd,1,right)
(odd,0,right)
(even,1,right)
(ACCEPTED,_,right)
ACCEPTED

Edge cases in computer programs

August 23, 2010

Introduction

We programmers like when problems can be solved with a single, general formula and we generally dislike when we need to manually handle small edge cases. I want to talk about such an edge case that occurred in a little poker program I am writing.

Order please!

My poker program is a hand simulator: given two starting hands, it’ll compute their probabilities of winning at showdown (this is going to be pretty much the same thing as PokerStove). One of the thing I need to do in my program is determine what kind of hand a player has. There are 9 different types of hands in poker. Here they are in decreasing strength order:

  • Straight flush
  • Four of a kind
  • Full house
  • Flush
  • Straight
  • Three of a kind
  • Two pairs
  • One pair
  • High card

(Note: a royal flush is a kind of straight flush.)

Before a hand is properly classified, the cards are sorted to make the classification functions easier to code. The cards are first ordered by the number of cards of the same rank that appear in the hand and in decreasing order of strength. Here are some examples:

  • 5-5-5-T-T: The three fives come first because there’s three of them, then the tens
  • 9-9-3-3-6: We have two pairs, so we put the highest one first, the second pair next and the lonely card last
  • A-8-7-4-3: There are no two cards of the same rank here, so the cards are simply ordered by rank

The hand is then tested for all types of hands from strongest to weakest and the classification stops as soon as a match has been found. For example, to test for four of a kind we verify that card[0] = card[1] = card[2] = card[3] and later we can test for two pairs with card[0] = card[1] and card[2] = card[3] without having to test that card[2] and card[3] are different from card[0] and card[1].

This ordering also makes it easy to determine the winner when two hands have the same type: we just go from the first card to the last, testing which one is the higher card. If after the fifth card no card was higher, we have a tie. This works for all types of hands. Almost.

Why did they invent the wheel?

In poker terms, a wheel is a 5-high straight; the ace is considered both high and low, so A-2-3-4-5 is actually a straight. And this hand, which occurs less than 1% of the time, is the one that requires the most attention.

First, it makes testing for a straight harder. The function for testing a straight is easy with our ordering: starting at the second card, we test that the current card is exactly one rank lower than the card before. This work from A-K-Q-J-T down to 6-5-4-3-2. With a wheel however, we have A-5-4-3-2 and 5 is not one rank lower than an ace and a straight isn’t found. If no straight was found, we need to manually check if the hand is a wheel.

Second, if we used a simple card-by-card analysis to determine which of two straights (or straight flushes) is the higher one, the wheel would actually lose only to broadway (A-K-Q-J-T); it would (incorrectly) win against a straight such as 7-6-5-4-3, because the ace is higher than the 7, even though a 7-high straight beats a 5-high straight. To fix this problem, we must add extra code to our comparison function when we are dealing with straights and straight flushes:

  • If the two hands are wheel, we have a tie
  • If the first hand is a wheel, then the second hand is the winner
  • If the second hand is a wheel, then the first hand is the winner

If none of these tests pass, we can use our card-by-card analysis. (Hands that aren’t straights or straight flushes just skip this part, of course.)

The net result of the wheel is that my functions to test for straights and for comparing hands are twice as long as they would be if wheels were not considered straights. The little less-than-1% case causes the code of two functions to increase by 100%.


My quick review of Winning Low Limit Hold ‘em, by Lee Jones

October 18, 2009

When I started getting interested in poker a few weeks ago, I looked for a novice book that would teach me how to correctly play a hand. I read book reviews on Amazon and on poker sites, and one beginner’s book consistently received high praises, Lee Jones’ Winning Low Limit Hold ‘Em (WLLH). I placed an order on Amazon, received it a few days later and immediately started reading it with the firm intention of thoroughly studying its content before moving on to a no limit book.

I finished reading the book a few days ago and I’m already reading it a second time! The style of Lee Jones is very easygoing, it almost feels as if he’s sitting with us, telling us how to play. The subject matter is presented in a very straight forward manner, building upon the things that were introduced in previous chapters. I liked the footnotes where he says that a piece of advice differs from the earlier editions of the book; I like a person who can form new opinions and not be afraid to admit it.

The book is split in four sections: the basics of hold ‘em, playing a hand of hold ‘em from pre-flop to showdown, a discussion of sit and go tournaments and miscellaneous subjects. By far, the most important sections are the first two.

The first section introduces the different factors that must be considered when it’s time to act (cards, position, number of opponents, money to invest, etc.), calculating outs, pot odds and implied odds. I wish the math for outs and odds was more detailed, but for the beginners and the people who aren’t math-savvy, this short overview works pretty well, and I know that there are books where the math is more rigorous.

The second section, which I will need to be read many times, discusses how to play every stage of a hand. This section starts with a chapters on playing pre-flop in early, middle and late position as well as in the blinds. There are detailed explanations on which action to take when you are the first in the pot, when there are some limpers in front of you, when there is a raise in front of you, etc. Each chapter ends with a small chart that summarizes the content of the chapter. Those pages should be bookmarked. There are a few chapters on post-flop play that discuss what the player should do when he flops two pairs, a set, a straight, when the hand misses, when there’s a flush draw, etc. The important subject of position follows and this section finishes with chapters on how to play the turn and the river.

I quickly skimmed over the sit and go section, since that’s no limit hold ‘em and I really want to solidify my limit game before moving on. The last section contains chapters which did not fit elsewhere: there’s a chapter on card room etiquette, a chapter on discipline, on folding, on bankroll management, Barry Tanenbaum’s ten common errors list, etc. This is a nice heap of good advice.

I should mention one problem I have with this book: my copy is falling apart! I don’t know if it’s cheaper binding or that I was too rough, but as soon as I sat down with it, I started writing notes in the margins, underlining important passages, marking pages with sticky bookmarks and putting post-it notes over notions that I wanted to review in more detail later. The result is that some page are already starting to come off. I guess I’ll need to buy a second copy the next time I order from Amazon.

Despite the construction issue, WLLH is a terrific beginner’s book and I don’t think that I could have asked for anything better. I’ll see you at the tables soon!


Weechat

October 12, 2009

I’ve been an irssi user for more than 7 years. Recently though, I started noticing people on IRC using weechat, another command-line-based client. I got a little curious this weekend and decided to give it a go and I must say that I am extremely impressed with it. I’ll be using it full time in the upcoming weeks to see how I like it. Here are some of the features I like after a couple days of usage:

  • Scriptable in Python: Perl, Ruby, Lua and Tcl can also be used, but since Python is one of my favourite languages, that makes scripting for weechat all the more attractive to me
  • The nicknames of the people speaking are all neatly aligned and separated from their message at the same column, making it very easy to read.
  • Like in graphical clients, weechat maintains a list of the nicknames on each channel
  • Highly configurable: I haven’t found many default settings that were to my dislike that I couldn’t change easily.

There are also things I disliked:

  • Windows don’t maintain their position: if you scroll up in one window, switch to another one and come back, the original window will be at the bottom of its buffer again, not where you left it. There’s a ticket opened for that problem, hopefully it is fixed soon.
  • When you type a long message in irssi, when you reach the right edge of the terminal, the text is move to the left by about a third of the width of the terminal. weechat just keeps on going, always hugging the right edge of the terminal. This is not a showstopper, but it’s a little annoying. Weechat allows the input bar’s height to be configured, which tells me that a nice solution to this problem would be to increase the height as the input reaches the right edge of the window and resets to its original size afterwards.
  • Because the messages are between the aligned nicknames and the nickname list, long URLs are a pain in the neck to open. Some scripts available on the weechat website deal with this problem, but none in a way that I really like. Ideally, I’d want to be able to press a key combination, have a window of the URLs said in this particular channel that would allow me to use the up and down keys to select the one that I wish to open, press Return to open the link. Upon pressing Return, the URL list window would be closed (or kept open, this could be configurable). If I like weechat enough, I just might try to write it myself.

Has anyone ever got their Yahoo! accounts deleted for no apparent reason?

October 1, 2009

I ordered a couple of books on Amazon.ca yesterday and I was a little concerned when I didn’t get a confirmation email. I thought that it could just be a delay in delivery and figured I’d get it the next morning, so I went to bed.

This morning, the confirmation email had still not arrived. The first thing I did was to check if it wasn’t in my spam folder. It wasn’t. My next thought was that the Yahoo! Mail to Gmail redirection had somehow stopped working. I have a Yahoo! account which I used for Amazon, but that I now use for all sorts of throwaway purposes and my main Gmail account; all mail from Yahoo! is redirected to Gmail.

I tried to connect to the Yahoo! Mail web interface, but it gave me a very weird message: this user ID does not exist.

I thought that it was very odd and that it must be a mistake on my part. I tried several times, adding the @yahoo.ca suffix to make sure that I had the correct email, but no luck. Finally, I decided to do a test: I clicked the “New User” link, I typed in my Yahoo! user ID and clicked the “Verify” button to make sure it was available. It was.

My Yahoo! account was deleted sometimes during the week (I cannot say when exactly). Has this ever happened to anyone? Is there a cause for that?


This is why you need static typing

September 29, 2009

Because errors like this are just so embarrassing:

2009-09-29-221401_812x463_scrot


K&R and Programming Safety

September 23, 2009

The C Programming Language by Dennis Ritchie and Brian Kernighan — colloquially known as K&R — is consistently ranked among the top computer programming books. It introduces the C language, its idioms and a part of its library in 190 pages. The last 50 pages contain a reference of the language. Programmers frequently praise K&R for its succinctness and clarity and deplore that the newer programming books are much longer and don’t have clear explanations or real-world examples.

I own a copy of K&R and I agree that it is great; authors of programming books should read it and work to bring the same pith and clarity to their own work. Nevertheless, writing a book on a general purpose programming language in less than 200 pages is not something that authors should aim for. K&R achieves its thinness by avoiding some important topics, not the least of which is safety.

When learning a new skill, such as driving or cooking, it is important to be aware of the possible dangers and how to avoid them. Road safety should not begin after you’ve wrecked your father’s car and it is not after you’ve cut your fingers that you need to learn how to properly handle a knife. Programming is no different: safety needs to be taught from the start.

Regretfully, K&R lacks detailed explanations on how to use the language safely. I cannot recall instances when the authors discussed data validation, how to gracefully and idiomatically handle error conditions or how to avoid the dreaded buffer overflows or format string exploits that plague so many programs today. It might have been okay in the late 70s to just tell the reader how to make a program work and let them deal with safety themselves, but in the 21st century, when so many aspects of our lives are governed by software, it is crucial that safety be constantly on the programmers’ minds. Authors should not leave out the details on how to use a tool safely just to skimp on page; doing so would be highly irresponsible.

I recently read Programming in Scala by Martin Odersky, Lex Spoon and Bill Venners and I want to point to their book as an example of clear and succinct writing while still providing the read with plenty of details and making sure that the best practises of the language are taught.


Follow

Get every new post delivered to your Inbox.