The compiler should help with reliability

August 19, 2008

A few days ago, I learned about a C-like language with a strong emphasis on safety called Cyclone. As I read the documentation, I found the language very interesting: low-level like C with features inspired by ML and Haskell to improve reliability.

I talked about the language on the French programming channel I frequent. One member had used it and said that the language was good and really helped with common mistakes in C. He also mentioned some problems, such as the lack of threads and weak GDB support. Another member wondered about Cyclone: why invent a new language when you can just use lint to find potential problems. I knew of lint, but I’d never used it. Cyclone claimed that it prevented dangling pointers by including the region in which they exist in the type. I wrote a C program to cause a dangling pointer error, and I was surprised to see lint catch it.

However, I don’t believe that lint is as good as having a language with a strong typing system. Lint can only emit warnings, and the programmer is free to ignore them. On the other hand, if the type system finds an error a refuses to build the program, the programmer has no choice but to address it. Here’s something that lint cannot catch:

#include <stdio.h>

void interleave(void **aggregated,
                void **array1,
                size_t size1,
                void **array2,
                size_t size2) {
    while (size1 && size2) {
        *aggregated++ = *array1++;
        *aggregated++ = *array2++;
        size1--;
        size2--;
    }
    while (size1) {
        *aggregated++ = *array1++;
        size1--;
    }
    while (size2) {
        *aggregated++ = *array2++;
        size2--;
    }
}

int main(void) {
    int xs[4] = {1, 2, 3, 4};
    double ys[4] = {1, 2, 3, 4};
    int zs[8];
    size_t i;

    interleave((void **)zs, (void **)xs, 4, (void **)ys, 4);
    for (i = 0; i < 8; ++i) {
        printf("%d ", zs[i]);
    }
    printf("\n");

    return 0;
}

The interleave function intersperses the first array with the elements of the second one. On line 26, I declare ys to be an array of 4 doubles and in the call to interleave on line 30, I try to interleave those values into an array of ints. This is going to fail of course, but gcc remains silent:

$ gcc -Wall -pedantic interleave.c
$

And of course, when we execute the code, it fails:

$ ./a.out
1 0 2 1072693248 3 0 4 1073741824
$

Does lint catch the error? Nope. It does mention something about zs not being completely defined, but no big “hey you got a type mismatch!”

$ splint interleave.c
Splint 3.1.1 --- 03 Nov 2006

interleave.c: (in function interleave)
[... warnings about using size_t as a boolean... ]

interleave.c:30:16: Passed storage zs not completely defined (*zs is
                       undefined): interleave ((void **)zs, ...)
  Storage derivable from a parameter, return value or global is not defined.
  Use /*@out@*/ to denote passed or returned storage which need not be defined.
  (Use -compdef to inhibit warning)

[... warning about interleave not being static... ]

Finished checking --- 5 code warnings

Since neither the compiler nor lint catch the error, it is the programmer’s responsibility to catch it and fix it. In the current example, with less than 50 lines of code, it’s pretty easy to see the problem. In a larger project however, this could slip by and strike at the worst possible time (thank you Mr. Moore!)

With a language with a stronger typing system, this error could easily be avoided. (The example is in Haskell because, despite my best efforts, it seems I don’t know/understand Cyclone well enough to replicate this simple function. Could anyone help me out?)

interleave :: [a] -> [a] -> [a]
interleave [] ys     = ys
interleave xs []     = xs
interleave (x:xs) (y:ys) = x : y : interleave xs ys

Quick explanation: the interleave function takes two arguments1, both lists of a’s (denoted by the brackets) and returns a list of a’s. a is a type variable: it means that the parameters can be lists of integers or floats or other lists, etc. The two input lists must be of the same type (because they use the same type variable) and Haskell will raise an error and refuse to compile your code if the types don’t match. This means that if we had a list of Ints and a list of Doubles and we tried to interleave them together like we did in the C example, the compiler will tell us.

Software reliability is very important, and all humans are fallible, regardless of what some coders believe. It’s therefore important that the tools we use help us as much as possible to ensure that our programs are going to behave properly.

1 Not technically true, but good enough to understand the example.


Why I recommend Scheme

August 11, 2008

I like to hang around on a French-speaking, programming IRC channel. I’m known there as “the language guy” because of my hobby of trying all sorts of different and weird languages. I’m also known as being the guy that touts the merits of functional programming and Haskell from time to time.

Some of the guys over there have asked me which language I recommend they use to learn functional programming. Despite my liking Haskell very much, my answer is always the same: Scheme.

Haskell is a very nice language, it has a plethora of desirable features and it will positively influence other languages. However, Haskell is also a very complex language, there are many things in the language that stump people who want to pick it up (I had to try four separate times before I finally “clicked” with Haskell!) I do not find that it is suitable for someone who just wants to see what functional programming is about.

Let’s look at both languages side by side:

  Haskell Scheme
Typing Static typing. Though I am now convinced that a solid static typing system such as Haskell’s is a definite advantage when building software, it introduces a lot of complexity too. Inference, type classes, algebraic data types, existential types, phantom types, etc. That’s a lot of scary words and complexity for a newcomer, and static typing has nothing to do with learning the functional paradigm. Dynamic typing. It doesn’t catch type mismatch problems before accepting to compile your program, but it’s very simple to use and has a very low mental requirement. Ideal for learning about something other than type systems. The error messages given at run-time by Scheme on type errors are simpler than Haskell’s often cryptic compile-time error messages.
Syntax Haskell has a very beautiful syntax, but it’s a more complex one than Scheme’s. You need to be wary of operator precedence, layout rules (Haskell can be indentation-dependant) and other syntactic sugars which make Haskell nice to use when you know them, but can be a stumbling block when learning the language. Scheme has one of the simplest syntax of any programming language. The rules can be explained in minutes. The syntax is unlike what most people are used to, especially if they come from a C-syntax background, but it’s easy to pick up and use.
Laziness Haskell has lazy evaluation. Lazy evaluation has its benefits, which I will not discuss here, but it is harder to understand when something will happen. Scheme has eager evaluation, which is what most programmers are used to and is easier to reason about.
Environment The REPL of Haskell does not expose the entire language, it has limitations: for example, you can’t define functions or types as you would in your editor. So it can sometimes be little painful to test things interactively in GHCi. DrScheme is an easy, friendly and free environment for Scheme that is ideal for learning. You can type anything you like in the REPL and Scheme will happily execute it. It also features an editing pane to write code and at the press of a button, the code in the pane is compiled. DrScheme also offers nice debugging and profiling tools.
Literature Before “Real World Haskell”, there were not a lot of free, easily accessible books for Haskell on the Internet. Some of the newbie-oriented tutorials were actually quite daunting, making it hard to suggest reading material to somebody who wanted to learn functional programming. Scheme has more books freely available on the Internet: “How To Design Programs” for people new to programming, Scheme and/or functional programming or the grand classic “Structure and Interpretation of Computer Programs” for the more serious and advanced students. “Teach Yourself Scheme in Fixnum Days” is somewhere in between. All books are of great quality and great value.