The compiler should help with reliability

A few days ago, I learned about a C-like language with a strong emphasis on safety called Cyclone. As I read the documentation, I found the language very interesting: low-level like C with features inspired by ML and Haskell to improve reliability.

I talked about the language on the French programming channel I frequent. One member had used it and said that the language was good and really helped with common mistakes in C. He also mentioned some problems, such as the lack of threads and weak GDB support. Another member wondered about Cyclone: why invent a new language when you can just use lint to find potential problems. I knew of lint, but I’d never used it. Cyclone claimed that it prevented dangling pointers by including the region in which they exist in the type. I wrote a C program to cause a dangling pointer error, and I was surprised to see lint catch it.

However, I don’t believe that lint is as good as having a language with a strong typing system. Lint can only emit warnings, and the programmer is free to ignore them. On the other hand, if the type system finds an error a refuses to build the program, the programmer has no choice but to address it. Here’s something that lint cannot catch:


#include <stdio.h>

void interleave(void **aggregated,
                void **array1,
                size_t size1,
                void **array2,
                size_t size2) {
    while (size1 && size2) {
        *aggregated++ = *array1++;
        *aggregated++ = *array2++;
        size1--;
        size2--;
    }
    while (size1) {
        *aggregated++ = *array1++;
        size1--;
    }
    while (size2) {
        *aggregated++ = *array2++;
        size2--;
    }
}

int main(void) {
    int xs[4] = {1, 2, 3, 4};
    double ys[4] = {1, 2, 3, 4};
    int zs[8];
    size_t i;

    interleave((void **)zs, (void **)xs, 4, (void **)ys, 4);
    for (i = 0; i < 8; ++i) {
        printf("%d ", zs[i]);
    }
    printf("\n");

    return 0;
}

The interleave function intersperses the first array with the elements of the second one. On line 26, I declare ys to be an array of 4 doubles and in the call to interleave on line 30, I try to interleave those values into an array of ints. This is going to fail of course, but gcc remains silent:

$ gcc -Wall -pedantic interleave.c
$

And of course, when we execute the code, it fails:

$ ./a.out
1 0 2 1072693248 3 0 4 1073741824
$

Does lint catch the error? Nope. It does mention something about zs not being completely defined, but no big “hey you got a type mismatch!”

$ splint interleave.c
Splint 3.1.1 --- 03 Nov 2006

interleave.c: (in function interleave)
[... warnings about using size_t as a boolean... ]

interleave.c:30:16: Passed storage zs not completely defined (*zs is
                       undefined): interleave ((void **)zs, ...)
  Storage derivable from a parameter, return value or global is not defined.
  Use /*@out@*/ to denote passed or returned storage which need not be defined.
  (Use -compdef to inhibit warning)

[... warning about interleave not being static... ]

Finished checking --- 5 code warnings

Since neither the compiler nor lint catch the error, it is the programmer’s responsibility to catch it and fix it. In the current example, with less than 50 lines of code, it’s pretty easy to see the problem. In a larger project however, this could slip by and strike at the worst possible time (thank you Mr. Moore!)

With a language with a stronger typing system, this error could easily be avoided. (The example is in Haskell because, despite my best efforts, it seems I don’t know/understand Cyclone well enough to replicate this simple function. Could anyone help me out?)

interleave :: [a] -> [a] -> [a]
interleave [] ys     = ys
interleave xs []     = xs
interleave (x:xs) (y:ys) = x : y : interleave xs ys

Quick explanation: the interleave function takes two arguments1, both lists of a’s (denoted by the brackets) and returns a list of a’s. a is a type variable: it means that the parameters can be lists of integers or floats or other lists, etc. The two input lists must be of the same type (because they use the same type variable) and Haskell will raise an error and refuse to compile your code if the types don’t match. This means that if we had a list of Ints and a list of Doubles and we tried to interleave them together like we did in the C example, the compiler will tell us.

Software reliability is very important, and all humans are fallible, regardless of what some coders believe. It’s therefore important that the tools we use help us as much as possible to ensure that our programs are going to behave properly.

1 Not technically true, but good enough to understand the example.

8 Responses to “The compiler should help with reliability”

  1. Greg Morrisett Says:

    I’ve included a “port” to Cyclone of the C code given above, that’s as
    faithful as I can make it. I’ve rewritten interleave to be polymorphic
    (notice the use of the type variable `a in the prototype, restricted to
    types of boxed kind), and I’ve used fat pointers (denoted with ? instead
    of *). As expected, Cyclone gives a type error that it expects the
    array2 argument to match the array1 and aggregated argument types
    and thus rejects the code.

    #include

    void interleave(`a::B ?aggregated,
    `a ?array1,
    size_t size1,
    `a ?array2,
    size_t size2) {
    while (size1 && size2) {
    *aggregated++ = *array1++;
    *aggregated++ = *array2++;
    size1–;
    size2–;
    }
    while (size1) {
    *aggregated++ = *array1++;
    size1–;
    }
    while (size2) {
    *aggregated++ = *array2++;
    size2–;
    }
    }

    int main(void) {
    int xs[4] = {1, 2, 3, 4};
    double ys[4] = {1, 2, 3, 4};
    int zs[8];
    size_t i;

    interleave((int ?)zs, (int ?)xs, 4, (double ?)ys, 4);
    for (i = 0; i < 8; ++i) {
    printf(”%d “, zs[i]);
    }
    printf(”\n”);

    return 0;
    }

  2. Greg Morrisett Says:

    Ah, and here is the output of the compile:

    bash-3.00$ cyclone -c foo.cyc

    foo.cyc:26: ys was declared with type double [4U] but initialized with type int [4U]
    double and int are not compatible. (different type constructors)
    foo.cyc:30: actual argument has type double ?`main but formal has type int ?`main
    double and int are not compatible. (different type constructors)

    COMPILATION FAILED!
    bash-3.00$

  3. someone Says:

    You get what you deserve.
    If you didn’t do the (void **) cast, you’d get warning left and right from everything. And a pointer cast tells the compiler “trust me, I know what I’m doing”. Cyclone wouldn’t be useful as a systems programming language if it didn’t have a “trust me” mechanism, and if you tell the compiler to trust you, what do you expect?

    I suspect the problem is inherent — you need “trust me” mode for systems programming. You can push it down the abstraction chain, but it has to come in somewhere.

  4. Allen Short Says:

    Yes, you do need trust-me mode somewhere. The main weakness of C is that you have to reach for it so quickly. More sophisticated tools will let you say what you mean more often and have the compiler or the runtime check it for you. The need to go under those abstractions won’t ever vanish, but we can make it less often necessary.

  5. Greg Morrisett Says:

    Er, I’m not sure why you think Cyclone needs a “trust me” here. As the port shows, the code does not type-check (and it shouldn’t). It’s C that needs the “trust me” because it lacks polymorphism (amongst other things).

  6. Steven Pigeon Says:

    You cannot possibly hope the compiler will catch these errors. For one thing, explicit casting to void** direct the compiler to ignore potential type conflicts and to discard type and size information it may have about the data. The fact that mayhem ensues in this case is merely the result of a really, really, really bad coding decision.

    The (more) correct way of implementing interleave in C would be something like

    
    void interleave(
     char * dest,
     const char * source1, size_t type1_size,
     const char * source2, size_t type2_size,
     size_t count )
     {
       // here deal with moving count*2 memory
       // blocks, ... using memcpy or memmove if
       // necessary (in cases of overlaps)
       //
       for (size_t c=0;c<count;c++)
        {
         memcpy(dest,source1,type1_size); dest+=type1_size;
         memcpy(dest,source2,type2_size); dest+=type2_size;
         source1+=type1_size;
         source2+=type2_size;
         // or something similar, anyway
       }
    }
    

    and you call it like this:

    
    interleave( dest,
                  source1, sizeof(type1),
                  source2, sizeof(type2),
                  min(count1,count2) );
    

    Note that I don’t use void* but char*. Why so? char * points to char, which, unlike void, has a sizeof, which allows you to write dest+=type1_size, thus entirely avoiding extra casting. You still have the danger of writing beyond the dest buffer if you did not allocate enough (that is, (sizeof(type1)+sizeof(type2))*min(count1,count2) bytes).

    The fact that C doesn’t know much about arrays (therefore failing to catch out-of-bounds accesses) is not as much a flaw as it was a design decision. You may agree with it or not, but that’s another story.

  7. Debugger Says:

    What about using C++? Templates can make the code far more typesafe and reusable:

    Example:

    
    #include <iostream>
    #include <vector>
    #include <list>
    #include <iterator>
    
    template <typename FwdInItr1, typename FwdInItr2, typename FwdOutItr>
    void interleave(FwdInItr1 seq_beg_1, FwdInItr1 seq_end_1,
            FwdInItr2 seq_beg_2, FwdInItr2 seq_end_2,
            FwdOutItr dest)
    {
        while(seq_beg_1 != seq_end_1 && seq_beg_2 != seq_end_2)
        {
            *dest++ = *seq_beg_1++;
            *dest++ = *seq_beg_2++;
        }
    
        while(seq_beg_1 != seq_end_1)
            *dest++ = *seq_beg_1++;
    
        while(seq_beg_2 != seq_end_2)
            *dest++ = *seq_beg_2++;
    }
    
    int main()
    {
        int xs[4] = {1, 2, 3, 4};
    
        std::list<int> ys;
        ys.push_back(5);
        ys.push_back(6);
        ys.push_back(7);
        ys.push_back(9);
    
        std::vector<int> zs;
    
        interleave(xs, xs + 4, ys.begin(), ys.end(), std::back_inserter(zs));
    
        std::vector<int>::const_iterator itr = zs.begin();
        for(; itr != zs.end(); ++itr)
            std::cout << *itr << " ";
    
        std::cout << std::endl;
    }
    
  8. C++’s typesafety to the rescue? « Debugger’s Weblog Says:

    [...] Today, i was reading a couple of blogs when i came across an interesting article from gnuvince about the capacity/incapacity of the compilers to reliably detect, catch and warn the programmer of [...]

Leave a Reply