Sunday, March 25, 2012

C99 designated initializers

One of the very nice features added in C99 is the designated initializer. This allows you to write code like the following to initialize a structure:

    div_t d = { .quot=3, .rem=2 };

In C89 there was no way to reliably initialize a structure like this. The specification says that the quot and rem members may be in any order, so if you write:

    div_t d = { 3, 2 };

you can't be sure which member will be 3 and which will be 2.
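
For comparison, the portable C89 approach is to skip the initializer and assign each member by name. Here is a minimal sketch; the helper name make_div is made up for illustration and is not part of any library:

```c
#include <stdlib.h>

/* Hypothetical helper: builds a div_t without relying on member order. */
static div_t make_div(int quot, int rem)
{
    div_t d;
    d.quot = quot;  /* assignment by name works whatever the layout is */
    d.rem = rem;
    return d;
}
```

Calling make_div(3, 2) gives the same result as the designated initializer above, at the cost of a few extra lines.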

In general I'm enamored with designated initializers. But I've run into an unfortunate case where the specification is somewhat ambiguous.

Paragraph §6.7.8.19 of the C99 standard (draft version available for free here) has this to say about the ordering of designated initializers:

  The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject;[130] all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration.

(Footnote 130 reads "Any initializer for the subobject which is overridden and so not used to initialize that subobject might not be evaluated at all.")
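
The overriding rule is the easiest part of this paragraph to demonstrate. The sketch below uses a made-up struct pair and helper make_overridden purely for illustration; as far as I know, GCC may warn about the duplicate initializer under -Wextra (via -Woverride-init):

```c
/* Hypothetical struct used only for this example. */
struct pair { int a; int b; };

static struct pair make_overridden(void)
{
    /* .a is listed twice; the later initializer (3) overrides the
       earlier one (1), because initialization follows list order. */
    struct pair p = { .a = 1, .b = 2, .a = 3 };
    return p;
}
```

The returned structure has a equal to 3 and b equal to 2.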

The spec says that "initialization shall occur in initializer list order". To me, this suggests that one initializer can safely rely on the result of a previous initializer, e.g.:

    div_t d = { .quot=42, .rem=d.quot };

So the following program ought to only invoke well defined behaviour, right?

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        div_t d1 = { .quot=1, .rem=2, };
        printf("d1: quot=%i, rem=%i\n", d1.quot, d1.rem);

        div_t d2 = { .quot=1, .rem=d2.quot };
        printf("d2: quot=%i, rem=%i\n", d2.quot, d2.rem);

        div_t d3 = { .rem=2, .quot=d3.rem };
        printf("d3: quot=%i, rem=%i\n", d3.quot, d3.rem);

        return 0;
    }

Let's compile it and see what happens:

    $ gcc -std=c99 -O3 -Wall -Wextra foo.c
    $ ./a.out
    d1: quot=1, rem=2
    d2: quot=1, rem=1
    d3: quot=2, rem=2

Excellent! That's exactly what I would expect to happen. The initializers are run in order, allowing the second designated initializer to depend on the result of the first.

Now here's where things start to get weird:

    $ gcc -std=c99 -O0 -Wall -Wextra foo.c
    $ ./a.out
    d1: quot=1, rem=2
    d2: quot=1, rem=0
    d3: quot=0, rem=2

If we turn off optimization (-O0) the results suddenly change! Even worse, I've compiled with maximum warnings (-Wall -Wextra) and GCC doesn't even issue a warning about using an uninitialized variable!

How can we reconcile this behaviour with the specification? I think that we need to take paragraph §6.7.8.23 into account, as well:

  The order in which any side effects occur among the initialization list expressions is unspecified.[131]

(Footnote 131 reads "In particular, the evaluation order need not be the same as the order of subobject initialization.")
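
To make the unspecified evaluation order concrete, here is a small sketch in which each initializer expression has a side effect. The names ordered_pair, next_value and make_pair are hypothetical and exist only for this example:

```c
/* Hypothetical struct used only for this example. */
struct ordered_pair { int first; int second; };

static int counter = 0;

/* Side effect: bumps the counter and returns its new value. */
static int next_value(void)
{
    return ++counter;
}

static struct ordered_pair make_pair(void)
{
    counter = 0;
    /* Either call may be evaluated first, so a conforming compiler
       can produce {1, 2} or {2, 1}. */
    struct ordered_pair p = { .first = next_value(), .second = next_value() };
    return p;
}
```

All a portable program can rely on is that both calls happen and that the two members receive the values 1 and 2 in some order.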

This suggests that evaluation and initialization are two separate steps: an implementation may evaluate all of the initialization expressions (in any order), record the results in temporary storage, and then apply them all in order. If you inspect the generated assembly code this does seem to match what happens in GCC at -O0.
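
If that reading is right, the safe way to make one member depend on another is to move the dependency out of the initializer list. A minimal sketch, with a made-up helper name make_dependent:

```c
#include <stdlib.h>

/* Hypothetical helper: rem is copied from quot without the initializer
   list ever reading the object it is initializing. */
static div_t make_dependent(int value)
{
    div_t d = { .quot = value };  /* rem is implicitly zero here */
    d.rem = d.quot;               /* runs after initialization is complete */
    return d;
}
```

The assignment is a separate statement, so it cannot be evaluated before d.quot has been stored, whatever the optimization level.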

So why have paragraph 19 at all? I don't see how you could write a C program which can observe the initialization order, except for the special case of one designated initializer overriding another. If that is its only purpose the specification could certainly be more explicit about it. (It's also possible that the specification has actually been clarified in this respect; I don't have access to the final version.)

I've tried this test case with a handful of different compilers and got similar results. However, I would be curious to hear about results with other C99 compilers.

Update

It's been a while since I updated this blog. I've been a bit busy, but new articles will start appearing shortly. Since my last post, I have left IBM Canada and joined Two Sigma Investments in New York. I'm no longer developing virtual machines, but the name and scope of the blog will remain the same. I'm still doing low-level software development, and I'm learning a lot from my new colleagues about areas I haven't investigated in depth before.