Sunday, March 25, 2012

C99 designated initializers

One of the very nice features added in C99 is the designated initializer. This allows you to write code like the following to initialize a structure:

    div_t d = { .quot=3, .rem=2 };

In C89 there was no way to reliably initialize a structure like this. The specification says that the quot and rem members may be in any order, so if you write:

    div_t d = { 3, 2 };

you can't be sure which member will be 3 and which will be 2.

In general I'm enamored with designated initializers. But I've run into an unfortunate case where the specification is somewhat ambiguous.

Paragraph §6.7.8.19 of the C99 standard (draft version available for free here) has this to say about the ordering of designated initializers:

  "The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject;[130] all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration."

(Footnote 130 reads "Any initializer for the subobject which is overridden and so not used to initialize that subobject might not be evaluated at all.")

The spec says that "initialization shall occur in initializer list order". To me, this suggests that one initializer can safely rely on the result of a previous initializer, e.g.:

    div_t d = { .quot=42, .rem=d.quot };

So the following program ought to invoke only well-defined behaviour, right?

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        div_t d1 = { .quot=1, .rem=2, };
        printf("d1: quot=%i, rem=%i\n", d1.quot, d1.rem);

        div_t d2 = { .quot=1, .rem=d2.quot };
        printf("d2: quot=%i, rem=%i\n", d2.quot, d2.rem);

        div_t d3 = { .rem=2, .quot=d3.rem };
        printf("d3: quot=%i, rem=%i\n", d3.quot, d3.rem);

        return 0;
    }

Let's compile it and see what happens:

    $ gcc -std=c99 -O3 -Wall -Wextra foo.c
    $ ./a.out
    d1: quot=1, rem=2
    d2: quot=1, rem=1
    d3: quot=2, rem=2

Excellent! That's exactly what I would expect to happen. The initializers are run in order, allowing the second designated initializer to depend on the result of the first.

Now here's where things start to get weird:

    $ gcc -std=c99 -O0 -Wall -Wextra foo.c
    $ ./a.out
    d1: quot=1, rem=2
    d2: quot=1, rem=0
    d3: quot=0, rem=2

If we turn off optimization (-O0) the results suddenly change! Even worse, I've compiled with maximum warnings (-Wall -Wextra) and GCC doesn't even issue a warning about using an uninitialized variable!

How can we reconcile this behaviour with the specification? I think that we need to take paragraph §6.7.8.23 into account, as well:

  "The order in which any side effects occur among the initialization list expressions is unspecified.[131]"

(Footnote 131 reads "In particular, the evaluation order need not be the same as the order of subobject initialization.")

This suggests that evaluation and initialization are two separate steps: an implementation may evaluate all of the initialization expressions (in any order), record the results in temporary storage, and then apply them all in order. If you inspect the generated assembly code this does seem to match what happens in GCC at -O0.

So why have paragraph 19 at all? I don't see how you could write a C program which can observe the initialization order, except for the special case of one designated initializer overriding another. If that is its only purpose the specification could certainly be more explicit about it. (It's also possible that the specification has actually been clarified in this respect; I don't have access to the final version.)

I've tried this test case with a handful of different compilers and got similar results. However, I would be curious to hear about results with other C99 compilers.

4 comments:

  1. Question:

    int i = i + 1; // is this valid?

    If the above declaration + initialization is not valid, then how come GCC and other C compilers would accept referencing a variable that's not even in scope yet (the LHS variable of a declaration-with-initializer isn't in the scope in the RHS expression, right?)

    I'm puzzled...

    - Raven

  2. Hi Raven,

    That example is valid C code, in that it will compile, but it invokes undefined behaviour because i is read before it is initialized. The scope of i starts as soon as it is declared, so it is in scope.

    Why is this allowed? First of all, C is pretty permissive. It gives you a lot of rope, and if you get tangled up and end up hanging yourself, it's your own problem. But there are cases where you might want to write code very much like this which would be valid.

    Here are some examples:

    int i = sizeof(i);
    void* addr = &addr;

    Because these don't read from the uninitialized value they're both valid and well-defined statements.

  3. One way to try to observe the initialization order is by using a function call in the designated initializers:

    #include <stdio.h>

    static int i;

    /* Prints which initializer is being evaluated and returns a counter. */
    int init(const char *var)
    {
        printf("%s initialized\n", var);
        return ++i;
    }

    void foo(void)
    {
        struct {
            int x;
            int y;
        } s = {
            .y = init("y"),
            .x = init("x1"),
            .x = init("x2"),
        };
    }

    Unfortunately, gcc warns that "initialized field with side-effects overwritten" and only one call to init() will be made for x. Additionally gcc will call the init functions in the order x and then y. This fulfills the spec, but is not necessarily the expected behaviour based on the order of the code.

    Replies
    1. That lets you observe the evaluation order, not the initialization order.
