ziglang/zig

Proposal: Support nested anonymous struct and unions.

Closed this issue · 51 comments

Details of Exactly What was once Accepted
Details of why we went back on that and rejected this issue


Proposal: Support nested anonymous struct and unions

Section Value
Author: Byron Heads
Implementation:
Status: Draft

Abstract

A useful and common pattern found in a few other languages is the ability for structs and unions to contain anonymous structs and unions.

Example: Views into packed unions

const ARGBColor = packed union {
   bytes: u32,
   packed struct {
      a: u8,
      r: u8,
      g: u8,
      b: u8,
   };

  var color = ARGBColor{ .a = 0, .r = 255, .g = 0, .b = 255};
  color.r = 128;

  glFunction(color.bytes);
}
const VirtualRegister32 = packed union {
  F32: u32,
  packed struct {
      H16: packed union {
        F16 : u16
        packed struct {
          H8: u8,
          L8: u8
       },
       L16 = packed union {
         F16 : u16
         packed struct {
           H8: u8,
           L8: u8
      }
  }
};

car r1 = VirtualRegister32{.F32 = 0};

r.L16.L8 = 9;

Example: Struct OOP

struct base { 
  int type_id; 
  struct base *next;
}; 

 struct derived { 
  struct base; /* anonymous */ 
int a_new_field; ... }; 

Pros

  • Removes the need to name nested struts where the name is not meaningful
  • Can reduce the need to cast on structs and union types (especially when working with C)
  • Reduces the need to property functions

Cons

  • Could add confusion at the call site
  • Can be solved by casting and property functions
  • Can by solved by naming the nested struct

How would you access that struct though without adding confusion to the callsite? The following accomplishes what you are looking for and more?

const ARGBColor = packed union {
    bytes: u32,
    argb: packed struct {
      a: u8,
      r: u8,
      g: u8,
      b: u8,
    },
};

In C the parent struct/union basically adopts the members of the anonymous child struct/union.
I also find it usefull:

https://stackoverflow.com/questions/8932707/what-are-anonymous-structs-and-unions-useful-for-in-c11

Yes, but my point was that it adds obfuscation to the call site (clarified original comment, a little); when you call .a is it a union to bytes and to .r, .g, .b or is it part of the structure that is a union to bytes. I just don't see how adding it helps anything? Having to clarify that you are talking about the struct argb doesn't add any noise and adds clarification, also since Zig allows methods in structs it complicates things :).

Like the following call; foo.bar.a is clear that a is a property of bar which is a union member in foo but foo.a removes that 'clearness' and isn't much more succinct; Maybe I'm just someone who likes it to be more explicit, but I don't get the gain here, for the loss of clarification?

The idea is c.argb.a doesn't add any new information. In this case I care about the elements of the color, and need to pass the entire color to say gl.

I don't think this code is hard to understand or that naming the struct makes a difference.

var c = Color{
  .a = 0, 
  .r = 43, 
  .g = 213,
  .b = 0
};

c.g = 99;

glFunc(c.bytes);

Vs 

c.argb.g=86;

Can someone give an example that is not solved by Zig's other features?
Having a bytes representation can be done without unions:

const Color = packed struct { r: u8, g: u8, b: u8 };
var c: Color = undefined;
const bytes = ([]u8)((&c)[0..]);

Ofc, you would probably have toBytes and asBytes functions.

As for another example from the stack overflow linked earlier:

typedef struct {
    bool is_float;
    union {
       float f;
       char* s;
    };
} mychoice_t;

Here, you should really use tagged unions in Zig:

const MyChoice = union(enum) {
    Float: f32,
    String: []const u8,
};

Before we can consider nested anonymous struct/unions we need to know the exact use-case. Arguing over example code that is solved by Zig's other features is not very productive. Link some Zig code that could be improved by this feature, or links some C code that is not easily mapped to Zig because of nested struct/unions. We want real world examples not "made up on the spot to get a feature into a language" examples.

Recently I used it for color and vector representations. Zig solves tagged unions nicely but this is for packed structs. I understand this is a qol enhancement, but casting, bit shifting and functions to do something trivial like accessing the bytes in the color value seems like a bad solution.

Here is a better example (typing code on a phone sucks). Here the two example color formatea both work with the foo function without having to know what order the format is in.

const ARGB = union {
  bytes : u32, 
  struct {
    a: u8, r:u8, g:u8, b:u8
  }
};

const RGBA = union {
  bytes: u32,
  struct {
    r: u8, g:u8, b: u8, a:u8
  }
}

fn foo(c: var) void {
  c.r *=2;
  glFoobar(c.bytes);
}

And yes the component struct could have a common name like bits or something but I don't find this ads more information just more typing since in most cases your manipulating the components and passing an array of them off to be rendered.

There are two glaring issues with the example code you gave; the first one being that the order is different therefore the value of bytes will be different, yet you are passing it into the same function which presumably wants them in the same order? I just can't think of a function that wouldn't care about the order of those 4 components, without you telling it the specific order??

That is "both work with the foo function without having to know what order the format is in." Makes no sense to me, its a colour; it matters what order they are in if you are setting a standard and I don't see how glfoobar could use an invariant standard without having to pass in a type which formulates to standard.

Regardless if we look pass this and pretend lets say that the order doesn't matter you could also just write;

const ARGB = packed struct {
    a: u8, r:u8, g:u8, b:u8
    pub fn bytes(self: &this) u32 {
        return @ptrCast(&u8, self)[0..4];
    }
};

const RGBA = packed struct {
    r: u8, g:u8, b: u8, a:u8
    pub fn bytes(self: &this) u32 {
        return @ptrCast(&u8, self)[0..4];
    }
}

fn foo(c: var) void {
  c.r *=2;
  glFoobar(c.bytes());
}

Note: I just did a translation, there are many things you could do to improve this its a rough idea though.

Keeping in mind that I'm presuming that your code is correct! Maybe double check glFoobar, because from my knowledge as I've said I can't think of one that was order invariant :).

Yes foo doesn't care about the byte order in this example, all it cared about was doubling the red color and passing it to a fake openGl function, that function would care about the order but that is not foos concern (foo assumes you passed the correct struct depending on the format of the openGl device. This is the beauty of generic code, ie why implement foo for each possible color format (that could include 24 bit colors too!). This is also assuming lots of interactions with C functions.

Having to casting seems wrong and the function unnecessary in that case. Does the compiler ensure its inlined? if not then you wont use the function in critical paths of code, and will end up always casting, and code with lots of casting == bugs.

Without this feature I would just name the nested struct, the union is to useful but is adding more to be remembered that doesn't add anything.

here is another version of the color object(note: I am not familiar with aligning in zig yet so might be wrong).

const ColorARGB= packed union align(4) {
  bytes : u32, 
  packed struct {
    a: u8,
    packed union align(1) {
       pure: u24,
       packed struct {
          r:u8,
          g:u8,
          b:u8
      }
    }
  },
  array: [4]u8,  
};

Vectors are also common:

const Vector = packed union align(16) {
  array: [4]f32,
  struct: {   // this can be named, but is adding more to remember
    x: f32,
    y: f32,
    z: f32,
    w: f32,
  },
  simd: f128,
}

I suggest you run some C code that does what you want; you'll find that the bytes value is NOT the same (you may have to mark the structs as volatile else C may optimise the order, though I sincerely doubt it). This is talking about the ARGB and RGBA example of course.

Your function doesn't 'care about order' as all it does is access r but as I said glFoo is the one that cares about it not foo. glFoo must care about the order of the bytes property else it must perform some superfluous operation, since you get given an array that is either [A, R, G, B] or [R, G, B, A] literally none of the members are in the same order. :). So by extension your function does care about order, but as I've said before the source of that is glFoo :).

Now onto some of your comments and your examples; I'll split it up so I can clarify a few things.

Having to casting seems wrong and the function unnecessary in that case.

Maybe have a look at what casting actually is in code to understand; raw data has no type, so when you cast typically you are indicating to the compiler what type you want it to pretend it is. So even if you use a union you are still casting; it is just up to whether or not you see the cast in your code or if it is behind the scenes. Unnecessary enough to add an entire new construct? I would say not, I would say it is even extra necessary to indicate what is actually going on.

Does the compiler ensure its inlined?

Different issue, not dependent on this; and yes it would/should.

if not then you won't use the function in critical paths of code, and will end up always casting, and code with lots of casting == bugs.

Why wouldn't you? I think your overvaluing the benefit of inlining. A function call is pretty minimal as all things go. Don't understand how this amounts to lots of casting?? You're performing a single cast per 'bytes' call, the same as if you had a flattened nested struct. I'll cover the fact that casting is often not actually carried out in assembly/machine code a little further below :).

As with your other examples I don't see any real benefit towards having them be non-nested? For example let's break down the vector one;

  • The simd bit follows same casting things as before. Though you'll have to align cast the Vector ptr to widen it to 4 than ptr cast it across to a f128, similar if you wanted to get a u32 from the colour.
    • i.e. do const simd = @ptrCast(&f128, @alignCast(16, &self));
  • And to get it as an array you can just do ptrCast(&f32, &Vector)[0..4] which would give you a slice array of the contents, again I don't see any difference.

Of course in some cases casting will add more instructions for example when casting to a higher promoted type such as f128, and definately when you cast across bounds like floating to integer (though this isn't relevant when talking about ptr casting). So in some cases it'll add an instruction or two, but in reality those instructions would exist even if you just used a flattened nested struct.

Basically this issue seems to be at a standstill; to convince me (not speaking for everyone of course, but I would presume @Hejsil based on his comments), I would have to see there be a significant difference in some areas;

  • ease of calling, having a .simd() vs .simd is purely subjective, I personally prefer the function since you could even block it from editing it; and adding a whole new feature for a subjective choice like this is not a good enough reason.
  • a difference in the instructions generated
  • a safer way to do things.
  • A real world example!

Maybe instead we should add a small std module that helps you perform these casts safer such as std.cast.widenStructAsSingle(Vector) (super bad name) which would give you the f128 simd result, of course you would wrap this up in a nice function; but this would be error free as we would provide some nice compile time checks. Or maybe have a std.cast.toArray(Vector) which would return each struct member as an array. These would remove any chance for 'bugs' :), as well as accomplish what you are looking for.

Huh? If C unions dont keep the order, then unions are not safe to use. Maybe your talking about packing, with align all of these unions would work in C as intended. Also C/C++ dont rearrange the order of members. They can add padding between members based on alignment, but thats it. Without that you couldn't pass unions/structs around. So yes without align and packed the stuct of u8 may not align (though I dont know of any modern compiler that wouldnt pack that correctly), but anyone that is doing this kind of data packing optimization would understand this, or their code wouldnt work.

Function calls are expensive in critical path code (16ms is not a lot of time to update 1000s of vectors and color arrays), saving registers to the stack, and you risk cache miss while calling the function. In C/C++ your example functions would all be macros or use a compiler that inlines your code or errors if it cannot.

The purpose of the union is to not cast and not have the data modified at all. In the vector example you want that data to fit into a simd register and you want to avoid calling the unaligned asm, you also want to access each axis without bit shifting and casting all the time.

So currently just naming the components is really the alternative, this ask is a QOL enhancement.

Some examples from github projects:
https://github.com/arkanis/single-header-file-c-libs/blob/master/math_3d.h#L155
https://github.com/scoopr/vectorial/blob/master/include/vectorial/simd4f_sse.h#L30

Fairly certain the functions in question will in fact be inlined. You can enforce this by marking them as inline and get a compile error if for some reason that can't be done.

const warn = @import("std").debug.warn;

pub fn main() void
{
    const ARGB = packed struct {
            a: u8, r:u8, g:u8, b:u8,
    };
    const Color = extern union {
        bytes: u32,
        argb: ARGB,
    };
    
    var x = Color{ .argb = ARGB{.a = 0xDE, .r = 0xC0, .g = 0xEF, .b = 0xBE,},};
    
    warn("{X8}\n", x.bytes);
}

is functionally identical to

const warn = @import("std").debug.warn;

pub fn main() void
{
    const ARGB = packed struct {
        a: u8, r:u8, g:u8, b:u8,
        
        pub inline fn bytes(self: &this) &u32 {
            return @ptrCast(&u32, @alignCast(@alignOf(u32), self));
        }
    };
    
    var x = ARGB{.a = 0xDE, .r = 0xC0, .g = 0xEF, .b = 0xBE,};
    
    warn("{X8}\n", *x.bytes());
}

Edited as it sounded a bit harsh, didn't mean to be vitriolic; just came out that way. Am sorry @bheads if you took offense, was written as I woke up :).

  • It will inline! Even if it doesn't, you can force it to inline with that keyword, which unlike C++ __inline or whatever it is called will actually inline.
  • About the whole C/C++ moving things around, I was actually wrong thanks for the correction I updated my comment to remove that bit; what I was thinking about was something different entirely, regardless it was just a completely different point and not relevant and I was just trying to help you see why your code would produce weird bytes field as they order wasn't right.
  • And as above stated, it is not just functionally identical; they are instruction identical
  • function calls aren't expensive; I don't know what position your talking from, but unless your looking at a very slow microprocessor, where you want to squeeze every last inch of performance; you simply won't care that much about function calls, the biggest cost to them is the impact on the cache imo. Regardless keep in mind I stated they weren't 'that' expensive, running anything millions of times adds up; AGAIN though as a final time, the compiler will inline it...
  • The data isn't modified, and when you use unions you however allow the data to be modified? I gave you an explicit detailing of what a cast is instruction wise (well not very detailed but to a moderate depth)
  • I don't think align/ptrcast does bit shifting, but functionally and instructionally it does the same thing as your union example :). SO if either one does casting/bit shifting so will your union. If they aren't instructionally the same then if they produce slower code either that is a bug/missed optimisation, or a valid reason why this is good; maybe go have a look :).
  • Finally, thanks for the github files; the second one just straight up doesn't use nested anonymous functions (maybe a mistake), and the first one really needs to; otherwise the naming gets all of; and if you don't want to then you could easily make a function that returns that, since casting from an array of [16] to [4][4] is possible. Again casting in this case is just a compiler thing I'm about 99% sure, it'll probably require a single instruction for size or something; but as long as you keep them as arrays till the end then make them slices everything will be fine.

Overall; maybe the IRC is more of a place to talk about inling and non relevant points as stated below let's keep this about just nested unions.

I haven't read this issue yet, I'll try to clear up the confusion when I do.

But let's remember our community motto

Be excellent to each other

@BraedonWooding No need to repeat yourself. Even if someone is nitpicking, don't accuse them of nitpicking. Just ignore it and move on.

Meta-conversation is off topic. That means this comment (the one I am typing right now) is off topic and replying to this comment is off topic.

@kyle-github Your first example is pretty neat, actually. I didn't know you could "inline" an existing struct into another in C. This C OOP is also a thing that is done in Zig code, though the pattern is a little different. Because Zig is allowed to rearrange fields of a none packed struct, casting from base to derived is not a good idea, as you have no guarantees that base is the first field of derived. Instead, Zig has the @fieldParentPtr builtin (example). This builtin require that the base field in derived have a name.

For packed structs in Zig, you OOP model would work just fine, as the order is preserved. I do however think, that even if you are using packed, you should probably still use @fieldParentPtr, as it reduces the opportunity for bugs. Rearrange any fields in your struct, and @fieldParentPtr will still work as expected. And if you ever change the type of the base field in derived, @fieldParentPtr gives a compiler error.

@fieldParentPtr, does however not give you base's fields in derived, which is the main argument for nested anonymous struct/unions.

Also, on another note. Isn't unions that represent the data in two or more forms only really a thing because it avoids C strict aliasing bugs since casting between pointer types violates strict aliasing?

Hi @Hejsil, yes the "OOP" method is heavily used in a lot of C. Often the "base" object will actually be the vtable plus some sort of type ID or something simple.

I am not sure I am tracking your point about @fieldParentPtr? Unless I am understanding the use incorrectly, this seems to going the other direction. Downcasting rather than upcasting?

The point about extensibility is that using something like enum struct is hugely more clean but it cannot be extended later. You must modify the enum directly to add another type. The C method allows you to extend a base type in a different compilation unit. It is definitely limited, but does not constrain you to modify the original code.

@kyle-github

const Base = struct {
    const Self = this;

    type_id: u64,
    next: ?&Self,

    // Could use a vtable instead.
    fooFn: fn(&Self) usize,

    fn cast(self: &Self, comptime D: type) ?&D {
        if (type_hash(D) != self.type_id)
            return null;
        return @fieldParentPtr(D, "base", self);
    }

    fn foo(self: &Self) usize {
        return self.fooFn(self);
    }
};

const Derived = struct {
    const Self = this;

    base: Base,
    data: usize,

    fn init(data: usize) Self {
        return Self { .base = {.type_id=type_hash(Self), .next=null, .fooFn=foo }, .data=data }; 
    }

    fn foo(base: &Base) usize {
        const self = @fieldParentPtr(Self, "base", base);
        return self.data;
    }
};

Struct embedding has a long history before Go made it popular. The origin seems to be the Plan 9 C compiler (GCC has it today, see https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html). If it is "standard enough" for a flavor of C to have it, it might be worthwhile to do in Zig. Then again, Zig has much better support for inheritance.

Came here to write this very proposal. I've used it in both C and Go, and it has been very useful for code clarity, in my opinion. I hope it makes it in. 😄

I think the following issue forms an argument in this discussion but I do not for which side.
When I import a C header in zig with

use @cImport({
    @cInclude("rgb.h");
});

containing following structure:

typedef struct{
    union {
        struct {
            union {
                uint8_t r;
                uint8_t red;
            };
            union {
                uint8_t g;
                uint8_t green;
            };
            union {
                uint8_t b;
                uint8_t blue;
            };
        };
        uint8_t raw[3];
    };
}  CRGB ;

I would like to be able to interface with C libraries containing this features / headers,
since one of zigs selling points is the seamless integration with existing C libraries.

Thanks @Hejsil I was glad to be pointed to this issue by @Vexu.

...some C code that is not easily mapped to Zig because of nested struct/unions. We want real world examples.

I am currently porting liburing's io_uring helpers to Zig.

I realized that mapping io_uring_sqe to Zig is almost impossible to do in a future-proof way because the kernel is adding loads of unions to the struct, and our std lib definition can't keep up without breaking all existing Zig code that references the struct using the pure u32 or u64 instead of going through the union.

i.e. Every time we update our std lib version of io_uring_sqe to add a new union to match the kernel, we break all Zig code that uses the older version of the struct.

Real world code example is here: #6349

This feature could use this syntax:

const U = extern union {
    full: u32,
    usingnamespace _: extern struct {
        a: u8,
        b: u8,
        c: u8,
        d: u8
    },
};

There is a parallel discussion in #6349

Edit: After further consideration, we have overturned this decision. The new decision is here.
What follows is the original text of this comment.


After discussing at this week's design meeting (and last week's, and the week before that), we've decided to accept some parts of this proposal. Anonymous struct/union fields are standardized in both C11 and Rust's FFI layer. It’s clear that this is a way that both rust and C have decided it’s useful to lay out memory on ABI boundaries. Since FFI is an important use case, it’s important that we have good support for it. This feature does add a little bit of language complexity, but it’s very limited, and it makes FFI code less surprising (no mysterious generated field names). See also the Rust RFC for this feature, which is similarly motivated.

What is accepted:

Within the field list of a (packed or extern) (struct or union), you may open a scope with the keyword struct or union. For example,

const Foo = extern struct {
    a: u32,
    union {
        b: u32,
        struct {
            c: u16,
            d: f16,
        },
        e: f32,
    },
    f: u64,
};

Despite having similar syntax, these inner scopes are not type literals. They do not allow decls or extern/packed modifiers, and they do not generate separate type info. All they allow is the ability to change between struct and union layout in the middle of the fields list.

When initializing this object with a struct initializer, the compiler validates that every field is initialized, but no overlapping fields are initialized. So these are valid initializers:

_ = Foo{ .a = 0, .b = 1, .f = 2 };
_ = Foo{ .a = 0, .c = 1, .d = 2, .f = 3 };

But these are not:

_ = Foo{ .b = 1, .f = 2 }; // missing field a
_ = Foo{ .a = 0, .b = 1, .c = 2, .f = 3 }; // cannot initialize both b and c
_ = Foo{ .a = 0, .c = 1, .f = 2 }; // must initialize c and d together
_ = Foo{ .a = 0, .f = 2 }; // must initialize b, e, or both c and d

Default values are allowed only on fields which are not beneath a union. So in the Foo example, only a and f may have default values. b and e are union members, and c and d are descended from union members, so none of them may have defaults.

For details on why we did not accept the other parts of this proposal, see #5561 (comment) and #1214 (comment).

(edit: fix circular references in linked comments)

they do not generate separate type info

How will this be exposed in TypeInfo?

The immediate plan is that it will not, all of these fields will show up in the type info, and @offsetOf can be used to determine overlap. However proposals are welcome for better ways to do this.

Default values are allowed only on fields which are not beneath a union.

Why this limitation? We could use the same rule as for initializers (i.e. allow default values inside unions, as long as none of them overlap).

The problem is that there is no way to specify which union tag to fill, and filling that in is well defined but feels like it defeats the goals of the initializer field checks. A couple examples:

const X = extern struct {
  union {
    b: u32,
    f: f32 = 1.0,
  },
};
X{ }; // this is weird, we don't allow this in normal extern unions.

const Y = extern union {
  struct {
    a: u32,
    b: f32 = 1.0,
  }, struct {
    c: f32 = 1.0,
    d: u32,
  },
};
Y{} // combines defaults from multiple union members, but they don't overlap.

Small nit about the approved syntax, I find the lack of field name a bit disturbing since it breaks the name: type pattern.
I'd go the Rust route and use the now-invalid _ to mark the nested anonymous fields.

  • The declaration syntax is regular again,
  • _ has no name, it's perfect for anonymous fields,
  • _ is already used in the definition of non-exhaustive enums, that sets a precedent for the use of _

The following code is IMO more readable and less foreign-looking (less chances of being asked what is this stray struct/union doing here?):

const Foo = extern struct {
    a: u32,
    _: union {
        b: u32,
        _: struct {
            c: u16,
            d: f16,
        },
        e: f32,
    },
    f: u64,
};``

The case against the underscore syntax is that it makes the scope look like a type literal, which it is not. These are all compile errors:

const T = union { ... };
const Foo = extern struct {
    a: u32,
    _: T, // reference to existing type, rejected in #5561 and #1214
};
const Foo = extern struct {
    a: u32,
    _: union(enum) { ... }, // doesn't create a type, this makes no sense
};
const Foo = extern struct {
    a: u32,
    _: union {
        // cannot contain decls, this is not a type.
        pub const Self = @This();
    },
};

And this one compiles but is misleading:

const Foo = extern union {
    _: struct {
        // @This() here return Foo, not the anonymous scope
        x: [*:0]const u8 = @typeName(@This());
    };
};

The case against the underscore syntax is that it makes the scope look like a type literal, which it is not.

That's the idea, treat the RHS as we already do and parse it as a type literal. Forbidding it from having any declaration can be safely done during the semantic analysis phase with ease.

I don't mean to imply that it's difficult, I just mean that it's a surprising limitation. Everywhere else in the language, when you have an expression of a certain type, you can replace that with a comptime known variable of that type. This feature doesn't work like that.

The other benefit to doing this with a new syntax is that errors are caught in the parser, rather than in the compiler, so you will see these problems in code that is not semantically analyzed.

I don't mean to imply that it's difficult, I just mean that it's a surprising limitation. Everywhere else in the language, when you have an expression of a certain type, you can replace that with a comptime known variable of that type. This feature doesn't work like that.

By using struct and union you're doing the same, no? They have a well defined meaning and behaviour except when they're thrown inside a field definition list.

The other benefit to doing this with a new syntax is that errors are caught in the parser, rather than in the compiler, so you will see these problems in code that is not semantically analyzed.

Meh, I'm all for letting the parser... Parse? And analyze the code somewhere else.

By using struct and union you're doing the same, no? They have a well defined meaning and behaviour except when they're thrown inside a field definition list.

Yes but in this case the syntax is unique. _: T looks like something that you would expect to work a certain way, but it's a trap.

When you see the bare struct/union you're thinking, wtf there should be an identifier there. And that wtf is what clues you into the fact that there's something fundamentally different happening here.

One thing to note, we may want to disallow an embedded struct/union if we are not moving from one to the other. So we may want to make the following a compile error:

struct {
    a: u32,
    // it's nonsensical to embed an anonymous struct inside another struct
    struct {
        b: u32,
    },
}

union {
    a: u32,
    // it's nonsensical to embed an anonymous union inside another union
    union {
        b: u32,
    },
}

By using struct and union you're doing the same, no? They have a well defined meaning and behaviour except when they're thrown inside a field definition list.

Yes but in this case the syntax is unique. _: T looks like something that you would expect to work a certain way, but it's a trap.

That's the same trap you fall into with the proposed syntax, if I see struct { ... } I wouldn't expect it to be a limited version of the type declaration sporting the same name.

When you see the bare struct/union you're thinking, wtf there should be an identifier there. And that wtf is what clues you into the fact that there's something fundamentally different happening here.

Beside my subjective opinion on the ugliness of this special case I think _: ... expresses much better what's being done: you're defining a brand new unnamed field with the specified type. A stray struct or union feels like a type definition tacked in the middle a list of fields.

@marler8997 Er... but isn't that exactly what C does? It is a very common idiom in C to embed an anonymous struct within a struct as a form of "subclassing". Translating that into Zig means handling that. I see structs within structs all the time (and use it somewhat often myself). I cannot think of a time when I saw a union within a union. But unions within structs? All the time. Especially with embedded or memory-tight systems.

My understanding from what @SpexGuy wrote above is that the main driver here is interop with C.

Again, thank you @SpexGuy, @andrewrk and all for having such an open conversation. I learn so much from reading about the reasoning for these decisions!

The new syntax conflicts with accepted #4335.

@kyle-github...I see structs inside structs all the time as well (and unions within structs) but I don't recall ever seeing an anonymous struct inside another struct. I'm also not familiar with what you mean by "subclassing" here. Could you show me an example of where you've seen this and clarify what you mean by subclassing? Or at least create a made up example that shows how an anonymous struct directly inside another struct provides some sort of benefit? As far as I can tell, inserting/removing an anonymous struct that is directly within another struct is completely equivalent. Thanks, would be interesting to learn something new.

@marler8997 This is a contrived example.

struct base {
    struct base *next;
    int id;
};

struct derived {
    struct base;  /* anonymous */
    struct window_position pos;
};

The struct derived gets the next and id fields from struct base. This is a form of inheritance. You can get some of the same effect by naming the base field in the derived struct, but then you no longer have a next or id field directly in it. If I have a pointer to derived I can access all the fields as if they were at the top level of the struct:

struct derived *dp = ...;
dp->pos.x = 42;

if(dp->id == MY_NIFTY_ID) {...} /* field 'id' is in the base struct */

dp = list;
while(dp) {
    draw_window(dp);
    dp = dp->next; /* field 'next' is in base */
}

By using anonymous types like this you avoid some casting (so a bit safer) and your code is a bit more generic since access to common elements is done the same way regardless of the type of the derived struct. The difference between anonymous structs and named structs is not great, but it is used in situations like this.

Another example: variant records in C using anonymous unions:

struct variant {
    int variant_type;
    union {
        int i_val;
        float f_val;
        void *p_val;
    };   /* anonymous */
}

struct variant *var = ...;
if(var->variant_type == INTEGER) {
    var->i_val = 1337; /* anonymous access to sub-field */
}

Here the variant struct looks like it directly has fields i_val, f_val, and p_val.

Zig's facilities are much, much nicer here! The only reason I see for accepting this is for translation of C code. The resulting Zig code will be closer to 1-to-1 with the original C code.

I hope that gives you some ideas of how this is used in C.

@marler8997 This is a contrived example.

struct base {
    struct base *next;
    int id;
};

struct derived {
    struct base;  /* anonymous */
    struct window_position pos;
};

In what version of C is this valid? I know it isn't valid in C11, as it gives a declaration does not declare anything [-Wmissing-declarations] warning.

@kyle-github, I'm very confused by your example. I've never heard of a C feature that would allow you to embed a struct "anonymously" inside another struct that is declared elsewhere, nor does this accepted Zig proposal allow that. If either language supported this then you would have a use case for supporting structs inside structs, but I've never heard of such a feature. Where did these semantics come from? I tried compiling this and I get semantic errors that I would expect.

I've never heard of a C feature that would allow you to embed a struct "anonymously" inside another struct that is declared elsewhere.

This is Plan9 style struct embedding, already discussed and rejected in #1214.

Oh wow I've never heard of this feature before. Given this new knowledge, the question that comes to mind is what is the plan for handling this feature in translate C? Will Zig just assert an error if it encounters C code using this Plan9 style of anonymous struct embedding?

@marler8997, @zigazeljko there are two things being conflated here. IIUC, this embedding is NOT the Plan 9 version. It is the MS version. See the GCC docs on this. The first part of that describes how anonymous structs and unions work and then describes the Plan 9 extension on top of that (which is considerably more complicated and approaches C++ multiple inheritance for performance impact).

I am only talking about the MS extension, definitely not the Plan 9 extension! I do not know where the extension idea came from originally, but even GCC bundles it under ms-extensions. It is not a strict part of C11 or C17 (note that the GCC docs say that it is allowed by C11), but is definitely used in enough code that GCC, Clang and MSVC all support it.

I do see the MS extension used with some frequency. I almost never see the Plan 9 extra extensions used.

I probably muddied the waters with my examples because I was following a pattern from old Amiga coding where the base struct was always at the start of the derived struct so that a pointer to derived can be cast to a pointer to the base and vice versa. Sorry about that!

According to GCC docs you linked, it says the -fms-extensions just allows you to include struct tags, i.e.

struct {
    struct my_tag {
        int a;
    };
};

I don't see where it says the ms-extensions allow you to do this:

struct foo { int a; }
struct bar { struct foo; }

struct bar b;
b.a;

The Plan 9 compiler lists it as their extension. Section 3.3

@Nairou thanks for looking that up. @marler8997 here is a sample:

#include <stdio.h>

struct foo {
	int bar;
};

struct blah {
	int snork;
	struct foo;
};

int main() {
	struct blah blah_var;

	blah_var.bar = 42;

	printf("blah.var=%d\n", blah_var.bar);

	return 0;
}

Compile and run it:

$ gcc -fms-extensions -o embedded_test embedded_test.c 
$ ./embedded_test 
blah.var=42

Here is what happens when you modify the C code to use the Plan 9 extensions.

Here's the code:

#include <stdio.h>

struct foo {
	int bar;
};

struct blah {
	int snork;
	struct foo;
};


int main() {
	struct foo *foo_ptr;
	struct blah blah_var;

	blah_var.bar = 42;

	foo_ptr = &blah_var;

	printf("foo_ptr->bar=%d\n", foo_ptr->bar);

	return 0;
}

Try to compile with just the MS extensions:

$ gcc -fms-extensions -o embedded_test embedded_test.c 
embedded_test.c: In function ‘main’:
embedded_test.c:19:10: warning: assignment to ‘struct foo *’ from incompatible pointer type ‘struct blah *’ [-Wincompatible-pointer-types]
   19 |  foo_ptr = &blah_var;
      |          ^

Try again with the Plan 9 extensions:

$ gcc -fplan9-extensions -o embedded_test embedded_test.c 
$ ./embedded_test 
foo_ptr->bar=42

I definitely do not see the Plan 9 extensions used often. Thankfully. Being able to cast the pointer like that seems to have the performance impact of C++ multiple inheritance with few of the advantages.

Ok thanks @kyle-github, I see where I missed that from the GCC docs:

If -fms-extensions is used, the field may also be [...snip...] a reference to a previously defined structure or union ...

So now that we've established where these semantics come from, the question is what will Zig do with these when it encounters them during translate C? What's the plan?

Since this was accepted, some good concerns have been raised. We revisited this at a recent design meeting.

Most notably, the previously accepted version of this proposal did not play well with type info. Type info is built for reflection, and makes a separation between unions and structures. The idea behind this separation is that reflection code should be able to assume that all members of a struct are simultaneously valid and do not overlap, and only one member of a union is valid at a time. The implementation of this idea that plays well with type info is #5561.

However, after examining the use cases for this feature, we no longer believe it is worth the complexity. cImport should not decide the direction of the language. We make an effort to support C, but the C features are meant to be an optional extension to the language. We should be able to cleanly remove the C features from the language, and be left with something that is complete. #5561 can’t be reasonably restricted to only extern structs. If we implemented that, it would be expected to work on bare structs as well. But we’ve rejected all use cases of this feature that are not C interop, so it’s not worth including this in the non-C part of the language. And since there would be dissonance in having this feature work only on extern data types, this should not be included in the language. While we don’t necessarily reject the validity of the C interop use case, we do not see a path to supporting it that is worth the complexity it would introduce.

Another important feature of the design of Zig is that it is meant to have a “normal form”, in the mathematical sense. There shouldn’t be multiple styles of zig code, and there shouldn’t be features that some people decide to use and others don’t. Zig should look pretty much the same regardless of the problem domain it’s tackling, and regardless of the person writing the code. This is a stronger argument against this feature, which would create a new tradeoff to consider that is purely stylistic. We can completely remove this mental overhead by not having this feature.

Zig's stance on syntactic sugar is this: If there is a common pattern, where the best way to do it is unappealing, and the appealing way has footguns or drawbacks, that’s a good argument that we should introduce sugar to make the better way easier. A classic example here is that we used to have an unwrapping operator, like .?, for error unions. We removed it because it made the “bad” thing of transforming error codes into UB much easier than handling errors. It was replaced it with try, which makes propagating errors up the stack (the safer approach) easier.

Applying this reasoning to field inclusion, the behavior it encourages is not necessarily better or safer. It’s more complex, and it introduces the footgun of accidentally clobbering fields of a struct which overlap.

So we've decided to reject this proposal. We apologize for the whiplash, and we'll make a stronger effort going forward to avoid accepting and then rejecting issues.

floooh commented

TL;DR: basically #19660 :)

Just as an additional data piece, I'm currently also missing unnamed nested packed structs and unions in my experiments to map a computer emulator system bus to a packed struct where each bit represents a connection between system chips, but where the same bits should be accessible under different names, or groups of bits either as an unsigned integer or individual bits.

Currently I'm doing this with the system bus being an unsigned integer and regular bit twiddling.

Technically I can mostly achieve what I want with packed structs, but when each nested packed struct or union requires a name it leads to very long chained access code, like:

const d0 = bus.cpu.dbus.bits.d0;
const dbus = bus.ctc.dbus.val;

...for what could be:

const d0 = bus.d0;
const dbus = bus.db;

...and sometimes I also need to come up with pointless placeholder names just because I want to have the same bit accessible under two different names.

IMHO the _ placeholder would be a good solution that fits into the rest of Zig:

const Bus = packed union {
    _: packed struct {
        _: packed union {
            db: u8,
            _: packed struct {
                d0: u1,
                d1: u1,
                // ...
            },
        },
        _: packed union {
            m1: u1,
            cs: u1,
        },
        mreq: u1,
        iorq: u1,
    },
};

...at the same time this _ placeholder could/should be used to define unnamed padding items.

...I think the general idea of having differently named and typed 'views' onto the same bits in packed struct is a very useful thing to have. Maybe unnamed nested structs and unions are just the wrong approach because of its 'C/C++ baggage', and a different language feature makes more sense for that same idea.

For the same reason it would be nice if packed structs could contain arrays, like [8]u1.

Of course I could also just always fall back to regular bit twiddling on unsigned integers, but packed structs are so close to replace bit twiddling code to something a lot more convenient and readable.