ziglang/zig

Optimization: allow inlining of tags into anonymous structs of a tagged union

mnemnion opened this issue · 6 comments

I don't think this qualifies as a proposal, because it wouldn't change the semantics
of the language. If that's wrong, my apologies.

I've been working on translating a VM into Zig, and ran into the following:

const std = @import("std");

const Tag = enum(u8) {
    fee,
    fie,
    foe,
    // ...
};

const UnderPacked = union(Tag) {
    fee: f32,
    fie: struct {
        a: i32,
        b: u8,
        c: u8,
    },
    foe: void,
};

// extern to demonstrate that the tag can be first
const FiePacked = extern struct {
    t: Tag,
    b: u8,
    c: u8,
    a: i32,
};

test "print sizes" {
    std.debug.print("\nSize of Underpacked: {d}\n", .{@sizeOf(UnderPacked)});
    std.debug.print("Size of FiePacked: {d}\n", .{@sizeOf(FiePacked)});
}

The first size is 12, the second is 8.

I believe I understand what's going on here. The tag has to come first in tagged
unions, for good and suffient reasons, and also, a struct must have the alignment of
its largest member. The data size of the .fie struct is 6, but the i32 has
alignment requirements which forces two bytes of padding, and an alignment of four
for the pointer. That in turn forces another three bytes of padding between the tag
and the struct.

However, it should be legal to inline the tag, if the struct is anonymously defined
inside the union. FiePacked is a demonstration of what that would look like. A
careful eye will notice that FiePacked has an unused byte, and because of C
alignment rules, that byte is after c.

I would bet that the Zig compiler in its current form will always put one of the
accessible fields of a struct at the 0 offset, since there's never a good reason not
to. But switching on the tag is a good reason! You can't even get at the .fie
variant of the union without touching the tag in some fashion.

This doesn't actually have to be limited to anonymous structs defined inside the
union, although maybe it should be.

For example, a definition like this:

const OuterFie = struct {
    a: i32,
    b: u8,
    c: u8,
};

const OuterUnion = union(Tags) {
    fee: f32,
    fie: OuterFie,
    foe: void,
};

Could, if within a single compilation unit, result in OuterFie getting packed like this:

struct {
    _: u8,
    b: u8,
    c: u8,
    _: u8,
    a: i32,
}

The size remains the same, since it's dictated by the i32. Obviously there are
options as to where the other byte of padding could go, I put it in the same place as
the invisible padding of FiePacked.

I see some potential problems with this. The main one is that layout of all structs
would have to be deferred until the compiler has looked inside every tagged union,
making something which is currently entirely local into a whole-program analysis.

There may also be efficiency reasons to have the pointer address at offset 0 point to
meaningful data. It's not clear how pronounced this effect would be, or if it would
even show up at all, since structs don't really exist in machine code. But stored
pointers to structs certainly do, so I doubt the performance impact of keeping all
useful data at an offset to those pointers would be zero.

However, for an anonymous struct defined within the union, like the first example, I
would say it's a pretty clear win to consider the tag as a potential field within the
struct's layout, one which has to be the first field, but is considered in laying
out the rest of the struct. This would result in UnderPacked.fie being laid out
identically to FiePacked, with the difference that there's no t field to access
once the union instance has been switched on.

The effect on layout of adopting the anonymous-only version of this proposal should
be minimal, the effect on codegen might also be minimal, and it might not be. When a
compiler can implicitly rely on the zero offset of a struct having some sensible
value, it's easy for that assumption to show up in the implementation.

But the feature would be an immediate win for me. I'd rather not implement the
opcodes as a bunch of distinct structs in an anonymous union, because at that point
I'm using extern to make sure that the distinguishing field is in the same place in
memory, I lose type safety, comptime potential, generic dispatch, everything which
makes tagged unions nice.

Apologies if I didn't fully understand the issue, however I do want to point out two things.

First, packed unions exist:

const UnderPacked = packed union {
    fee: f32,
    fie: packed struct {
        a: i32,
        b: u8,
        c: u8,
    },
    foe: void,
};

The size of this data structure is 8.

Second:
Actually ignore my second point, I was wrong here.

For your example, it isn't necessary to use the packed keyword: an ordinary union will also have size 8.

This issue relates to tagged unions, not bare unions. For a tagged union, the enum is part of the data structure and can be used for dispatch at runtime via switch statements. For a bare union, the author is responsible for tracking which arm of the union is active, accessing the wrong one is safety-checked undefined behavior, and it isn't possible to dispatch on the tag at runtime, since it doesn't exist in the resulting program.

You reminded me of something I noticed and forgot to include in the issue, which is that this version:

const UnderPacked = union(Tag) {
    fee: f32,
    fie: packed struct {
        a: i32,
        b: u8,
        c: u8,
    },
    foe: void,
};

Actually has a size of 16, not 12. So using the packed keyword internally on the anonymous struct actually makes things worse.

This also has an explanation: the least size of backing integer which can support the packed struct is 8, since the CPU has no native u48 type. So it gets word alignment, leaving the u8 enum with 7 bytes of padding.

As I understand it, it wouldn't be legal for the compiler to inline the tag into a packed struct, because packed guarantees the indicated memory layout, meaning that the first field must be at the zero offset.

Vexu commented

Does #1922 cover your use case?

@Vexu As far as I understand, this issue post points out an improvement to be employed automatically by the compiler, without requiring the user to manually/explicitly think about the layout.

@rohlem that's correct, this would automatically rearrange the struct type to have padding equal to the tag width as the first field, when otherwise allowed (so neither extern nor packed structs would be eligible). I'll also add "and when this can result in a tighter struct packing", but I don't think there are circumstances when it wouldn't (open to counterexamples here). That "padding" would be used for the enum, but would not be a part of the anonymous struct's type.

In the comment about how unions lay out a packed struct, I considered adding a few words about how it would be nice to be able to define the tag as being part of the packed struct, which is more #1922 related. Such a feature would be useful for extern structs as well. But that's a distinct proposal, because it would need some language feature to be able to specify "this enum-tagged field at the start of the packed struct refers to the enum in the tagged union", because a rule like "if the enum in the tagged union is the first field of the packed struct, that field will contain the union tag" goes against the semantics of a packed struct, and would preclude having both the union tag, and the same enum repeated as the first field of the packed struct. Doing something implicit for a packed construct would be wrong, but using Zig-style structs already gives control of field layout to the compiler, so the proposed optimization of tagged unions with anonymous Zig-style structs should be legal to apply within the existing language semantics.

Also, #1922 is about putting an enum field in a consistent location within a bare union, not a tagged union. I would suggest that improving the automatic packing of tagged unions would make this less useful/necessary. It isn't a complete replacement, because this still requires that the tag will be at the zero offset, and a user might want the tag elsewhere for e.g. FFI reasons.

A suggestion which is related to the proposal is that the documentation and specification of Zig should record that the tag in a tagged union will always be at the zero offset. If core would prefer to reserve the right to put the tag somewhere else, that's understandable. I don't know for a fact that the tag is at the zero offset in all cases, either, all I do know is that this would explain the behavior I'm seeing, and also that it's the obvious thing to do for a variety of reasons.

Edit: this comment in #1922 suggests that Zig currently puts the tag at the end of the union, not the beginning. I don't know if that's still current, but it's worth pointing out that which end gets chosen is irrelevant to this proposal: according to classic C struct packing doctrine, either ascending or descending size order for struct fields will achieve an optimal layout. I'm by no means expert at codegen, but it strikes me as disadvantageous if the tag is more than one machine word separated from the zero offset, which might be a good reason to favor the zero offset for large structs, or indeed in general. I'm open to correction on that, or indeed, anything else I've said.