ziglang/zig

enums should disallow multiple enumerations with the same value

tiehuis opened this issue ยท 17 comments

@andrewrk: I re-opened this to propose reversing the decision


A C enum allows multiple values. e.g.

enum Foo {
    OPTION1 = 0,
    OPTION2 = 1,
    DEFAULT = OPTION1,
};

However, zig extern enum's do not (note: we cannot declare a member based on another member, another possibly improvement)

const Foo = extern enum {
    Option1 = 0,
    Option2 = 1,
    Default = 0,
};

test "asd" {
    const a = Foo.Default;
}

Fails with compilation

/tmp/t.zig:4:15: error: enum tag value 0 already taken
    Default = 0,
              ^
/tmp/t.zig:2:15: note: other occurrence here
    Option1 = 0,
              ^

We should allow extern enums to have the same tag value for multiple members. The specific use-case here would be for translate-c. Encountered this when trying to process gmp.h which has this enum present:

/* Available random number generation algorithms.  */
typedef enum
{
  GMP_RAND_ALG_DEFAULT = 0,
  GMP_RAND_ALG_LC = GMP_RAND_ALG_DEFAULT /* Linear congruential.  */
} gmp_randalg_t;

Alternatively for translate-c enums could be translated as sets of constants but this feels wrong. There are possibly some questions regarding switch cases and handling of duplicate values there that I haven't considered.

I'll add something related to this, that I've hit before.
C users also uses enums for flags, so sometimes you'll see this in headers:

enum flags {
    A = 1 << 0,
    B = 1 << 1,
};
enum flags all = A | B;

translate_c will also fail to translate something like this if all is ever used.

It's possible to do this using definitions:

pub const Compression = extern enum(u32)
{
    Uncompressed    =   0,
    RLE8            =   1,  //8bpp only
    RLE4            =   2,  //4bpp only
    Bitfields       =   3,  //Windows, 16bpp/32bpp
    
    //These seem to only exist for passthrough to
    // devices like printers and don't actually
    // exist in .bmp files
    JPEG            =   4,  //Windows v3
    PNG             =   5,  //Windows v3
    
    //This is legit
    AlphaBitfields  =   6,  //Windows v4
    
    //I think these only show up in Windows Metafiles,
    // not actual .bmp files
    CMYK            =   11,
    RLE8CMYK        =   12,
    RLE4CMYK        =   13,
    
    //These only appear in OS/2 files
    pub const Huffman1D = @This().Bitfields; //OS/2, Monochrome only
    pub const RLE24   = @This().JPEG;        //OS/2, 24bpp only
};

However the definitions don't work as enum literals.

I think that C enums should be converted to:

pub const mylib_enumname: type = struct {
    pub const A: comptime_int = 0;
    pub const B: comptime_int = 1;
    // ...
};

Since that matches how C sees them. At the moment there are plenty of C headers that cannot be imported without modification.

This also shows up in clang-c/libclang https://clang.llvm.org/doxygen/Index_8h_source.html in the CXCursor enumeration.

I've been thinking about the current definition of extern enum, and I've come to the conclusion that @cImport should never use it. Here's a summary of my thoughts:

Zig's C translator replaces enums with integer types of the correct size. The constant values, however, are generated as extern enum. At first glance, this means that for all code that touches the external API, the zig code must use @enumToInt when passing values and @intToEnum when receiving values. But there's a problem that arises here related to Zig's strict rules about enum semantics. Consider the case where a C function returns an error code of an enumerated type. The temptation here is to write switch(@intToEnum(returnedValue)). But this approach has the potential to introduce dangerous undefined behaviour in the event that the C API returns an unnamed value. Such a case is totally valid according to the C API, and can happen easily if a DLL is updated. This means that the generated enum values can never be used safely without wrapping them in @enumToInt. Since Zig is trying to 'beat C at its own game', this presents an ergonomics problem that should be addressed. I see two valid fixes:

  • Adopt @Vurich's suggestion to compile enums to namespaced comptime_int values. This would allow us to define the C enum type as an integer type of a known size, which would also make dealing with flags much easier. This also allows us to put the "enum type" in structs and function prototypes, improving readability.
  • Expand the definition of extern enum (or make a new c_enum) to match the C definition. This would mean that extern enums are allowed to take on values that are not named, and switches on them must always specify an else case. We could also expand a step further and define integer operations on extern enums (but not implicit conversions). This would give us a chance to provide some improved syntax for flags, such as const value: MyEnumFlags = .ROOT_SET_BIT | .IGNORE_BIT;

Personally I like the second option better, but either of these would improve ergonomics and make interfacing with C code safer.

The temptation here is to write switch(@intToEnum(returnedValue)). But this approach has the potential to introduce dangerous undefined behaviour in the event that the C API returns an unnamed value.

@intToEnum should have checked behaviour (you should get the error "has no tag matching integer value"); if you expect such a thing to occur, that's the usecase for non-exhaustive enums

The behaviour is checked, but only in debug and safe builds. Mismatched dll versions are the sort of thing that happens often in production and very rarely during development.

Ah, I hadn't seen #2524. Having @cImport generate non-exhaustive enums solves the UB problem, but there is still a usability issue with C APIs that use flags requiring @enumToInt and @intToEnum. Defining bitwise operations on non-exhaustive enums of the same type might help but that's not really in line with the intended semantics of non-exhaustive enums. I think I'm still looking for a slightly different use case to make dealing with imported C APIs less painful.

Vexu commented

I think the best way extern enum can work is @SpexGuy's second option with integer operations and exhaustive enums.

Pros:

  • Type safety of Zig enums. If you want to pass an extern enum to an integer you woukd still need to use @EnumToInt.
  • Easier interop with c code. Integer operations between same type extern enum are allowed.
  • translate-c can be simpler.

Cons:

  • Someone might abuse it somehow?

I have enums implemented in translate-c-2 in a way that would benefit from this proposal getting accepted.

Vexu commented

I think the best way extern enum can work is ...

Implemented here Vexu@7353e17 for reference.

Here's what's accepted:

  • extern enum to allow multiple fields with the same value.
  • Expand the definition of extern enum (or make a new c_enum) to match the C definition. This would mean that extern enums are allowed to take on values that are not named, and switches on them must always specify an else case.

What I'm not convinced of:

  • Allowing integer operations on extern enums.

Note that extern enums do not necessarily mean "compatible with the semantics of C" - they mean "compatible with the ABI of C". Enums are not a valid way to represent flags. An enum is a scalar value; flags is a set of booleans. As far as what this means for translate-c, I think the @enumToInt / @intToEnum conversions will be fine. The messiness is a good signal that this code could be better if it were converted to zig types properly, e.g. a struct with align(0) bools. One thing to note is that @intToEnum for extern enums would have no safety check, since an extern enum would allow values that are not named.

"Beating C at its own game" -> as long as the .h file does proper C types. Using enums for flags even in C is an abuse of the type system, and so I'm happy with that being awkward to deal with. The awkwardness is correct.

This is now implemented and in master branch. However, note the bug #3991

I re-opened this to push back on this decision. Here is my proposal:

  • Delete packed and extern enums from the language. Supplying an integer tag type guarantees the memory layout of the enum.
  • translate-c: switch to translating C enums into integers and constants. This matches how C treats enums, and solves a bunch of awkwardness trying to use translated C enum code in zig. Remember that Zig exists first and foremost as its own language, not to be a servant to C code.

Reasoning: nearly all the instances of extern enum in the standard library today are incorrect. It is not obvious that one should choose a regular enum with an explicit tag type over an extern enum.

There are real consequences for field aliases; for example trying to use std.enums.directEnumArray on an extern enum tries to run O(N^2) logic at compile time due to the possible existence of field aliases.

Enums with field aliases are a different mathematical entity, one that is more complicated to reason about, one that requires costly compromises in the compiler implementation, and one that should not be in Zig.

People mentioned enum flags being used in C. Here's an idiom I've been using when interacting with certain OS data structures whose enums have certain values based on other enum values:

/// Page protection flags
pub const Protection = EnumFromDecl(vm_prot_t, false, struct {
    pub const READ: u32 = 0x01;
    pub const WRITE: u32 = 0x02;
    pub const READ_WRITE: u32 = READ | WRITE;
    ...
});

This mirrors the C definition/documentation nicely, while allowing use-sites to switch on the enum:

switch (seg64.initprot) {
    ...
    .READ_WRITE => std.debug.print("Read-write case\n", .{}),
    ...
}

EnumFromDecl just walks through the declarations and produce enum fields, basically the inverse to EnumFieldStruct which already exists in the std library.

The bool parameter determines if the resulting enum should be exhaustive or not.

@cryptocode, enum flags should be represented using packed struct.

pub const Protection = packed struct(u32) {
    read: bool, // this is the LSB
    write: bool, // and the next bit
    execute: bool, // etc
    _: u29 = 0, // remaining (padding) bits
};

@mlugg yeah most of time either that or just constants. But sometimes you really do want an enum, it just so happens that some fields are based on others. That is, on use-sites you wanna switch on the named combinations rather than inspecting individual flags. Narrow usecase, but when it fits it's really nice as you get the benefits of using switch (exhaustiveness being one)

Not proposing any changes to language or std, just sharing an idiom that's been useful, say, when all named flag combinations must be handled.