enums should disallow multiple enumerations with the same value
tiehuis opened this issue ยท 17 comments
@andrewrk: I re-opened this to propose reversing the decision
A C enum allows multiple values. e.g.
enum Foo {
OPTION1 = 0,
OPTION2 = 1,
DEFAULT = OPTION1,
};
However, zig extern enum
's do not (note: we cannot declare a member based on another member, another possibly improvement)
const Foo = extern enum {
Option1 = 0,
Option2 = 1,
Default = 0,
};
test "asd" {
const a = Foo.Default;
}
Fails with compilation
/tmp/t.zig:4:15: error: enum tag value 0 already taken
Default = 0,
^
/tmp/t.zig:2:15: note: other occurrence here
Option1 = 0,
^
We should allow extern enums to have the same tag value for multiple members. The specific use-case here would be for translate-c. Encountered this when trying to process gmp.h
which has this enum present:
/* Available random number generation algorithms. */
typedef enum
{
GMP_RAND_ALG_DEFAULT = 0,
GMP_RAND_ALG_LC = GMP_RAND_ALG_DEFAULT /* Linear congruential. */
} gmp_randalg_t;
Alternatively for translate-c enums could be translated as sets of constants but this feels wrong. There are possibly some questions regarding switch cases and handling of duplicate values there that I haven't considered.
I'll add something related to this, that I've hit before.
C users also uses enum
s for flags, so sometimes you'll see this in headers:
enum flags {
A = 1 << 0,
B = 1 << 1,
};
enum flags all = A | B;
translate_c will also fail to translate something like this if all
is ever used.
It's possible to do this using definitions:
pub const Compression = extern enum(u32)
{
Uncompressed = 0,
RLE8 = 1, //8bpp only
RLE4 = 2, //4bpp only
Bitfields = 3, //Windows, 16bpp/32bpp
//These seem to only exist for passthrough to
// devices like printers and don't actually
// exist in .bmp files
JPEG = 4, //Windows v3
PNG = 5, //Windows v3
//This is legit
AlphaBitfields = 6, //Windows v4
//I think these only show up in Windows Metafiles,
// not actual .bmp files
CMYK = 11,
RLE8CMYK = 12,
RLE4CMYK = 13,
//These only appear in OS/2 files
pub const Huffman1D = @This().Bitfields; //OS/2, Monochrome only
pub const RLE24 = @This().JPEG; //OS/2, 24bpp only
};
However the definitions don't work as enum literals.
I think that C enums should be converted to:
pub const mylib_enumname: type = struct {
pub const A: comptime_int = 0;
pub const B: comptime_int = 1;
// ...
};
Since that matches how C sees them. At the moment there are plenty of C headers that cannot be imported without modification.
This also shows up in clang-c/libclang https://clang.llvm.org/doxygen/Index_8h_source.html in the CXCursor enumeration.
Encountered this when creating bindings for Godot script API: https://github.com/GodotNativeTools/godot_headers/blob/master/nativescript/godot_nativescript.h#L45
I've been thinking about the current definition of extern enum
, and I've come to the conclusion that @cImport
should never use it. Here's a summary of my thoughts:
Zig's C translator replaces enums with integer types of the correct size. The constant values, however, are generated as extern enum
. At first glance, this means that for all code that touches the external API, the zig code must use @enumToInt
when passing values and @intToEnum
when receiving values. But there's a problem that arises here related to Zig's strict rules about enum semantics. Consider the case where a C function returns an error code of an enumerated type. The temptation here is to write switch(@intToEnum(returnedValue))
. But this approach has the potential to introduce dangerous undefined behaviour in the event that the C API returns an unnamed value. Such a case is totally valid according to the C API, and can happen easily if a DLL is updated. This means that the generated enum values can never be used safely without wrapping them in @enumToInt
. Since Zig is trying to 'beat C at its own game', this presents an ergonomics problem that should be addressed. I see two valid fixes:
- Adopt @Vurich's suggestion to compile enums to namespaced comptime_int values. This would allow us to define the C enum type as an integer type of a known size, which would also make dealing with flags much easier. This also allows us to put the "enum type" in structs and function prototypes, improving readability.
- Expand the definition of
extern enum
(or make a newc_enum
) to match the C definition. This would mean thatextern enum
s are allowed to take on values that are not named, and switches on them must always specify an else case. We could also expand a step further and define integer operations on extern enums (but not implicit conversions). This would give us a chance to provide some improved syntax for flags, such asconst value: MyEnumFlags = .ROOT_SET_BIT | .IGNORE_BIT;
Personally I like the second option better, but either of these would improve ergonomics and make interfacing with C code safer.
The temptation here is to write
switch(@intToEnum(returnedValue))
. But this approach has the potential to introduce dangerous undefined behaviour in the event that the C API returns an unnamed value.
@intToEnum
should have checked behaviour (you should get the error "has no tag matching integer value"); if you expect such a thing to occur, that's the usecase for non-exhaustive enums
The behaviour is checked, but only in debug and safe builds. Mismatched dll versions are the sort of thing that happens often in production and very rarely during development.
Ah, I hadn't seen #2524. Having @cImport
generate non-exhaustive enums solves the UB problem, but there is still a usability issue with C APIs that use flags requiring @enumToInt
and @intToEnum
. Defining bitwise operations on non-exhaustive enums of the same type might help but that's not really in line with the intended semantics of non-exhaustive enums. I think I'm still looking for a slightly different use case to make dealing with imported C APIs less painful.
I think the best way extern enum
can work is @SpexGuy's second option with integer operations and exhaustive enums.
Pros:
- Type safety of Zig enums. If you want to pass an
extern enum
to an integer you woukd still need to use@EnumToInt
. - Easier interop with c code. Integer operations between same type
extern enum
are allowed. translate-c
can be simpler.
Cons:
- Someone might abuse it somehow?
I have enums implemented in translate-c-2
in a way that would benefit from this proposal getting accepted.
I think the best way
extern enum
can work is ...
Implemented here Vexu@7353e17 for reference.
Here's what's accepted:
extern enum
to allow multiple fields with the same value.-
Expand the definition of extern enum (or make a new c_enum) to match the C definition. This would mean that extern enums are allowed to take on values that are not named, and switches on them must always specify an else case.
What I'm not convinced of:
- Allowing integer operations on extern enums.
Note that extern enums do not necessarily mean "compatible with the semantics of C" - they mean "compatible with the ABI of C". Enums are not a valid way to represent flags. An enum is a scalar value; flags is a set of booleans. As far as what this means for translate-c, I think the @enumToInt
/ @intToEnum
conversions will be fine. The messiness is a good signal that this code could be better if it were converted to zig types properly, e.g. a struct with align(0) bools. One thing to note is that @intToEnum
for extern enums would have no safety check, since an extern enum would allow values that are not named.
"Beating C at its own game" -> as long as the .h file does proper C types. Using enums for flags even in C is an abuse of the type system, and so I'm happy with that being awkward to deal with. The awkwardness is correct.
I re-opened this to push back on this decision. Here is my proposal:
- Delete packed and extern enums from the language. Supplying an integer tag type guarantees the memory layout of the enum.
- translate-c: switch to translating C enums into integers and constants. This matches how C treats enums, and solves a bunch of awkwardness trying to use translated C enum code in zig. Remember that Zig exists first and foremost as its own language, not to be a servant to C code.
Reasoning: nearly all the instances of extern enum
in the standard library today are incorrect. It is not obvious that one should choose a regular enum with an explicit tag type over an extern enum
.
There are real consequences for field aliases; for example trying to use std.enums.directEnumArray
on an extern enum tries to run O(N^2)
logic at compile time due to the possible existence of field aliases.
Enums with field aliases are a different mathematical entity, one that is more complicated to reason about, one that requires costly compromises in the compiler implementation, and one that should not be in Zig.
People mentioned enum flags being used in C. Here's an idiom I've been using when interacting with certain OS data structures whose enums have certain values based on other enum values:
/// Page protection flags
pub const Protection = EnumFromDecl(vm_prot_t, false, struct {
pub const READ: u32 = 0x01;
pub const WRITE: u32 = 0x02;
pub const READ_WRITE: u32 = READ | WRITE;
...
});
This mirrors the C definition/documentation nicely, while allowing use-sites to switch on the enum:
switch (seg64.initprot) {
...
.READ_WRITE => std.debug.print("Read-write case\n", .{}),
...
}
EnumFromDecl
just walks through the declarations and produce enum fields, basically the inverse to EnumFieldStruct
which already exists in the std library.
The bool parameter determines if the resulting enum should be exhaustive or not.
@cryptocode, enum flags should be represented using packed struct
.
pub const Protection = packed struct(u32) {
read: bool, // this is the LSB
write: bool, // and the next bit
execute: bool, // etc
_: u29 = 0, // remaining (padding) bits
};
@mlugg yeah most of time either that or just constants. But sometimes you really do want an enum, it just so happens that some fields are based on others. That is, on use-sites you wanna switch on the named combinations rather than inspecting individual flags. Narrow usecase, but when it fits it's really nice as you get the benefits of using switch (exhaustiveness being one)
Not proposing any changes to language or std, just sharing an idiom that's been useful, say, when all named flag combinations must be handled.