Zig Union(Enum) - WTF is switch(union(enum))

The power and complexity of Union(Enum) in Zig


Ed Yu (@edyu on Github and @edyu on Twitter) Jun.13.2023


Zig Logo

Introduction

Zig is a modern system programming language and although it claims to a be a better C, many people who initially didn't need system programming were attracted to it due to the simplicity of its syntax compared to alternatives such as C++ or Rust.

However, due to the power of the language, some of the syntax are not obvious for those first coming into the language. I was actually one such person.

One of my favorite languages is Haskell and if you ever thought that you prefer a typed language you owe it to yourself to learn Haskell at least once so you can appreciate how many other languages "borrowed" their type systems from it. I can promise you that you'll come out a better programmer.

ADT

One of the most widely used features and the underlying foundation of the Haskell type system is the ADT or Algebraic Data Types (not to be confused with Abstract Data Types). You can look up the difference on StackOverflow.

However, for us programmers, you can just think of Abstract Data Types as either a struct or a simple class (simple as in not nested).

For ADT or Algebraic Data Types, we need to have access to union for those that has experienced it before in languages that provide such construct such as C or in our case Zig.

Note: In order for ADT to be called Algebraic, it needs to support both sum and product. Sum means that the type needs to support A or B but not both together, whereas product means the type needs to support A and B together.

Why do we care?

The main reason for ADT to exist is so that you can express the concept of a type that can be in multiple states or forms. In other words, you can say that an object of that type can be either this or that, or something else.

For example, for a typical tree structure, you can say a tree node is either a leaf or a node that contains either other nodes or a leaf.

Another example would be that for a linked list, you can say that the list is formed recursively by a node that points to either another node or by the end of the list.

However, to show how we can use ADT in Zig, we have to explain some other concepts first.

Zig Struct

The foundation of data types in Zig is the struct. In fact, it's pretty much everywhere in Zig.

The struct in Zig is probably the closest thing to a class in most object-oriented programming languages.

Here is the basic idea:

// if you want to try yourself, you must import `std`
const std = @import("std");

// let's construct a binary tree node
const BinaryTree = struct {
    // a binary tree has a left subtree and a right subtree
    left: ?*BinaryTree,
    // for simplicity, let's just say we have an unsigned 32-bit integer value
    value: u32,
    right: ?*BinaryTree,
};

const tree = BinaryTree{ .left = null, .value = 42, .right = null };

There are several things of note here in the code above:

  1. If you are not familiar with ?, you are welcome to look over Zig If - WTF. It basically means that the variable can either have a value of the type after ? or if it doesn't then it will take on a value of null.

  2. We are referring to the BinaryTree type inside the BinaryTree type definition as a tree is a recursive structure.

  3. However, you must use the * to denote that left and right are pointers to another BinaryTree struct. If you leave out the pointer then the compiler will complain because then the size of BinaryTree is dynamic as it can grow to be arbitrarily big as we add more sub-trees.

The following code will show a slightly more complex tree structure. Note that we have to use & in order to get the pointer of the BinaryTree struct.

    var left = BinaryTree{ .left = null, .value = 21, .right = null };
    var far_right = BinaryTree{ .left = null, .value = 168, .right = null };
    var right = BinaryTree{ .left = null, .value = 84, .right = &far_right };

    const tree2 = BinaryTree{ .left = &left, .value = 42, .right = &right };

Zig Enum

Sometimes, a struct is an overkill if you just want to have a set of possible values for a variable to take and restrict the variable to take only a value from the set. Usually, we would use enum for such a use case.

// sorry if I left our your favorite pet
const Pet = enum { Dog, Cat, Fish, Iguana, Platypus };

const fav: Pet = .Cat;

// Each of the value of an enum is called a tag
std.debug.print("Ed's favorite pet is {s}.\n", .{@tagName(Pet.Cat)});

// you can specify what type and what value the enum takes
const Binary = enum(u1) { Zero = 0, One = 1 };

std.debug.print("There are {d}{d} types of people in this world, those understand binary and those who don't.\n", .{
    @enumToInt(Binary.One),
    @enumToInt(Binary.Zero)
});

Switch on Enum

One of the most convenient constructs for an enum is the switch expression. In Haskell, the reason ADT is so useful is the ability to pattern match on the switch expression. In fact, Haskell, function definition is basically a super-charged switch statement.

So how do we use switch statement in Zig?

const fav: Pet = .Cat;

std.debug.print("{s} is ", .{@tagName(fav)});
switch (fav) {
    .Dog => std.debug.print("needy!\n", .{}),
    .Cat => std.debug.print("perfect!\n", .{}),
    .Fish => std.debug.print("so much work!\n", .{}),
    .Iguana => std.debug.print("not tasty!\n", .{}),
    else => std.debug.print("legal?\n", .{}),
}

const score = switch (fav) {
    .Dog => 50,
    .Cat => 100,
    .Fish => 25,
    .Iguana => 15,
    else => 75,
};

Union

In C and in Zig, union is similar to struct, except that instead of the structure having all the fields, only one of the fields of the union is active. For those familiar with C union, please be aware that Zig union cannot be used to reinterpret memory. So in other words, you cannot use one field of the union to cast the value defined by another field type.

const Value = union {
    int: i32,
    float: f64,
    string: []const u8,
};

var value = Value{ .int = 42 };
// you can't do this
var fval = value.float;
std.debug.print("{d}\n", .{fval});

// you can't do this, either
var bval = value.string;
std.debug.print("{c}\n", .{bval[0]});

Switch on Union

Well, you cannot use switch on union; at least not on simple union.

// won't compile
switch (value) {
    .int => std.debug.print("value is int={d}\n", .{value.int}),
    .float => std.debug.print("value is float={d}\n", .{value.float}),
    .string => std.debug.print("value is string={s}!\n", .{value.string}),
}

Union(Enum) is Tagged Union

The error message on the previous example will actual say: note: consider 'union(enum)' here.

The Zig nomenclature for union(enum) is actually called tagged union. As we mentioned earlier, the individual fields of an enum are called tags.

Tagged union was created so that they can be used in switch expressions.

// first define the tags
const ValueType = enum {
    int,
    float,
    string,
    unknown,
};

// not too different from simple union
const Value = union(ValueType) {
    int: i32,
    float: f64,
    string: []const u8,
    unknown: void,
};

// just like the simple union
var value = Value{ .float = 42.21 };

switch (value) {
    .int => std.debug.print("value is int={d}\n", .{value.int}),
    .float => std.debug.print("value is float={d}\n", .{value.float}),
    .string => std.debug.print("value is string={s}\n", .{value.string}),
    else => std.debug.print("value is unknown!\n", .{}),
}

Capture Tagged Union Value

You can use the capture in the switch expression if you need to access the value.

switch (value) {
    .int => |v| std.debug.print("value is int={d}\n", .{v}),
    .float => |v| std.debug.print("value is float={d}\n", .{v}),
    .string => |v| std.debug.print("value is string={s}\n", .{v}),
    else => std.debug.print("value is unknown!\n", .{}),
}

Modify Tagged Union

If you need to modify the value, you have to use convert the value to a pointer in the capture using *.

switch (value) {
    .int => |*v| v.* += 1,
    .float => |*v| v.* ^= 2,
    .string => |*v| v.* = "I'm not Ed",
    else => std.debug.print("value is unknown!\n", .{}),
}

Tagged Union as ADT

We now have everything we need to implement Zig version of ADT. What makes ADT useful is that not only it will tell you the state but also the context of the state.

Using Zig for instance, the active tag in a union will tell you the state, and if the tag is a type that has a value, then the value is the context.

// this example is fairly involved, please see the full code on github
// You can find the code at https://github.com/edyu/wtf-zig-adt/blob/master/testadt.zig
const NodeType = enum {
    tip,
    node,
};

const Tip = struct {};

const Node = struct {
    left: *const Tree,
    value: u32,
    right: *const Tree,
};

const Tree = union(NodeType) {
    tip: Tip,
    node: *const Node,
}

const leaf = Tip{};

// this is meant to reimplement the binary tree example on https://wiki.haskell.org/Algebraic_data_type
// if you call tree.toString(), it will print out:
// Node (Node (Node (Tip 1 Tip) 3 Node (Tip 4 Tip)) 5 Node (Tip 7 Tip))
const tree = Tree{ .node = &Node{
    .left = &Tree{ .node = &Node{
        .left = &Tree{ .node = &Node{
            .left = &Tree{ .tip = leaf },
            .value = 1,
            .right = &Tree{ .tip = leaf } } },
        .value = 3,
        .right = &Tree{ .node = &Node{
            .left = &Tree{ .tip = leaf },
            .value = 4,
            .right = &Tree{ .tip = leaf } } } } },
    .value = 5,
    .right = &Tree{ .node = &Node{
        .left = &Tree{ .tip = leaf },
        .value = 7,
        .right = &Tree{ .tip = leaf } } } } };
// see the full example on github

Bonus

In Zig, there is also something called non-exhaustive enum.

Non-exhaustive enum must be defined with an integer tag type in the (). You then put _ as the last tag in the enum definition.

Instead of else, you can use _ to ensure you handled all the cases in the switch expression.

const Eds = enum(u8) {
    Ed,
    Edward,
    Edmond,
    Eduardo,
    Edwin,
    Eddy,
    Eddie,
    _,
};

const ed = Eds.Ed;

std.debug.print("All your code are belong to ", .{});
switch (ed) {
    // Zig switch uses , not | for multiple options
    .Ed, .Edward => std.debug.print("{s}!\n", .{@tagName(ed)}),
    // can use capture
    .Edmond, .Eduardo, .Edwin, .Eddy, .Eddie => |name| std.debug.print("this {s}!\n", .{@tagName(name)}),
    // else works but look at the code below for _ vs else
    else => std.debug.print("us\n", .{}),
}

// obviously no such enum predefined
const not_ed = @intToEnum(Eds, 241);
std.debug.print("All your base are belong to ", .{});
switch (not_ed) {
    .Ed, .Edward => std.debug.print("{s}!\n", .{@tagName(ed)}),
    .Edmond, .Eduardo, .Edwin, .Eddy, .Eddie => |name| std.debug.print("this {s}!\n", .{@tagName(name)}),
    // _ will force you to handle all defined cases
    // if any of the previous .Ed, .Edward ... .Eddie is missing, this won't compile
    // for example, if you forgot .Edurdo
    // and wrote: .Edmond, .Eduardo, .Edwin, .Eddy, .Eddie => ...
    // the code won't compile
    _ => std.debug.print("us\n", .{}),
}

Btw, you can add functions to enum, union, union(enum) just like you can in struct. You can see examples of that in the code below.

The End

You can find the code here.

Zig Logo