golang/go

proposal: spec: add sum types / discriminated unions

DemiMarie opened this issue · 429 comments

This is a proposal for sum types, also known as discriminated unions. Sum types in Go should essentially act like interfaces, except that:

  • they are value types, like structs
  • the types contained in them are fixed at compile-time

Sum types can be matched with a switch statement. The compiler checks that all variants are matched. Inside the arms of the switch statement, the value can be used as if it is of the variant that was matched.

This has been discussed several times in the past, starting from before the open source release. The past consensus has been that sum types do not add very much to interface types. Once you sort it all out, what you get in the end is an interface type where the compiler checks that you've filled in all the cases of a type switch. That's a fairly small benefit for a new language change.

If you want to push this proposal along further, you will need to write a more complete proposal doc, including: What is the syntax? Precisely how do they work? (You say they are "value types", but interface types are also value types). What are the trade-offs?

I think this is too significant a change of the type system for Go1 and there's no pressing need.
I suggest we revisit this in the larger context of Go 2.

Thanks for creating this proposal. I've been toying with this idea for a year or so now.
The following is as far as I've got with a concrete proposal. I think
"choice type" might actually be a better name than "sum type", but YMMV.

Sum types in Go

A sum type is represented by two or more types combined with the "|"
operator.

type: type1 | type2 ...

Values of the resulting type can only hold one of the specified types. The
type is treated as an interface type - its dynamic type is that of the
value that's assigned to it.

As a special case, "nil" can be used to indicate whether the value can
become nil.

For example:

type maybeInt nil | int

The method set of the sum type holds the intersection of the method set
of all its component types, excluding any methods that have the same
name but different signatures.

Like any other interface type, a sum type may be the subject of a dynamic
type conversion. In type switches, the first arm of the switch that
matches the stored type will be chosen.

The zero value of a sum type is the zero value of the first type in
the sum.

When assigning a value to a sum type, if the value can fit into more
than one of the possible types, then the first is chosen.

For example:

var x int|float64 = 13

would result in a value with dynamic type int, but

var x int|float64 = 3.13

would result in a value with dynamic type float64.

Implementation

A naive implementation could implement sum types exactly as interface
values. A more sophisticated approach could use a representation
appropriate to the set of possible values.

For example a sum type consisting only of concrete types without pointers
could be implemented with a non-pointer type, using an extra value to
remember the actual type.

For sum-of-struct-types, it might even be possible to use spare padding
bytes common to the structs for that purpose.

@rogpeppe How would that interact with type assertions and type switches? Presumably it would be a compile-time error to have a case on a type (or assertion to a type) that is not a member of the sum. Would it also be an error to have a nonexhaustive switch on such a type?

For type switches, if you have

type T int | interface{}

and you do:

switch t := t.(type) {
  case int:
    // ...

and t contains an interface{} containing an int, does it match the first case? What if the first case is case interface{}?

Or can sum types contain only concrete types?

What about type T interface{} | nil? If you write

var t T = nil

what is t's type? Or is that construction forbidden? A similar question arises for type T []int | nil, so it's not just about interfaces.

Yes, I think it would be reasonable to have a compile-time error
for a case that can't be matched. I'm not sure whether it's
a good idea to allow non-exhaustive switches on such a type - we
don't require exhaustiveness anywhere else. One thing that might
be good though: if the switch is exhaustive, we need not require a default
case for it to count as a terminating statement.

That means that you can get the compiler to error if you have:

func addOne(x int|float64) int|float64 {
    switch x := x.(type) {
    case int:
        return x + 1
    case float64:
        return x + 1
    }
}

and you change the sum type to add an extra case.

For type switches, if you have

type T int | interface{}

and you do:

switch t := t.(type) {
case int:
// ...
and t contains an interface{} containing an int, does it match the first case? What if the first case is case interface{}?

t can't contain an interface{} containing an int. t is an interface
type just like any other interface type, except that it can only
contain the enumerated set of types that it consists of.
Just like an interface{} can't contain an interface{} containing an int.

Sum types can match interface types, but they still just get a concrete
type for the dynamic value. For example, it would be fine to have:

type R io.Reader | io.ReadCloser

What about type T interface{} | nil? If you write

var t T = nil

what is t's type? Or is that construction forbidden? A similar question arises for type T []int | nil, so it's not just about interfaces.

According to the proposal above, you get the first item
in the sum that the value can be assigned to, so
you'd get the nil interface.

In fact interface{} | nil is technically redundant, because any interface{}
can be nil.

For []int | nil, a nil []int is not the same as a nil interface, so the
concrete value of ([]int|nil)(nil) would be []int(nil) not untyped nil.

The []int | nil case is interesting. I would expect the nil in the type declaration to always mean "the nil interface value", in which case

type T []int | nil
var x T = nil

would imply that x is the nil interface, not the nil []int.

That value would be distinct from the nil []int encoded in the same type:

var y T = []int(nil)  // y != x

Wouldn't nil always be required even if the sum is all value types? Otherwise what would var x int64 | float64 be? My first thought, extrapolating from the other rules, would be the zero value of the first type, but then what about var x interface{} | int? It would, as @bcmills points out, have to be a distinct sum nil.

It seems overly subtle.

Exhaustive type switches would be nice. You could always add an empty default: when it's not the desired behavior.

The proposal says "When assigning a value to a sum type, if the value can fit into more
than one of the possible types, then the first is chosen."

So, with:

type T []int | nil
var x T = nil

x would have concrete type []int because nil is assignable to []int and []int is the first element of the type. It would be equal to any other []int (nil) value.

Wouldn't nil always be required even if the sum is all value types? Otherwise what would var x int64 | float64 be?

The proposal says "The zero value of a sum type is the zero value of the first type in
the sum.", so the answer is int64(0).

My first thought, extrapolating from the other rules, would be the zero value of the first type, but then what about var x interface{} | int? It would, as @bcmills points out, have to be a distinct sum nil

No, it would just be the usual interface nil value in that case. That type (interface{} | nil) is redundant. Perhaps it might be a good idea to make it a compile-time error to specify sum types where one element is a superset of another, as I can't currently see any point in defining such a type.

The zero value of a sum type is the zero value of the first type in the sum.

That is an interesting suggestion, but since the sum type must record somewhere the type of the value that it currently holds, I believe it means that the zero value of the sum type is not all-bytes-zero, which would make it different from every other type in Go. Or perhaps we could add an exception saying that if the type information is not present, then the value is the zero value of the first type listed, but then I'm not sure how to represent nil if it is not the first type listed.

So (stuff) | nil only makes sense when nothing in (stuff) can be nil and nil | (stuff) means something different depending on whether anything in stuff can be nil? What value does nil add?

@ianlancetaylor I believe many functional languages implement (closed) sum types essentially like how you would in C

struct {
    int which;
    union {
         A a;
         B b;
         C c;
    } summands;
}

If which indexes into the union's fields in order (0 = a, 1 = b, 2 = c), the zero-value definition works out so that all bytes are zero. You'd need to store the types elsewhere, unlike with interfaces, and you'd also need special handling of some kind for the nil tag wherever you store the type info.

That would make unions value types instead of special interfaces, which is also interesting.

Is there a way to make the all-zero value work if the field recording the type has a zero value representing the first type? I'm assuming one possible representation would be:

type A = B|C

// represented roughly as:
struct A {
  choice byte // 0 selects B, 1 selects C
  value  ...  // storage big enough for either B or C
}

[edit]

Sorry @jimmyfrasche beat me to the punch.

Is there anything added by nil that couldn't be done with

type S int | string | struct{}
var None struct{}

?

That seems like it avoids a lot of the confusion (that I have, at least)

Or better

type (
     None struct{}
     S int | string | None
)

that way you could type switch on None and assign with None{}

@jimmyfrasche struct{} is not equal to nil. It's a minor detail, but it would make type-switches on sums needlessly(?) diverge from type-switches on other types.

@bcmills It wasn't my intent to claim otherwise; I meant that it could be used for the same purpose, differentiating a lack of value, without overlapping with the meaning of nil in any of the types in the sum.

@rogpeppe what does this print?

// r is an io.Reader interface value holding a type that also implements io.Closer
var v io.ReadCloser | io.Reader = r
switch v.(type) {
case io.ReadCloser: fmt.Println("ReadCloser")
case io.Reader: fmt.Println("Reader")
}

I would assume "Reader"

@jimmyfrasche I would assume ReadCloser, same as you'd get from a type-switch on any other interface.

(And I would also expect sums which include only interface types to use no more space than a regular interface, although I suppose that an explicit tag could save a bit of lookup overhead in the type-switch.)

@bcmills it's the assignment that's interesting, consider: https://play.golang.org/p/PzmWCYex6R

@ianlancetaylor That's an excellent point to raise, thanks. I don't think it's hard to get around though, although it does imply that my "naive implementation" suggestion is itself too naive. A sum type, although treated as an interface type, does not have to actually contain direct pointer to the type and its method set - instead it could, when appropriate, contain an integer tag that implies the type. That tag could be non-zero even when the type itself is nil.

Given:

 var x int | nil = nil

the runtime value of x need not be all zeros. When switching on the type of x or converting
it to another interface type, the tag could be indirected through a small table containing
the actual type pointers.

Another possibility would be to allow a nil type only if it's the first element, but
that precludes constructions like:

var t nil | int
var u float64 | t

@jimmyfrasche I would assume ReadCloser, same as you'd get from a type-switch on any other interface.

Yes.

@bcmills it's the assignment that's interesting, consider: https://play.golang.org/p/PzmWCYex6R

I don't get this. Why would "this [...] have to be valid for the type switch to print ReadCloser"?
Like any interface type, a sum type would store no more than the concrete value of what's in it.

When there are several interface types in a sum, the runtime representation is just an interface value - it's just that we know that the underlying value must implement one or more of the declared possibilities.

That is, when you assign something to a type (I1 | I2) where both I1 and I2 are interface types, it's not possible to tell later whether the value you put into was known to implement I1 or I2 at the time.

If you have a type that's io.ReadCloser | io.Reader you can't be sure when you type switch or assert on io.Reader that it's not an io.ReadCloser unless assignment to a sum type unboxes and reboxes the interface.

Going the other way, if you had io.Reader | io.ReadCloser it would either never accept an io.ReadCloser because it goes strictly right-to-left or the implementation would have to search for the "best matching" interface from all interfaces in the sum but that cannot be well defined.

@rogpeppe In your proposal, ignoring optimization possibilities in the implementation and subtleties of zero values, the main benefit of using a sum type over a manually crafted interface type (containing the intersection of the relevant methods) is that the type checker can point out errors at compile time rather than runtime. A 2nd benefit is that a type's value is more discriminated and thus may help with readability/understanding of a program. Is there any other major benefit?

(I am not trying to diminish the proposal in any way, just trying to get my intuition right. Especially if the extra syntactic and semantic complexity is "reasonably small" - whatever that may mean - I can definitively see the benefit of having the compiler catch errors early.)

@griesemer Yes, that's about right.

Particularly when communicating messages over channels or the network, I think it helps readability and correctness to be able to have a type that expresses exactly the available possibilities. It's common currently to make a half-hearted attempt to do this by including an unexported method in an interface type, but this is a) circumventable by embedding and b) it's hard to see all the possible types because the unexported method is hidden.

@jimmyfrasche

If you have a type that's io.ReadCloser | io.Reader you can't be sure when you type switch or assert on io.Reader that it's not an io.ReadCloser unless assignment to a sum type unboxes and reboxes the interface.

If you have that type, you know that it's always an io.Reader (or nil, because any io.Reader can also be nil). The two alternatives aren't exclusive - the sum type as proposed is an "inclusive or", not an "exclusive or".

Going the other way, if you had io.Reader | io.ReadCloser it would either never accept an io.ReadCloser because it goes strictly right-to-left or the implementation would have to search for the "best matching" interface from all interfaces in the sum but that cannot be well defined.

If by "going the other way", you mean assigning to that type, the proposal says:

"When assigning a value to a sum type, if the value can fit into more
than one of the possible types, then the first is chosen."

In this case, an io.ReadCloser can fit into both an io.Reader and an io.ReadCloser, so it chooses io.Reader, but there's actually no way to tell afterwards. There is no detectable difference between the type io.Reader and the type io.Reader | io.ReadCloser, because io.Reader can already hold all interface types that implement io.Reader. That's why I suspect it might be a good idea to make the compiler reject types like this. For example, it could reject any sum type involving interface{}, because interface{} can already contain any type, so the extra qualifications don't add any information.

@rogpeppe there are a lot of things I like about your proposal. The left to right assignment semantics and the zero value is the zero value of the leftmost type rules are very clear and simple. Very Go.

What I'm worried about is assigning a value that's already boxed in an interface to a sum typed variable.

Let's, for the moment, use my previous example and say that RC is a struct that can be assigned to an io.ReadCloser.

If you do this

var v io.ReadCloser | io.Reader = RC{}

the results are obvious and clear.

However, if you do this

var r io.Reader = RC{}
var v io.ReadCloser | io.Reader = r

the only sensible thing to do is have v store r as an io.Reader, but that means when you type switch on v you can't be sure that when you hit the io.Reader case that you don't in fact have an io.ReadCloser. You'd need to have something like this:

switch v := v.(type) {
case io.ReadCloser: useReadCloser(v)
case io.Reader:
  if rc, ok := v.(io.ReadCloser); ok {
    useReadCloser(rc)
  } else {
    useReader(v)
  }
}

Now, there's a sense in which io.ReadCloser <: io.Reader, and you could just disallow those, as you suggested, but I think the problem is more fundamental and may apply to any sum type proposal for Go†.

Let's say you have three interfaces A, B, and C, with the methods A(), B(), and C() respectively, and a struct ABC with all three methods. A, B, and C are disjoint so A | B | C and its permutations are all valid types. But you still have cases like

var c C = ABC{}
var v A | B | C = c

There a bunch of ways to rearrange that and you still get no meaningful guarantees about what v is when interfaces are involved. After you unbox the sum you need to unbox the interface if order is important.

Maybe the restriction should be that none of the summands can be interfaces at all?

The only other solution I can think of is to disallow assigning an interface to a sum typed variable, but that seems in its own way more severe.

† that doesn't involve type constructors for the types in the sum to disambiguate (like in Haskell, where you have to say Just v to construct a value of type Maybe), but I am not in favor of that at all.

@jimmyfrasche Is the use-case for ordered unboxing actually important? That's not obvious to me, and for the cases where it is important it's easy to work around with explicit box structs:

type ReadCloser struct {  io.ReadCloser }
type Reader struct { io.Reader }

var v ReadCloser | Reader = Reader{r}

@bcmills It's more that the results are not obvious and fiddly and means that all the guarantees you want with a sum type evaporate when interfaces are involved. I can see it causing all kinds of subtle bugs and misunderstanding.

The explicit box structs example you provide shows that disallowing interfaces in sum types doesn't limit the power of sum types at all. It's effectively creating the type constructors for disambiguation that I mentioned in the footnote. Admittedly it's slightly annoying and an extra step, but it's simple and feels very much in line with Go's philosophy of letting language constructs be as orthogonal as possible.

all the guarantees you want with a sum type

It depends what guarantees you expect. I think you're expecting a sum type to be
a strictly tagged value, so given any types A|B|C, you know exactly what static
type you assigned to it. I see it as a type restriction on a single value of concrete
type - the restriction is that the value is type-compatible with (at least) one of A, B and C.
In the end it's just an interface with a value in.

That is, if a value can be assigned to a sum type by virtue of it being assignment-compatible
with one of the sum type's members, we don't record which of those members has been
"chosen" - we just record the value itself. The same as when you assign an io.Reader
to an interface{}, you lose the static io.Reader type and just have the value itself
which is compatible with io.Reader but also with any other interface type that it happens
to implement.

In your example:

var c C = ABC{}
var v A | B | C = c

A type assertion of v to any of A, B and C would succeed. That seems reasonable to me.

@rogpeppe those semantics make more sense than what I was imagining. I'm still not entirely convinced that interfaces and sums mix well, but I'm no longer certain they don't. Progress!

Let's say you have type U I | *T where I is an interface type and *T is a type that implements I.

Given

var i I = new(T)
var u U = i

the dynamic type of u is *T, and in

var u U = new(T)

you can access that *T as an I with a type assertion. Is that correct?

That would mean assignment from a valid interface value to a sum would have to search for the first matching type in the sum.

It would also be somewhat different from something like var v uint8 | int32 | int64 = i which would, I imagine, just always go with whichever of those three types i is even if i was an int64 that could fit in a uint8.

Progress!

Yay!

you can access that *T as an I with a type assertion. Is that correct?

Yes.

That would mean assignment from a valid interface value to a sum would have to search for the first matching type in the sum.

Yup, as the proposal says (of course the compiler knows statically which one to choose so there's no searching at runtime).

It would also be somewhat different from something like var v uint8 | int32 | int64 = i which would, I imagine, just always go with whichever of those three types i is even if i was an int64 that could fit in a uint8.

Yes, because unless i is a constant, it will only be assignable to one of those alternatives.

Yes, because unless i is a constant, it will only be assignable to one of those alternatives.

That's not quite true, I realise, because of the rule allowing assignment of unnamed types to named types. I don't think that makes too much difference though. The rule remains the same.

So the I | *T type from my last post is effectively the same as the type I and io.ReadCloser | io.Reader is effectively the same type as io.Reader?

That's right. Both types would be covered by my suggested rule that the compiler reject sum types where one type is an interface that is implemented by another of the types. The same or similar rule could cover sum types with duplicate types like int|int.

One thought: it is perhaps unintuitive that int|byte isn't the same as byte|int, but it's probably ok in practice.

That would mean assignment from a valid interface value to a sum would have to search for the first matching type in the sum.

Yup, as the proposal says (of course the compiler knows statically which one to choose so there's no searching at runtime).

I'm not following this. The way I read it (which could be different from what was intended), there are at least two ways to deal with a union U of I and T-implements-I.

1a) at assignment of U u = t, the tag is set to T. Later selection results in a T because the tag is a T.
1b) at assignment of U u = i (i is really a T), the tag is set to I. Later selection results in a T because the tag is I, but a second check (performed because T implements I and T is a member of U) discovers a T.

2a) like 1a
2b) at assignment of U u = i (i is really a T), generated code checks the value (i) to see if it is actually a T, because T implements I and T is also a member of U. Because it is, the tag is set to T. Later selection directly results in a T.

In the case that T, V, W all implement I and U = *T | *V | *W | I, assignment U u = i requires (up to) 3 type tests.

Interfaces and pointers was not the original use case for union types, though, was it?

I can imagine certain sorts of hackery where a "nice" implementation would perform some bit banging -- for example, if you have a union of 4 or fewer pointer types where all referents are 4-byte aligned, store the tag in the lower 2 bits of the value. This in turn implies that it's not good to take the address of a member of a union (it wouldn't be anyhow, since that address could be used to re-store an "old" type without adjusting the tag).

Or if we had a 50-ish-bit address space and were willing to take some liberties with NaNs, we could slap integers, pointers, and doubles all into a 64-bit union, at the possible cost of some bit fiddling.

Both sub-suggestions are gross; I am certain that both would have a small (?) number of fanatical proponents.

This in turn implies that it's not good to take the address of a member of a union

Correct. But I don't think the result of a type assertion is addressable today anyway, is it?

at assignment of U u = i (i is really a T), the tag is set to I.

I think this is the crux - there is no tag I.

Ignore the runtime representation for a moment and consider a sum type as an interface. As with any interface, it has a dynamic type (the type that's stored in it). The "tag" you refer to is exactly that dynamic type.

As you suggest (and I tried to imply in the last paragraph of the proposal) there may be ways to store the type tag in more efficient ways than with a pointer to the runtime type, but in the end it is always just encoding the dynamic type of the sum-type value, not which of the alternatives was "chosen" when it was created.

Interfaces and pointers was not the original use case for union types, though, was it?

It was not, but any proposal needs to be as orthogonal as possible with respect to other language features, in my view.

@dr2chase my understanding so far is that, if a sum type includes any interface types in its definition, then at runtime its implementation is identical to an interface (containing the intersection of method sets) but the compile-time invariants about allowable types are still enforced.

Even if a sum type only contained concrete types and it was implemented like a C-style discriminated union, you wouldn't be able to address a value in the sum type since that address could become a different type (and size) after you took the address. You could take the address of the sum typed value itself, though.

Is it desirable that sum types behave this way? We could just as easily declare that the selected/asserted type is the same as what the programmer said/implied when a value was assigned to the union. Otherwise we might get led to interesting places with respect to int8 vs int16 vs int32, etc. Or, e.g., int8 | uint8.

Is it desirable that sum types behave this way?

That's a matter of judgement. I believe it is, because we already have the concept of interfaces in the language - values with both a static and a dynamic type. The sum types as proposed just provide a more precise way to specify interface types in some cases. It also means that sum types can work without restriction on any other types. If you don't do that, you need to exclude interface types and then the feature isn't fully orthogonal.

Otherwise we might get led to interesting places with respect to int8 vs int16 vs int32, etc. Or, e.g., int8 | uint8.

What's your concern here?

You can't use a function type as a map's key type. I'm not saying that that's equivalent, just that there is precedent for types restricting other kinds of types. Still open to allowing interfaces, still not sold.

What kind of programs can you write with a sum type containing interfaces that you can't otherwise?

Counterproposal.

A union type is a type that lists zero or more types, written

union {
  T0
  T1
  //...
  Tn
}

All of the listed types (T0, T1, ..., Tn) in a union must be different and none can be interface types.

Methods may be declared on a defined (named) union type by the usual rules. No methods are promoted from the listed types.

There is no embedding for union types. Listing one union type in another is the same as listing any other valid type. However, a union cannot list its own type recursively, for the same reason that type S struct { S } is invalid.

Unions can be embedded in structs.

The value of a union type is a dynamic type, limited to one of the listed types, and a value of the dynamic type, said to be the stored value. Exactly one of the listed types is the dynamic type at all times.

The zero value of the empty union is unique. The zero value of a nonempty union is the zero value of the first type listed in the union.

A value for a union type, U, can be created with U{} for the zero value. If U has one or more types and v is a value of one of the listed types, T, U{v} creates a union value storing v with dynamic type T. If v is of a type not listed in U that can be assigned to more than one of the listed types, an explicit conversion is required to disambiguate.

A value of a union type U can be converted to another union type V, as in V(U{}), iff the set of types in U is a subset of the set of types in V. That is, ignoring order, every type listed in U must also be listed in V; V may list types that U does not, but not vice versa.

Assignability between union types is defined as convertibility is, as long as at most one of the union types is defined (named).

A value of one of the listed types, T, of a union type U may be assigned to a variable of the union type U. This sets the dynamic type to T and stores the value. Assignment compatible values work as above.

If all of the listed types support the equality operators:

  • the equality operators can be used on two values of the same union type. Two values of a union type are never equal if their dynamic types differ.
  • a value of that union may be compared with a value of any of its listed types. If the dynamic type of the union is not the type of the other operand, == is false and != is true regardless of the stored value. Assignment compatible values work as above.
  • the union may be used as a map key

No other operators are supported on values of a union type.

A type assertion against a union type for one of its listed types holds if the asserted type is the dynamic type.

A type assertion against a union type for an interface type holds if its dynamic type implements that interface. (Notably, if all the listed types implement this interface the assertion always holds).

Type switches must either be exhaustive, including all listed types, or contain a default case.

Type assertions and type switches return a copy of the stored value.

Package reflect would require a way to get the dynamic type and stored value of a reflected union value and a way to get the listed types of a reflected union type.

Notes:

The union{...} syntax was chosen partially to differentiate from the sum type proposal in this thread, primarily to retain the nice properties in the Go grammar, and incidentally to reinforce that this is a discriminated union. As a consequence, this allows somewhat strange unions such as union{} and union{ int }. The first is in many senses equivalent to struct{} (though by definition a different type) so it doesn't add to the language, other than adding another empty type. The second is perhaps more useful. For example, type Id union { int } is very much like type Id struct { int } except that the union version allows direct assignment without having to specify idValue.int allowing for it to seem more like a built in type.

The disambiguating conversion required when dealing with assignment-compatible types is a bit harsh, but would catch errors if a union is updated to introduce an ambiguity that downstream code is unprepared for.

The lack of embedding is a consequence of allowing methods on unions and requiring exhaustive matching in type switches.

Allowing methods on the union itself, rather than taking the valid intersection of methods of the listed types, avoids accidentally getting unwanted methods. Type asserting the stored value to common interfaces allows simple, explicit wrapper methods when promotion is desired. For example, on a union type U all of whose listed types implement fmt.Stringer:

func (u U) String() string {
  return u.(fmt.Stringer).String()
}

In the linked reddit thread, rsc said:

It would be weird for the zero value of sum { X; Y } to be different from that of sum { Y; X }. That's not how sums usually work.

I've been thinking about this, since it applies to any proposal really.

That's not a bug: it's a feature.

Consider

type (
  Undefined = struct{}
  UndefinedOrInt union { Undefined; int }
)

vs.

type (
  Illegal = struct{}
  IntOrIllegal union { int; Illegal }
)

UndefinedOrInt says by default it's not yet defined but, when it is, it will be an int value. This is analogous to *int, which is how the sum type (1 + int) must be represented in Go today, and the zero value is analogous as well.

IntOrIllegal, on the other hand, says by default it's the int 0, but it may at some point be marked as illegal. This is still analogous to *int, but the zero value is more expressive of the intent, like enforcing that it defaults to new(int).

It's kind of like being able to phrase a bool field in a struct in the negative so the zero value is what you want as the default.

Both zero values of the sums are useful and meaningful in their own right and the programmer can choose the most appropriate for the situation.

If the sum were a days-of-the-week enum (each day being a defined struct{}), whichever is listed first is the first day of the week, just as for an iota-style enum.

Also, I'm not aware of any languages with sum types or discriminated/tagged unions that have the concept of a zero value. C would be the closest, but there the zero value is uninitialized memory - hardly a lead to follow. Java defaults to null, I believe, but that's because everything is a reference. All the other languages I know of have mandatory type constructors for the summands, so there isn't really a notion of a zero value. Is there such a language? What does it do?

If the difference from the mathematical concepts of "sum" and "union" is the problem, we can always call them something else (e.g. "variant").

as commented

For names: Union confuses C/C++ purists. Variant is mainly familiar to CORBA and COM programmers, whereas discriminated union seems to be preferred by the functional languages. Set is both a verb and a noun. I like the keyword pick. Limbo used pick. It's short and describes the type's intention to pick from a finite set of types.

The name/syntax is largely irrelevant. Pick would be fine.

Either proposal in this thread fits the set theoretic definition.

As for the first type being special for the zero value: type-theoretic sums commute (A + B = B + A), and my proposal maintains that property with respect to the values the type can hold; only the zero value depends on the order. But product types also commute in theory and are considered different in practice by most languages (Go included), so strict commutativity is probably not essential.

@jimmyfrasche

I personally believe that disallowing interfaces as 'pick' members is a very big drawback. First, it would completely defeat one of the great use cases of 'pick' types: having an error be one of the members. Or you may want a pick type that holds either an io.Reader or a string, without forcing the user to wrap the string in a strings.Reader beforehand. But all in all, an interface is just another type, and I believe there shouldn't be type restrictions on 'pick' members. That being the case, if a pick type has two interface members where one is fully enclosed by the other, that should be a compile-time error, as previously mentioned.

What I do like from your counter proposal is the fact that methods can be defined on the pick type. I don't think that it should provide a cross section of members' methods, since I don't think there would be a lot of cases where any methods would belong to all members (and you have interfaces for that anyway). And an exhaustive switch + default case is a very good idea.

@rogpeppe @jimmyfrasche Something I don't see in your proposals is why we should do this. There is a clear disadvantage to adding a new kind of type: it's a new concept that everybody who learns Go will have to learn. What is the compensating advantage? In particular, what does the new kind of type give us that we don't get from interface types?

mvdan commented

@ianlancetaylor Robert summarized it well here: #19412 (comment)

@ianlancetaylor
At the end of the day, it makes code more readable, and that is the prime directive of Go. Consider json.Token: it's currently defined as an interface{}, yet the documentation states that it can actually be only one of a specific set of types. If, on the other hand, it's written as

type Token Delim | bool | float64 | Number | string | nil

The user will be able to immediately see all the possibilities, and the tooling would be able to create an exhaustive switch automatically. Furthermore, the compiler will prevent you from sticking an unexpected type in there as well.

cznic commented

At the end of the day, it makes code more readable, and that is the prime directive of Go.

More features mean one has to know more to understand the code. For a person with only average knowledge of a language, its readability is necessarily inversely proportional to the number of [newly added] features.

@cznic

More features means one has to know more to understand the code.

Not always. If you can substitute "knowing more about the language" for "knowing more about poorly- or inconsistently-documented invariants in the code", that can still be a net win. (That is, global knowledge can displace the need for local knowledge.)

If better compile-time type checking is indeed the only benefit, then we can get a very similar benefit without changing the language by introducing a comment checked by vet. Something like

//vet:types Delim | bool | float64 | Number | string | nil
type Token interface{}

Now, we don't currently have any kind of vet comments so this is not an entirely serious suggestion. But I'm serious about the basic idea: if the only advantage we get is something that we can do entirely with a static analysis tool, is it really worth adding a complex new concept to the language proper?

Many, perhaps all, of the tests done by cmd/vet could be added to the language, in the sense that they could be checked by the compiler rather than by a separate static analysis tool. But for various reasons we find it useful to separate vet from the compiler. Why does this concept fall on the language side rather than the vet side?

@ianlancetaylor as far as whether the change is justified, I've been actively ignoring that - or rather pushing it back. Talking about it in the abstract is vague and doesn't help me: it all sounds like "good things are good and bad things are bad" to me. I wanted to get an idea of what the type would actually be - what its limitations are, what implications it has, what the pros are, what the cons are - so I could see how it would fit within the language (or not!) and have an idea of how I would/could use it in programs. I think I have a good idea of what sum types would have to mean in Go now, at least from my perspective. I'm not entirely convinced they're worth it (even if I want 'em real bad), but now I have something solid to analyze, with well-defined properties that I can reason about. I know that's not really an answer, per se, but it's where I'm at with this, at least.

If better compile-time type checking is indeed the only benefit, then we can get a very similar benefit without changing the language by introducing a comment checked by vet.

This is still vulnerable to the need-to-learn-new-things criticism. If I have to learn about those magic vet comments to debug/understand/use code, it's a mental tax, no matter whether we assign it to the Go-language budget or the technically-not-the-Go-language budget. If anything, magic comments are more costly because I didn't know I needed to learn them when I thought I learned the language.

@cznic
I disagree. With your current assumption, you cannot be sure that a person would then understand what a channel is, or even what a function is. Yet these things exist in the language. And a new feature does not automatically mean that it would make the language harder. In this case, I would argue that it would in fact make it easier to understand, because it makes it immediately clear to the reader what a type is supposed to be, as opposed to using a black box interface{} type.

@ianlancetaylor
I personally think this feature has more to do with making code easier to read and reason about. Compile-time safety is a very nice feature, but not the main one. Not only would it make a type signature immediately more obvious, but its subsequent usage would also be easier to understand, and easier to write. People would no longer need to resort to panics if they receive a type that they didn't expect - that is the current behavior even in the standard library - and they would thus have an easier time reasoning about the usage, without being encumbered by the unknown. And I don't think it is a good idea to rely on comments and other tools (even if they are first-party) for this, because a cleaner syntax is more readable than such a comment. Comments are also structureless, and much easier to mess up.

@ianlancetaylor

Why does this concept fall on the language side rather than the vet side?

You could apply that same question to any feature outside the Turing-complete core, and arguably we don't want Go to be a "Turing tarpit". On the other hand, we do have examples of languages that have shoved significant subsets of the actual language off into a generic "extension" syntax. (For example, "attributes" in Rust, C++, and GNU C.)

The main reason to put features in extensions or attributes instead of in a core language is to preserve syntax compatibility, including compatibility with tools that are not aware of the new feature. (Whether "compatibility with tools" actually works in practice depends strongly on what the feature actually does.)

In the context of Go, it seems like the main reason to put features in vet is to implement changes which would not preserve Go 1 compatibility if applied to the language itself. I don't see that as an issue here.

One reason not to put features in vet is if they need to be propagated during compilation. For example, if I write:

switch x := somepkg.SomeFunc().(type) {
…
}

will I get the proper warnings for types that aren't in the sum, across package boundaries? It's not obvious to me that vet can do that deep a transitive analysis, so perhaps that's a reason it would need to go into the core language.

@dr2chase In general, of course, you are correct, but are you correct for this specific example? The code is completely comprehensible without knowing what the magic comment means. The magic comment doesn't change what the code does in any way. The error messages from vet should be clear.

@bcmills

Why does this concept fall on the language side rather than the vet side?

You could apply that same question to any feature outside the Turing-complete core....

I don't agree. If the feature under discussion affects the compiled code, then there is an automatic argument in favor of it. In this case, the feature apparently does not affect the compiled code.

(And, yes, vet can parse the source of imported packages.)

I'm not trying to claim that my argument about vet is conclusive. But every language change starts from a negative position: a simple language is very very desirable, and a significant new feature like this inevitably makes the language more complex. You need strong arguments in favor of a language change. And from my perspective those strong arguments have not yet appeared. After all, we've thought about this issue for a long time, and it is a FAQ (https://golang.org/doc/faq#variant_types).

@ianlancetaylor

In this case, the feature apparently does not affect the compiled code.

I think that depends on the specific details? The "zero-value of the sum is the zero-value of the first type" behavior that @jimmyfrasche mentioned above (#19412 (comment)) certainly would.

@urandom I was writing up a long explanation of why interface and union types didn't mix without explicit type constructors, but then I realized there was a kind of sensible way to do that, so:

Quick and dirty counterproposal to my counterproposal. (Anything not explicitly mentioned is the same as my previous proposal). I'm not sure one proposal is better than the other, but this one allows interfaces and is all around more explicit:

The union has explicit "field names" hereafter called "tag names":

union { //or whatever
  None, invalid struct{} //None is zero value
  Good, Bad int
  Err error //okay because it's explicitly named
}

There is still no embedding. It's always an error to have a type without a tag name.

Union values have a dynamic tag rather than a dynamic type.

Literal value creation: U{v} is only valid if completely unambiguous, otherwise it has to be U{Tag: v}.

Convertibility and assignment compatibility take tag names into account as well.

Assignment to a union is not magic. It always means assigning a compatible union value. To set the stored value, the desired tag name must be explicitly used: v.Good = 1 sets the dynamic tag to Good and the stored value to 1.

Accessing the stored value uses a tag assertion rather than a type assertion:

g := v.[Tag] //may panic
g, ok := v.[Tag] //no panic but could return zero-value, false

v.Tag is an error on the rhs since it's ambiguous.

Tag switches are like type switches, written switch v.[type], except that the cases are the tags of the union.

Type assertions hold with respect to the type of the dynamic tag. Type switches work similarly.

Given values a, b of some union type, a == b if their dynamic tags are the same and the stored value is the same.

Checking whether the stored value is some particular value requires a tag assertion.

If a tag name is unexported it can only be set and accessed in the package that defines the union. This means that a tag switch of a union with mixed exported and unexported tags can never be exhaustive outside the defining package without a default case. If all the tags are unexported it's a black box.

Reflection would need to handle the tag names as well.

edit: Clarification for nested unions. Given

type U union {
  A union {
    A1 T1
    A2 T2
  }
  B union {
    B1 T3
    B2 T4
  }
}
var u U

The value of u is the dynamic tag A and the stored value is the anonymous union with dynamic tag A1 and its stored value is the zero value of T1.

u.B.B2 = returnsSomeT4()

is all that is necessary to switch u from the zero value, even though it moves from one of the nested unions to the other, as it's all stored in one memory location. But

v := u.[A].[A2]

has two chances to panic, since it tag-asserts on two union values, and the two-valued form of tag assertion isn't available without splitting across multiple lines. Nested tag switches would be cleaner in this case.

edit2: Clarification on type asserts.

Given

type U union {
  Exported, unexported int
}
var u U

a type assert like u.(int) is entirely reasonable. Within the defining package, that would always hold. However, if u is used outside the defining package, u.(int) would panic when the dynamic tag is unexported, to avoid leaking implementation details. Similarly for assertions to an interface type.

jba commented

@ianlancetaylor Here are a few examples of how this feature would help:

  1. At the heart of some packages (go/ast for example) is one or more large sum types. It's hard to navigate these packages without understanding those types. More confusingly, sometimes a sum type is represented by an interface with methods (e.g. go/ast.Node), other times by the empty interface (e.g. go/ast.Object.Decl).

  2. Compiling the protobuf oneof feature to Go results in an unexported interface type whose only purpose is to make sure assignment to the oneof field is type-safe. That in turn requires generating a type for each branch of the oneof. Type literals for the end product are hard to read and write:

    &sppb.Mutation{
                Operation: &sppb.Mutation_Delete_{
                    Delete: &sppb.Mutation_Delete{
                        Table:  m.table,
                        KeySet: keySetProto,
                    },
                },
    }

    Some (though not all) oneofs could be expressed by sum types.

  3. Sometimes a "maybe" type is exactly what one needs. For example, many Google API resource update operations permit a subset of the resource's fields to be changed. One natural way to express this in Go is by a variant of the resource struct with a "maybe" type for each field. For instance, the Google Cloud Storage ObjectAttrs resource looks like

    type ObjectAttrs struct {
        ContentType string
        ...
    }

    To support partial updates, the package also defines

    type ObjectAttrsToUpdate struct {
        ContentType optional.String
        ...
    }

    Where optional.String looks like this (godoc):

    // String is either a string or nil.
    type String interface{}

    This is both hard to explain and type-unsafe, but it turns out to be convenient in practice, because an ObjectAttrsToUpdate literal looks exactly like an ObjectAttrs literal, while encoding presence. I wish we could have written

    type ObjectAttrsToUpdate struct {
        ContentType string | nil
        ...
    }
  4. Many functions return (T, error) with xor semantics (T is meaningful iff error is nil). Writing the return type as T | error would clarify the semantics, increase safety, and provide more opportunities for composition. Even if we can't (for compatibility reasons) or don't want to change a function's return value, the sum type is still useful for carrying around that value, like writing it to a channel.

A go vet annotation would admittedly help many of these cases, but not the ones where an anonymous type makes sense. I think if we had sum types we'd see a lot of

chan *Response | error

That type is short enough to write out multiple times.

@ianlancetaylor this probably isn't a great start, but here's everything you can do with unions that you can already do in Go1, because I figured it was only fair to acknowledge and summarize those arguments:

(Using my latest proposal with tags for the syntax/semantics below. Also assuming the emitted code is basically like the C code I posted much earlier in the thread.)

Sum types overlap with iota, pointers, and interfaces.

iota

These two types are roughly equivalent:

type Stoplight union {
  Green, Yellow, Red struct {}
}

func (s Stoplight) String() string {
  switch s.[type] {
  case Green: return "green" //etc
  }
}

and

type Stoplight int

const (
  Green Stoplight = iota
  Yellow
  Red
)

func (s Stoplight) String() string {
  switch s {
  case Green: return "green" //etc
  }
}

The compiler would likely emit exactly the same code for both.

In the union version, the int is turned into a hidden implementation detail. With the iota version you can ask what Yellow/Red is, or set a Stoplight value to -42, but not with the union version - those are all compile-time errors, and invariants that can be taken into account during optimization. Similarly, you can write a (value) switch that fails to account for Yellow lights, but with a tag switch you'd need a default case to make that explicit.

Of course, there are things you can do with iota that you can't do with union types.

pointers

These two types are roughly equivalent

type MaybeInt64 union {
  None struct{}
  Int64 int64
}

and

type MaybeInt64 *int64

The pointer version is more compact. The union version would need an extra bit (which in turn would likely be word sized) to store the dynamic tag, so the size of the value would likely be the same as https://golang.org/pkg/database/sql/#NullInt64

The union version more clearly documents the intent.

Of course, there are things you can do with pointers that you can't do with union types.

interfaces

These two types are roughly equivalent

type AB union {
  A A
  B B
}

and

type AB interface {
  secret()
}
func (A) secret() {}
func (B) secret() {}

The union version can't be circumvented with embedding. A and B don't need methods in commonβ€”they could in fact be primitive types or have entirely disjoint method sets, like the json.Token example @urandom posted.

It's really easy to see what you can put in an AB union versus an AB interface: the definition is the documentation (I have had to read the go/ast source multiple times to figure out what something is).

The AB union can never be nil and can be given methods outside the intersection of its constituents (this could be simulated by embedding the interface in a struct but then construction becomes more delicate and error prone).

Of course, there are things you can do with interfaces that you can't do with union types.

Summary

Maybe that overlap is too much overlap.

In each case the primary benefit of the union version is indeed stricter compile-time checking. What you can't do is more important than what you can. For the compiler, that translates into stronger invariants it can use to optimize the code. For the programmer, it's another thing you can let the compiler worry about - it'll just tell you when you're wrong. In the interface version, at the very least, there are important documentation benefits.

Clunky versions of the iota and pointer examples can be constructed using the "interface with an unexported method" strategy. For that matter, though, structs could be simulated with a map[string]interface{} and (nonempty) interfaces with func types and method values. No one would because it's harder and less safe.

All of those features add something to the language but their absence could be worked around (painfully, and under protest).

So I'm assuming the bar isn't to demonstrate a program that cannot even be approximated in Go, but rather to demonstrate a program that is much more easily and cleanly written in Go with unions than without. So what remains to be shown is that.

@jimmyfrasche

I see no reason why the union type should have named fields. Names are only useful if you want to distinguish between different fields of the same type. However, a union must never have multiple fields of the same type, as that is quite meaningless. Thus, having names is just redundant, and leads to confusion and more typing.

In an essence, your union type should look something like:

union {
    struct{}
    int
    err
}

The types themselves will provide the unique indentifiers that can be used to assign to a union, quite similar to the way embedded types in structs are used as identifiers.

However, in order for explicit assignments to work, one cannot allow an unnamed type as a member, since the syntax could not express such an assignment, e.g. v.struct{} = struct{}{}.

Thus, types like raw struct, unions and funcs have to be named beforehand in order to be part of a union, and become assignable. With this in mind, a nested union will not be anything special, as the inner union will just be another member type.

Now, I'm not sure which syntax would be better.

[union|sum|pick|oneof] {
    type1
    package1.type2
    ....
}

The above seems more go-like, but is a bit verbose for such a type.

On the other hand, type1 | package1.type2 may not look like your usual go type, however it gains the benefit of using the '|' symbol, which is predominantly recognised as an OR. And it reduces verbosity without being cryptic.

@urandom if you don't have "tag names" but allow interfaces the sums collapse into an interface{} with extra checks. They stop being sum types since you can put one thing in but get it out multiple ways. The tag names let them be sum types and hold interfaces without ambiguity.

The tag names repair a lot more than just the interface{} problem, though. They make the type much less magical and let everything be gloriously explicit without having to invent a bunch of types just to differentiate. You can have explicit assignment and type literals, as you point out.

That you can give one type more than one tag is a feature. Consider a type to measure how many successes or failures have happened in a row (1 success cancels out N failures and vice versa)

type Counter union {
  Successes, Failures uint 
}

without the tag names you'd need

type (
  Successes uint
  Failures uint
  Counter Successes | Failures
)

and assignment would look like c = Successes(1) instead of c.Successes = 1. You don't gain much.

Another example is a type that represents local or remote failure. With tag names this is easy to model:

type Failure union {
  Local, Remote error
}

The provenance of the error can be specified with its tag name, regardless of what the actual error is. Without tag names you'd need type Local struct { error } and the same for Remote, even if you allow interfaces directly in the sum.

The tag names sort of create special types locally in the union that are neither aliases nor named types. Having multiple tags with identical types isn't unique to my proposal: it's what every functional language (that I know of) does.

The ability to create unexported tags for exported types and vice versa is an interesting twist, too.

Also having separate tag and type assertions allows for some interesting code, like being able to promote a shared method to the union with a one-line wrapper.

It seems like it solves more problems than it causes and makes everything fit together much nicer. I honestly wasn't so sure when I wrote it up, but I'm becoming increasingly convinced that it's the only way to resolve all the issues with integrating sums into Go.

To expand on that somewhat, the motivating example for me was @rogpeppe's io.Reader | io.ReadCloser. Allowing interfaces without tags, this is the same type as io.Reader.

You can put a ReadCloser in and pull it out as a Reader. You lose the A | B means A or B property of sum types.

If you need to be specific about sometimes handling an io.ReadCloser as an io.Reader you need to create wrapper structs as @bcmills pointed out, type Reader struct { io.Reader } etc. and have the type be Reader | ReadCloser.

Even if you limit sums to interfaces with disjoint method sets, you still have this problem because one type can implement more than one of those interfaces. You lose the explicitness of sum types: they're not "A or B": they're "A or B or sometimes whichever you feel like".

Worse, if those types are from other packages they can suddenly behave differently after an update even if you were very careful to construct your program so that A is never treated the same as B.

Originally I explored disallowing interfaces to solve the problem. No one was happy with that! But it also didn't get rid of problems like a = b meaning different things depending on the types of a and b, which I'm not comfortable with. There also had to be a lot of rules about what type is chosen in the pick when type assignability comes into play. It's a lot of magic.

You add tags and that all goes away.

With union { R io.Reader | RC io.ReadCloser } you can explicitly say I want this ReadCloser to be considered as a Reader if that's what makes sense. No wrapper types needed. It's implicit in the definition. Regardless of the type of the tag, it's either one tag or the other.

The downside is that if you get an io.Reader from somewhere else, say a channel receive or a func call, and it might be an io.ReadCloser, and you need to assign it to the proper tag, you have to type-assert on io.ReadCloser and test. But that makes the intent of the program much clearer - exactly what you mean is in the code.

Also because the tag assertions are different from type assertions, if you really don't care and just want an io.Reader regardless, you can use a type assertion to pull that out, regardless of tag.

This is a best-effort transliteration of a toy example into Go without unions/sums/etc. It's probably not the best example but it's the one I used to see what this would look like.

It shows the semantics in a more operational fashion, which will likely be easier to understand than some terse bullet points in a proposal.

There is quite a bit of boilerplate in the transliteration, so I've generally only written the first instance of each repeated method, with a note about the repetition.

In Go with union proposal:

type fail union { //zero value: (Local, nil)
  Local, Remote error
}

func (f fail) Error() string {
  //Could panic if local/remote nil, but assuming
  //it will be constructed purposefully
  return f.(error).Error()
}

type U union { //zero value: (A, "")
  A, B, C string
  D, E    int
  F       fail
}

//in a different package

func create() pkg.U {
  return pkg.U{D: 7}
}

func process(u pkg.U) {
  switch u := u.[type] {
  case A:
    handleA(u) //undefined here, just doing something with unboxed value
  case B:
    handleB(u)
  case C:
    handleC(u)
  case D:
    handleD(u)
  case E:
    handleE(u)
  case F:
    switch u := u.[type] {
    case Local:
      log.Fatal(u)
    case Remote:
      log.Printf("remote error %s", u)
      retry()
    } 
  }
}

Transliterated to current Go:

(notes are included about the differences between the transliteration and the above)

const ( //simulates tags, namespaced so other packages can see them without overlap
  Fail_Local = iota
  Fail_Remote
)

//since there are only two tags with a single type this can
//be represented precisely and safely
//the error method on the full version of fail can be
//put more succinctly with type embedding in this case

type fail struct { //zero value (Fail_Local, nil) :)
  remote bool
  error
}

// e, ok := f.[Local]
func (f *fail) TagAssertLocal2() (error, bool) { //same for TagAssertRemote2
  if f.remote {
    return nil, false
  }
  return f.error, true
}

// e := f.[Local]
func (f *fail) TagAssertLocal() error { //same for TagAssertRemote
  if f.remote {
    panic("invalid tag assert")
  }
  return f.error
}

// f.Local = err
func (f *fail) SetLocal(err error) { //same for SetRemote
  f.remote = false
  f.error = err
}

// simulate tag switch
func (f *fail) TagSwitch() int {
  if f.remote {
    return Fail_Remote
  }
  return Fail_Local
}

// f.(someType) needs to be written as f.TypeAssert().(someType)
func (f *fail) TypeAssert() interface{} {
  return f.error
}

const (
  U_A = iota
  U_B
  // ...
  U_F
)

type U struct { //zero value (U_A, "", 0, fail{}) :(
  kind int //more than two types, need an int
  s string //in a real union these would all occupy the same space
  i int
  f fail
}

//s, ok := u.[A]
func (u *U) TagAssertA2() (string, bool) { //similar for B, etc.
  if u.kind == U_A {
    return u.s, true
  }
  return "", false
}

//s := u.[A]
func (u *U) TagAssertA() string { //similar for B, etc.
  if u.kind != U_A {
    panic("invalid tag assert")
  }
  return u.s
}

// u.A = s
func (u *U) SetA(s string) { //similar for B, etc.
  //if there were any pointers or reference types
  //in the union, they'd have to be nil'd out here,
  //since the space isn't shared
  u.kind = U_A
  u.s = s
}

// special case of u.F.Local = err
func (u *U) SetF_Local(err error) { //same for SetF_Remote
  u.kind = U_F
  u.f.SetLocal(err)
}

func (u *U) TagSwitch() int {
  return u.kind
}

func (u *U) TypeAssert() interface{} {
  switch u.kind {
  case U_A, U_B, U_C:
    return u.s
  case U_D, U_E:
    return u.i
  }
  return u.f
}

//in a different package

func create() pkg.U {
  var u pkg.U
  u.SetD(7)
  return u
}

func process(u pkg.U) {
  switch u.TagSwitch() {
  case U_A:
    handleA(u.TagAssertA())
  case U_B:
    handleB(u.TagAssertB())
  case U_C:
    handleC(u.TagAssertC())
  case U_D:
    handleD(u.TagAssertD())
  case U_E:
    handleE(u.TagAssertE())
  case U_F:
    switch u := u.TagAssertF(); u.TagSwitch() {
    case Fail_Local:
      log.Fatal(u.TagAssertLocal())
    case Fail_Remote:
      log.Printf("remote error %s", u.TagAssertRemote())
    }
  }
}

@jimmyfrasche

Since the union contains tags that may have the same type, wouldn't the following syntax be better suited:

func process(u pkg.U) {
  switch v := u {
  case A:
    handleA(v) //undefined here, just doing something with unboxed value
  case B:
    handleB(v)
  case C:
    handleC(v)
  case D:
    handleD(v)
  case E:
    handleE(v)
  case F:
    switch w := v {
    case Local:
      log.Fatal(w)
    case Remote:
      log.Printf("remote error %s", w)
      retry()
    } 
  }
}

The way I see it, when used with a switch, a union is quite similar to types such as int or string. The main difference is that there are only finitely many 'values' that can be assigned to it, as opposed to the former types, and the switch itself is exhaustive. Thus, in this case I don't really see the need for a special syntax; dropping it would reduce the mental work for the developer.

Also, under this proposal, would such code be valid:

type Foo union {
    // Completely different types, no ambiguity
    A string
    B int
}

func Bar(f Foo) {
    switch v := f {
        ....
    }
}

....

func main() {
    // No need for Bar(Foo{A: "hello world"})
    Bar("hello world")
    Bar(1)
}

@urandom I chose a syntax to reflect the semantics using analogies to existing Go syntax whenever possible.

With interface types you can do

var i someInterface = someValue //where someValue implements someInterface.
var j someInterface = i //this assignment is different from the last one.

That's fine and unambiguous since it doesn't matter what the type of someValue is as long as the contract is satisfied.

When you introduce tags† on unions it can sometimes be ambiguous. Magic assignment would only be valid in certain cases. Special casing it only gets you around having to be explicit sometimes.

I don't see a point in being able to sometimes skip a step, especially when a code change can easily invalidate that special case and then you have to go back and update all the code anyway. To use your Foo/Bar example if C int is added to Foo then Bar(1) has to change but not Bar("hello world"). It complicates everything to save a few keystrokes in situations that may not be that common and makes the concepts harder to understand because sometimes they look like this and sometimes they look thatβ€”just consult this handy flowchart to see which applies to you!

† I wish I had a better name for those. There are already struct tags. I would have called them labels but Go has those, too. Calling them fields seems both more appropriate and the most confusing. If anyone wants to bikeshed this one could really use a fresh coat.

In a sense, tagged unions are more similar to a struct than an interface. They're a special kind of struct that can only have one field set at a time. Seen in that light, your Foo/Bar example would be like saying this:

type Foo struct {
  A string
  B int
}

func Bar(f Foo) {...}

func main() {
  Bar("hello world") //same as Bar(Foo{A: "hello world", B: 0})
  Bar(1) //same as Bar(Foo{A: "", B: 1})
}

While it is unambiguous in this case, I don't think it's a good idea.

Also in the proposal Bar(Foo{1}) is allowed when it is unambiguous if you really want to save keystrokes. You can also have pointers to unions so the composite literal syntax is still necessary for &Foo{"hello world"}.

That said, unions do have a similarity to interfaces in that they have a dynamic tag of which "field" is currently set.

The switch v := u.[type] {... nicely mirrors the switch v := i.(type) {... for interfaces while still allowing type switches and assertions directly on union values. Maybe it should be u.[union] to make it easier to spot, but either way the syntax isn't that heavy and it's clear what's meant.

You could make the same argument that the .(type) is unnecessary, but when you see that you always know exactly what's happening and that fully justifies it, in my opinion.

That was my reasoning behind these choices.

@jimmyfrasche
The switch syntax seems a bit counter-intuitive to me, even after your explanations. With an interface, switch v := i.(type) {... switches through possible types, as listed by the switch cases, and indicated by .(type).
However, with a union, a switch is not switching through possible types, but values. Each case represents a different possible value, where values may in fact share the same type. This is more similar to strings and int switches, where cases also list values, and their syntax is a simple switch v := u {... . From that, to me it seems more natural that switching through the values of a union would be switch v := u { ..., since the cases are similar, but more restrictive, than the cases for ints and strings.

@urandom that's a very good point about the syntax. The truth is it's a holdover from my previous proposal without labels so it was the type then. I just blindly copied it over without thinking. Thanks for pointing it out.

switch u {... would work but the problem with switch v := u {... is it looks too much like switch v := f(); v {... (which would make error reporting more difficultβ€”not clear which was intended).

If the union keyword were renamed to pick as has been suggested by @as then the tag switch could be written as switch u.[pick] {... or switch v := u.[pick] {... which keeps the symmetry with a type switch but loses the confusion and looks pretty nice.

Even if the implementation is switching on an int there's still implicit destructuring of the pick into dynamic tag and stored value, which I think should be explicit, regardless of grammatical rules

you know, just calling the tags fields and having it be field assert and field switch makes a great deal of sense.

edit: that would make using reflect with picks awkward, though

[Sorry for delayed response - I was away on vacation]

@ianlancetaylor wrote:

Something I don't see in your proposals is why we should do this. There is a clear disadvantage to adding a new kind of type: it's a new concept that everybody who learns Go will have to learn. What is the compensating advantage? In particular, what does the new kind of type give us that we don't get from interface types?

There are two main advantages that I see. The first is a language advantage; the second is a performance advantage.

  • when processing messages, particularly when read from a concurrent process, it is very useful to be able to know the complete set of messages that can be received, because each message can come with associated protocol requirements. For a given protocol, the number of possible message types might be very small, but when we use an open-ended interface to represent the messages, that invariant is not clear. Often people will use a different channel for each message type to avoid this, but that comes with its own costs.

  • there are times when there is a small number of known possible message types, none of which contain pointers. If we use an open-ended interface to represent them we need to incur an allocation to make interface values. Using a type that restricts the possible message types means that can be avoided and hence relieve GC pressure and increase cache locality.

mvdan commented

A particular pain for me that sum types could solve is godoc. Take ast.Spec for example: https://golang.org/pkg/go/ast/#Spec

Many packages manually list the possible underlying types of a named interface type, so that a user can quickly get an idea without having to look at the code or rely on name suffixes or prefixes.

If the language already knows all the possible values, this could be automated in godoc much like enum types with iotas. They could also actually link to the types, as opposed to being just plaintext.

Edit: another example: mvdan/sh@ebbfda5

@mvdan that's an excellent, practical point for improving the story in Go1 without any language changes. Can you file a separate issue for that and reference this one?

mvdan commented

Sorry, are you referring to just links to other names within the godoc page, but still listing them manually?

Sorry, should have been clearer.

I meant a feature request for automatically handling types that implement interfaces defined in the current package in godoc.

(I believe there's a feature request somewhere for linking manually listed names, but I don't have the time to hunt it down at the moment).

mvdan commented

I don't wish to take over this (already very long) thread, so I've created a separate issue - see above.

@Merovius I'm replying to #19814 (comment) in this issue since the AST stuff applies more to sum types than enums. Apologies for pulling you into a different issue.

First, I'd like to reiterate that I'm not sure if sum types belong in Go. I've yet to convince myself that they definitely do not belong. I'm working under the assumption that they do in order to explore the idea and see whether they fit. I'm willing to be convinced either way, though.

Second, you mentioned gradual code repair in your comment. Adding a new term to a sum type is by definition a breaking change, on par with adding a new method to an interface or removing a field from a struct. But this is the correct and desired behavior.

Let's consider the example of an AST, implemented with a Node interface, that adds a new kind of node. Let's say the AST is defined in an external project and you're importing that in a package in your project, which walks the AST.

There are a number of cases:

  1. Your code expects to walk every node:
    1.1. You don't have a default statement, your code is silently incorrect
    1.2. You have a default statement with a panic, your code fails at runtime instead of compile time (tests don't help because they only know about the nodes that existed when you wrote the tests)
  2. Your code only inspects a subset of node types:
    2.1. This new kind of node would not have been in the subset anyway
    2.1.1. As long as this new node never contains any of the nodes that you are interested in, everything works out
    2.1.2. Otherwise, you're in the same situation as if your code expected to walk every node
    2.2. This new kind of node would have been in the subset you're interested in, had you known about it.

With interface-based AST only case 2.1.1 works correctly. This is coincidence as much as anything. Gradual code repair doesn't work. The AST has to bump its version and your code needs to bump its version.

An exhaustiveness linter would help but since the linter can't examine all interface types it needs to be told in some manner that a particular interface needs to be checked. That either means an in source comment or some sort of config file in your repo. If it's an in source comment, since by definition the AST is defined in a separate project you are at the mercy of that project to tag the interface for exhaustiveness checking. This only works at scale if there is a single exhaustiveness linter that the entire community agrees upon and always uses.

With a sum-based AST, you still need to use versioning. The only difference in this case is that the exhaustiveness linter is built into the compiler.

Neither helps with 2.2, but what could?

There's a simpler, AST-adjacent, case where sum types would be useful: tokens. Say you're writing a lexer for a simpler calculator. There are tokens like * that don't have any values associated with them and tokens like Var that have a string representing the name, and tokens like Val that hold a float64.

You could implement this with interfaces but it would be tiresome. You'd probably do something like this, though:

package token
type Type int
const (
  Times Type = iota
  // ...
  Var
  Val
)
type Value struct {
  Type
  Name string // only valid if Type == Var
  Number float64 // only valid if Type == Val
}

An exhaustiveness linter on iota-based enums could ensure an illegal Type is never used, but it wouldn't work too well against someone assigning to Name when Type == Times or using Number when Type == Var. As the number and kind of tokens grow it only gets worse. Really the best you could do here is add a method, Valid() error, that checks all the constraints and a bunch of documentation explaining when you can do what.

A sum type easily encodes all of those constraints and the definition would be all the documentation needed. Adding a new kind of token would be a breaking change but everything I said about AST's still applies here.

I think more tooling is necessary. I'm just not convinced it's sufficient.

@jimmyfrasche

Second, you mentioned gradual code repair in your comment. Adding a new term to a sum type is by definition a breaking change, on par with adding a new method to an interface or removing a field from a struct.

No, it is not on par. You can do both of those changes in a gradual repair model (for interfaces: 1. Add new method to all implementations, 2. Add method to interface. For struct fields: 1. Remove all usages of field, 2. Remove field). Adding a case in a sum type can not work in a gradual repair model; if you add it to the lib first, it would break all users, as they don't check exhaustively anymore, but you can't add it to the users first, because the new case doesn't exist yet. Same goes for removal.

It's not about whether or not it's a breaking change, it's about whether it's a breaking change that can be orchestrated with minimal interruption.

But this is the correct and desired behavior.

Exactly. Sum types, by their very definition and every reason people want them, are fundamentally incompatible with the idea of gradual code repair.

With interface-based AST only case 2.1.1 works correctly.

No, it also works correctly in case 1.2 (failing at runtime for unrecognized grammar is perfectly fine. I probably wouldn't want to panic, though, but just return an error) and also in a whole lot of cases of 2.1. The rest is a fundamental issue with upgrading software; if you add a new feature to a library, users of your lib need to change code to make use of it. It doesn't mean your software is incorrect until it does, though.

The AST has to bump its version and your code needs to bump its version.

I do not see how this follows from what you are saying, at all. To me, saying "this new grammar won't work with all tools yet, but it is available to the compiler" is fine. Just as "if you run this tool on this new grammar, it will fail at runtime" is fine. At its very worst, this only adds another step to the gradual repair process: a) Add the new node to the AST package and parser. b) Fix tools using the AST package to take advantage of the new node. c) Update code to use the new node. Yes, the new node will only become usable, after a) and b) are done; but in every step of this process, without any breakages, everything will still compile and work correctly.

I'm not saying you'll be automatically fine in a world of gradual code repair and no exhaustive compiler checks. It will still require careful planning and execution, you will still probably break unmaintained reverse dependencies and there might still be changes you might not be able to do at all (though I can't think of any). But at least a) there is a gradual upgrade path and b) the decision of whether this should break your tool at runtime, or not, is up to the author of the tool. They can decide what to do in a case that is unknown.

An exhaustiveness linter would help but since the linter can't examine all interface types it needs to be told in some manner that a particular interface needs to be checked.

Why? I would argue that it's fine for switchlintβ„’ to complain about any type-switch without a default-case; after all, you'd expect the code to work with any interface definition, so not having code in place to work with unknown implementations is likely a problem anyway. Yes, there are exceptions to this rule, but exceptions can already be manually ignored.

I'd probably be more on board with enforcing "every type-switch should require a default case, even if it's empty" in the compiler, than with actual sum types. It would both enable and force people to make the decision of what their code should do when faced with an unknown choice.

You could implement this with interfaces but it would be tiresome.

shrug it's a one-time effort in a case that very rarely comes up. Seems fine to me.

And FWIW, I'm currently only arguing against the exhaustive checking notion of sum types. I don't yet have any strong opinions about the added convenience of saying "any of these structurally defined types".

@Merovius I'm going to have to think further on your excellent points about gradual code repair. In the meantime:

exhaustiveness checks

I'm currently only arguing against the exhaustive checking notion of sum types.

You can explicitly opt out of exhaustiveness checks with a default case (well, effectively: the default makes it exhaustive by adding a case that covers "anything else, whatever that may be"). You still have a choice, but you must make it explicitly.

I would argue that it's fine for switchlintβ„’ to complain about any type-switch without a default-case; after all, you'd expect the code to work with any interface definition, so not having code in place to work with unknown implementations is likely a problem anyway. Yes, there are exceptions to this rule, but exceptions can already be manually ignored.

That's an interesting idea. While it would hit sum types simulated with interfaces and enums simulated with const/iota, it doesn't tell you that you missed a known case, just that you didn't handle the unknown case. Regardless, it seems noisy. Consider:

switch {
case n < 0:
case n == 0:
case n > 0:
}

That's exhaustive if n is integral (for floats it's missing n != n) but without encoding a lot of information about types it's probably easier to just flag that as missing default. For something like:

switch {
case p[0](a, b):
case p[1](a, b):
//...
case p[N](a, b):
}

even if the p[i] form an equivalence relation on the types of a and b it's not going to be able to prove that, so it must flag the switch as missing a default case, which means a way to silence it with a manifest, an annotation in the source, a wrapper script to egrep -v out the whitelisted, or an unnecessary default on the switch which falsely implies that the p[i] are not exhaustive.

At any rate that would be trivial linter to implement if the "always complain about no default in all circumstances" route is taken. It would be interesting to do so and run it on go-corpus and see how noisy and/or useful it is in practice.

tokens

Alternate token implementations:

//Type defined as before
type SimpleToken struct { Type }
type StringToken struct { Type; Value string }
type NumberToken struct { Type; Value float64 }
type Interface interface {
  //some method common to all these types, maybe just token() Interface
}

That gets rid of the possibility of defining an illegal token state where something has both a string and a number value but doesn't disallow creating a StringToken with a type that should be a SimpleToken or vice versa.

To do that with interfaces you need to define one type per token (type Plus struct{}, type Mul struct{}, etc.) and most of the definitions are exactly the same except for the type name. One-time effort or not, that's a lot of work (though well suited for code generation in this case).

I suppose you could have a "hierarchy" of token interfaces to partition the kinds of tokens based on the allowable values: (Assuming in this example that there is more than one kind of token that can contain a number or string, etc.)

type SimpleToken int //implements token.Interface
const (
  Plus SimpleToken = iota
  // ...
)
type NumericToken interface {
  Interface
  Value() float64
  nt() NumericToken
}
type IntToken struct { /* ... */ } //implements NumericToken; likewise a FloatToken
type StringToken interface { // for Var and Func and Const, etc.
  Interface
  Value() string
  st() StringToken
}

Regardless, it means each token requires a pointer dereference to access its value, unlike with the struct or sum type, which only require pointers when strings are involved. So with appropriate linters and improvements to godoc, the big win for sum types in this case is related to minimizing allocations while disallowing illegal states, and the amount of typing (in the keyboard sense), which doesn't seem unimportant.

You can explicitly opt out of exhaustiveness checks with a default case (well, effectively: the default makes it exhaustive by adding a case that covers "anything else, whatever that may be"). You still have a choice, but you must make it explicitly.

So, it seems like either way, both of us will have the choice to opt in or out of exhaustive checking :)

it doesn't tell you that you missed a known case, just that you didn't handle the unknown case.

Effectively, I believe, the compiler already does a whole-program analysis to determine what concrete types are used in what interfaces. I would at least expect it, at least for non-interface type assertions (i.e. type-assertions that are not asserting to an interface type, but to a concrete type), to generate the function tables used in interfaces at compile time.
But, honestly, this is argued from first principles; I have no idea about the actual implementation.

In any case, it should be pretty easy, to a) list any concrete type defined in a whole program and b) for any type-switch, filter them on whether they implement that interface. If you use something like this, you'd end up with a reliable list. I think.

I'm not 100% convinced that a tool can be written that is as reliable as actually explicitly stating the options, but I'm convinced that you could cover 90% of the cases and you could definitely write a tool that does this outside of the compiler, given the correct annotations (i.e. make sum-types a pragma-like comment, not an actual type). Not a great solution, admittedly.

Regardless, it seems noisy. Consider:

I think this is being unfair. The cases you are mentioning have nothing at all to do with sum-types. If I were to write such a tool, I'd restrict it to type-switches and switches with an expression, as those seem to be the way sum types would be handled too.

Alternate token implementations:

Why not a marker-method? You don't need a type-field, you get that for free from the interface representation. If you are concerned about repeating the marker-method over and over again; define an unexported struct{}, give it that marker method and embed it in each implementation, for zero extra cost and less typing per option than your method.

Regardless, it means each token requires a pointer dereference to access its value

Yes. It's a real cost, but I don't think it outweighs basically any other argument.

I think this is being unfair.

That's true.

I wrote a quick-and-dirty version and ran it on the stdlib. Checking any switch statement had 1956 hits, restricting it to skip the switch { form reduced that count to 1677. I haven't inspected any of those locations to see if the result is meaningful.

https://github.com/jimmyfrasche/switchlint

There's certainly a great deal of room for improvement. It's not terribly sophisticated. Pull requests welcome.

(I'll reply to the rest later)

edit: wrong markup format

I think this is a (quite biased) summary of everything so far (and narcissistically assuming my second proposal)

Pros

  • concise, easy to write a number of constraints succinctly in a self-documenting manner
  • better control of allocations
  • easier to optimize (all possibilities known to compiler)
  • exhaustive checking (when desired, can opt out)

Cons

  • any change to the members of a sum type is a breaking change, disallowing gradual code repair unless all external packages opt out of exhaustiveness checks
  • one more thing in the language to learn, some conceptual overlap with existing features
  • garbage collector has to know which members are pointers
  • awkward for sums of the form 1 + 1 + β‹― + 1

Alternatives

  • iota "enum" for sums of the form 1 + 1 + β‹― + 1
  • interfaces with an unexported tag method for more complicated sums (possibly generated)
  • or struct with an iota enum and extra-linguistic rules about which fields are set dependent on the enums value

Regardless

  • better tooling, always better tooling

For gradual repair, and that is a big one, I think the only option is for external packages to opt out of the exhaustiveness checks. This does imply that it must be legal to have an "unnecessary" default case only concerned with future proofing even though you otherwise match everything else. I believe that that's implicitly true now, and if not easy enough to specify.

There could be an announcement from a package maintainer that "hey we're going to be adding a new member to this sum type in the next version, make sure you can handle it" and then a switchlint tool could find any cases that need to be opted out.

Not as straightforward as other cases, but still quite doable.

When writing a program that uses an externally defined sum type you could comment out the default to make sure you didn't miss any known cases and then uncomment it before committing. Or there could be a tool to let you know that the default is "unnecessary" which tells you that you got everything known and are future-proofed against the unknown.

Let's say we want to opt-in to exhaustiveness checking with a linter when using interface types simulating sum types, regardless of the package they're defined in.

@Merovius your betterSumType() BetterSumType trick is very cool, but it means that switches have to happen in the defining package (or you expose something like

func CallBeforeSwitches(b BetterSumType) (BetterSumType, bool) {
	if b == nil {
		return nil, false
	}
	b = b.betterSumType()
	if b == nil {
		return nil, false
	}
	return b, true
}

and also lint that that's called every time).

What are the criteria necessary to check that all switches in a program are exhaustive?

It can't be the empty interface, because then anything's game. So it needs at least one method.

If the interface has no unexported methods, any type could implement it, so exhaustiveness would depend on all the packages up the callgraph of each switch. It's possible to import a package, implement its interface, and then send that value to one of the package's functions; so a switch in that function wouldn't be able to be exhaustive without creating an import cycle. So it needs at least one unexported method. (This subsumes the previous criterion.)

Embedding would mess up the property we're looking for, so we need to ensure that none of the importers of the package ever embed the interface or any of the types that implement it at any point. A really fancy linter may be able to tell that sometimes embedding is okay if we never call a certain function that creates an embedded value or none of the embedded interfaces ever "escape" the API boundary of the package.

To be thorough we either need to check that the zero value of the interface is never passed around or enforce that an exhaustive switch check case nil as well. (The latter is easier but the former is preferred since including nil turns a "type A or type B or type C" sum into a "nil or type A or type B or type C" sum).

Let's say we have a linter, with all those abilities, even the optional ones, that can verify these semantics for any tree of imports and any given interface within that tree.

Now let's say we have project with a dependency D. We want to make sure an interface defined in one of the packages of D is exhaustive in our project. Let's say it does.

Now, we need to add a new dependency to our project Dβ€². If Dβ€² imports the package in D that defined the interface type in question but does not use this linter, it can easily destroy the invariants that need to hold for us to use exhaustive switches.

For that matter, let's say D just passed the linter coincidentally, not because the maintainer runs it. An upgrade to D could just as easily destroy the invariants as Dβ€².

Even if the linter can say "right now this is 100% exhaustive πŸ‘" that can change without us doing anything.

An exhaustiveness checker for "iota enums" seems easier.

For each declaration type t u, where u is integral and t is used as a const with either individually specified values or iota, such that the zero value of u is included among these constants.

Notes:

  • Duplicate values can be treated as aliases and ignored in this analysis. We'll assume all named constants have distinct values.
  • 1 << iota may be treated as a powerset, I believe at least most of the time, but would probably require extra conditions, especially surrounding bitwise complement. For the time being, they will not be considered

For some shorthand, let's call min(t) the constant such that for any other constant, C, min(t) <= C, and, similarly, let's call max(t) the constant such that for any other constant, C, C <= max(t).

To ensure t is used exhaustively we need to ensure that

  • values of t are always the named constants (or 0 in certain idiomatic positions, like function invocation)
  • There are no inequality comparisons of a value of t, v, outside of min(t) <= v <= max(t)
  • values of t are never used in arithmetic operations +, /, etc. A possible exception could be when the result is clamped between min(t) and max(t) immediately afterward, but that could be hard to detect in general so it may require an annotation in comments and should probably be restricted to the package that defines t.
  • switches contain all constants of t or a default case.

This still requires verifying all the packages in the import tree and can be invalidated as easily, although it is less likely to be invalidated in idiomatic code.

omeid commented

My understanding is that this, similar to type aliases, won't be breaking changes, so why hold it up for Go 2?

Type aliases don't introduce a new keyword; introducing one is a definite breaking change. There also seems to be a moratorium on even minor language changes, and this would be a major change. Even just retrofitting all marshal/unmarshal routines to handle reflected sum values would be a huge ordeal.

dsnet commented

Type alias are fixing an issue for which there was no workaround for. Sum types provide a benefit in type safety, but it isn't a show stopper not having them.

Just one (minor) point in favour of something like @rogpeppe's original proposal. In package http, there's the interface type Handler and a function type that implements it, HandlerFunc. Right now, in order to pass a function to http.Handle, you explicitly have to convert it to a HandlerFunc. If http.Handle instead accepted an argument of type HandlerFunc | Handler, it could accept any function/closure assignable to HandlerFunc directly. The union effectively serves as a type hint telling the compiler how values with unnamed types can be converted to the interface type. Since HandlerFunc implements Handler, the union type would behave exactly like Handler otherwise.

@griesemer in response to your comment in the enum thread, #19814 (comment), I think my proposal earlier in this thread #19412 (comment) addresses the question of how sum types ("swift style enums") would have to work in Go. As much as I would like them, I don't know if they would be a necessary addition to Go, but I do think if they were added they'd have to look/operate a lot like that.

That post's not complete and there's clarifications throughout this thread, before and after, but I don't mind reiterating those points or summarizing since this thread is quite long.

If you have a sum type simulated by an interface with a type tag and absolutely cannot have it be circumvented by embedding, this is the best defense I've come up with: https://play.golang.org/p/FqdKfFojp-

@jimmyfrasche I wrote this a while back.

Another possible approach is this: https://play.golang.org/p/p2tFm984S8