proposal: spec: sum types based on general interfaces

Question

ianlancetaylor opened this issue 2 years ago · 160 comments

This is a speculative issue based on the way that type parameter constraints are implemented. This is a discussion of a possible future language change, not one that will be adopted in the near future. This is a version of #41716 updated for the final implementation of generics in Go.

We currently permit type parameter constraints to embed a union of types (see https://go.dev/ref/spec#Interface_types). We propose that we permit an ordinary interface type to embed a union of terms, where each term is itself a type. (This proposal does not permit the underlying type syntax ~T to be used in an ordinary interface type, though of course that syntax is still valid for a type parameter constraint.)

That's really the entire proposal.

Embedding a union in an interface affects the interface's type set. As always, a variable of interface type may store a value of any type that is in its type set, or, equivalently, a value of any type in its type set implements the interface type. Inversely, a variable of interface type may not store a value of any type that is not in its type set. Embedding a union means that the interface is something akin to a sum type that permits values of any type listed in the union.

For example:

type MyInt int
type MyOtherInt int
type MyFloat float64
type I1 interface {
    MyInt | MyFloat
}
type I2 interface {
    int | float64
}

The types MyInt and MyFloat implement I1. The type MyOtherInt does not implement I1. None of MyInt, MyFloat, or MyOtherInt implement I2.

In all other ways an interface type with an embedded union would act exactly like an interface type. There would be no support for using operators with values of the interface type, even though that is permitted for type parameters when using such a type as a type parameter constraint. This is because in a generic function we know that two values of some type parameter are the same type, and may therefore be used with a binary operator such as +. With two values of some interface type, all we know is that both types appear in the type set, but they need not be the same type, and so + may not be well defined. (One could imagine a further extension in which + is permitted but panics if the values are not the same type, but there is no obvious reason why that would be useful in practice.)

In particular, the zero value of an interface type with an embedded union would be nil, just as for any interface type. So this is a form of sum type in which there is always another possible option, namely nil. Sum types in most languages do not work this way, and this may be a reason to not add this functionality to Go.
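The nil case can already be observed with today's interfaces. The following is only a sketch: it uses the unexported-method idiom as a stand-in for a union interface (since interface{ int | string } cannot currently be used as an ordinary type), and the names Value, Int, Str, and kind are made up for illustration.

```go
package main

import "fmt"

// Value stands in for a union interface. It is closed to this package
// via an unexported method, which is today's closest equivalent.
type Value interface{ isValue() }

type Int int
type Str string

func (Int) isValue() {}
func (Str) isValue() {}

// kind reports which variant is stored. The nil case is always
// possible, because the zero value of any interface type is nil.
func kind(v Value) string {
	switch v.(type) {
	case Int:
		return "Int"
	case Str:
		return "Str"
	case nil:
		return "nil"
	}
	return "unknown"
}

func main() {
	var v Value // zero value, no dynamic type
	fmt.Println(v == nil, kind(v))
	v = Int(3)
	fmt.Println(v == nil, kind(v))
}
```

Any switch over such a value has to decide what to do with nil, which is the "extra option" described above.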

As an implementation note, we could in some cases use a different implementation for interfaces with an embedded union type. We could use a small code, typically a single byte, to indicate the type stored in the interface, with a zero indicating nil. We could store the values directly, rather than boxed. For example, I1 above could be stored as the equivalent of struct { code byte; value [8]byte } with the value field holding either an int or a float64 depending on the value of code. The advantage of this would be reducing memory allocations. It would only be possible when all the values stored do not include any pointers, or at least when all the pointers are in the same location relative to the start of the value. None of this would affect anything at the language level, though it might have some consequences for the reflect package.
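To make the layout idea concrete, here is a rough user-level sketch of a struct { code byte; value [8]byte } representation. It is not how the compiler or runtime would do it. The code constants, the little-endian encoding, and the helper names are all assumptions for illustration, and int64 is used in place of int to keep the width fixed.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// Type codes for the hypothetical unboxed layout; 0 is reserved for nil.
const (
	codeNil byte = iota
	codeInt
	codeFloat
)

// packed mimics struct { code byte; value [8]byte } from the text above.
type packed struct {
	code  byte
	value [8]byte
}

// packInt stores an integer directly in the value bytes, with no allocation.
func packInt(v int64) packed {
	p := packed{code: codeInt}
	binary.LittleEndian.PutUint64(p.value[:], uint64(v))
	return p
}

// packFloat stores the IEEE 754 bits of a float64 in the value bytes.
func packFloat(v float64) packed {
	p := packed{code: codeFloat}
	binary.LittleEndian.PutUint64(p.value[:], math.Float64bits(v))
	return p
}

// describe plays the role of a type switch driven by the stored code.
func describe(p packed) string {
	bits := binary.LittleEndian.Uint64(p.value[:])
	switch p.code {
	case codeInt:
		return fmt.Sprintf("int %d", int64(bits))
	case codeFloat:
		return fmt.Sprintf("float64 %g", math.Float64frombits(bits))
	default:
		return "nil"
	}
}

func main() {
	var zero packed // all-zero memory decodes as nil
	fmt.Println(describe(zero))
	fmt.Println(describe(packInt(-7)))
	fmt.Println(describe(packFloat(1.5)))
}
```

Neither variant here contains a pointer, which is the case the note above says this layout is limited to.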

As I said above, this is a speculative issue, opened here because it is an obvious extension of the generics implementation. In discussion here, please focus on the benefits and costs of this specific proposal. Discussion of sum types in general, or different proposals for sum types, should remain on #19412 or newer variants such as #54685. Thanks.

Answer 1 · 2023-01-06T00:04:30.000Z

This proposal does not permit the underlying type syntax ~T to be used in an ordinary interface type, though of course that syntax is still valid for a type parameter constraint.

Could you comment on why this restriction occurs? Is this simply to err on the side of caution initially and potentially remove this restriction in the future? Or is there a technical reason not to do this?

Answer 2 · 2023-01-06T00:19:10.000Z

The reason to not permit ~T is that the current language would provide no mechanism for extracting the type of such a value. Given interface { ~int }, if I store a value of type myInt in that interface, then code in some other package would be unable to use a type assertion or type switch to get the value out of the interface type. The best that it could do would be something like reflect.TypeOf(v).Kind(). That seems sufficiently awkward that it requires more thought and attention, beyond the ideas in this proposal.
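For readers who want to see the awkwardness, this is roughly what the reflect route looks like in current Go. It is a sketch only; myInt and intValue are hypothetical names, and any stands in for interface { ~int }.

```go
package main

import (
	"fmt"
	"reflect"
)

// myInt has underlying type int, so it would be in the type set of ~int.
type myInt int

// intValue extracts the integer from any value whose kind is int,
// regardless of the defined type's name. It goes through reflect
// because a type assertion needs the exact type.
func intValue(v any) (int64, bool) {
	rv := reflect.ValueOf(v)
	if rv.Kind() != reflect.Int {
		return 0, false
	}
	return rv.Int(), true
}

func main() {
	var v any = myInt(5)

	// A plain assertion to int fails: the dynamic type is myInt.
	_, ok := v.(int)
	fmt.Println("v.(int) ok:", ok)

	n, ok := intValue(v)
	fmt.Println("via reflect:", n, ok)
}
```

Code in another package that cannot name myInt has no way to write the assertion v.(myInt), so reflect is all that is left.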

Answer 3 · 2023-01-06T00:21:29.000Z

Is there a technical reason that the language could not also evolve to support ~T in a type switch? Granted that this is outside the scope of this proposal, but I think there is a valid use case for it.

Answer 4 · 2023-01-06T00:34:33.000Z

In a vacuum, I'd prefer pretty much any other option, but since it's what generics use, it's what we should go with here and we should embrace it fully. Specifically,

1. type I2 int | float64 should be legal
2. v, ok := i.(int | float64) follows from 1
3. in a type switch case int | float64: works like 2
4. string | fmt.Stringer should be legal even though that does not currently work with constraints

@dsnet I think comparable and ~T could be considered and discussed separately—if for no reason other than this thread will probably get quite long on its own. I'm 👍 on both.

Answer 5 · 2023-01-06T00:53:09.000Z

With the direct storage mechanism detailed in the post as an alternative to boxing, would it be possible for the zero-value not to be nil after all? For example, if the code value is essentially an index into the list of types and the value stores the value of that type directly, then the zero value with all-zeroed memory would actually default to a zero value of the first type in the list. For example, given

type Example interface {
  int16 | string
}

the zero value in memory would look like {code: 0, value: 0}.

Also, in that format, would the value side change sizes depending on the type? For example, would a value of Example(1) look like {code: 0, value: [...]byte{0, 1}} ignoring endianness, while a value of Example("example") would look like {code: 1, value: [...]byte{/* raw bytes of a string header */}}? If so, how would this affect embedding these interface types into other types, such as a []Example? Would the slice just assume the maximum possible necessary size for the given types? Edit: Never mind, dumb question. The size changing could be a minor optimization when copying, but of course anywhere it's stored would have to assume the maximum possible size, even just local variables, unless the compiler could prove that it's only ever used with a smaller type, I guess.

It would only be possible when all the values stored do not include any pointers, or at least when all the pointers are in the same location relative to the start of the value.

I don't understand this comment, which may indicate that I'm missing something fundamental about the explanation. Why would pointers make any difference? If the above Example type had int16 | string | *int, why would it not just be {code: 2, value: /* the pointer value itself, ignoring whatever it points to */}?

Answer 6 · 2023-01-06T01:16:16.000Z

The example in the proposal is rather contrived, so I tried to imagine some real situations I've encountered where this new capability could be useful to express something that was harder to express before.

Is the following also an example of something that this proposal would permit?

type Success[T any] struct {
    Value T
}

type Failure struct {
    Err error
}

type Result[T any] interface {
    Success[T] | Failure
}

func Example() Result[string] {
    return Success[string]{"hello"}
}

(NOTE WELL: I'm not meaning to imply that the above would be a good idea, but it's the example that came most readily to mind because I just happened to write something similar -- though somewhat more verbose -- to smuggle (result, error) tuples through a single generic type parameter yesterday. Outside of that limited situation I expect it would still be better to return (string, error).)

Another example I thought of is encoding/json's Token type, which is currently defined as type Token any and is therefore totally unconstrained.

Although I expect it would not be appropriate to change this retroactively for compatibility reasons, presumably a hypothetical green field version of that type could be defined like this instead:

type Token interface {
    Delim | bool | float64 | Number | string
    // (json.Token also allows nil, but since that isn't a type I assume
    // it wouldn't be named here and instead it would just be
    // a nil value of type Token.)
}

Given that the exact set of types here is finite, would we consider it to be a breaking change to add new types to this interface later? If not, that could presumably allow the following to compile by the compiler noticing that the case labels are exhaustive:

// TokenString is a rather useless function that's just here to illustrate an
// exhaustive type switch...
func TokenString(t Token) string {
    switch t := t.(type) {
        case Delim:
            return string(t)
        case bool:
            return strconv.FormatBool(t)
        case float64:
            return strconv.FormatFloat(t, 'g', -1, 64)
        case Number:
            return string(t)
        case string:
            return t
    }
}

I don't feel strongly either way about whether such sealed interfaces should have this special power, but it does seem like it needs to be decided either way before implementation because it would be hard to change that decision later without breaking some existing code.

Even if this doesn't include a special rule for exhaustiveness, this still feels better in that it describes the range of Decoder.Token() far better than any does.

EDIT: After posting this I realized that my type switch doesn't account for nil. That feels like it's a weird enough edge that it probably wouldn't be worth the special case of allowing exhaustive type-switch matching.

Finally, it seems like this would shrink the boilerplate required today to define what I might call a "sealed interface", by which I mean one which only accepts a fixed set of types defined in the same package as the interface.

One way I've used this in the past is to define struct types that act as unique identifiers for particular kinds of objects but then have some functions that can accept a variety of different identifier types for a particular situation:

type ResourceID struct {
    Type string
    Name string
}

type ModuleID struct {
    Name string
}

type Targetable interface {
    // Unexported method means that only types
    // in this package can implement this interface.
    targetable()
}

func (ResourceID) targetable() {}
func (ModuleID) targetable() {}

func Target(addr Targetable) {
    // ...
}

I think this proposal could reduce that to the following, if I've understood it correctly:

type ResourceID struct {
    Type string
    Name string
}

type ModuleID struct {
    Name string
}

type Targetable interface {
    ResourceID | ModuleID
}

func Target(addr Targetable) {
    // ...
}

If any of the examples I listed above don't actually fit what this proposal is proposing (aside from the question about exhaustive matching, which is just a question), please let me know!

If they do, then I must admit I'm not 100% convinced that the small reduction in boilerplate is worth this complexity, but I am leaning towards 👍 because I think the updated examples above would be easier to read for a future maintainer who is less experienced with Go and so would benefit from a direct statement of my intent rather than having to infer the intent based on familiarity with idiom or with less common language features.

Answer 7 · 2023-01-06T05:45:11.000Z

@dsnet Sure, we could permit case ~T in a type switch, but there are further issues. A type switch can have a short declaration, and in a type switch case with a single type we're then permitted to refer to that variable using the type in the case. What type would that be for case ~T? If it's T then we lost the methods, and fmt.Printf will behave unexpectedly if the original type had a String method. If it's ~T what can we do with a value of that type? It's quite possible that these questions can be answered, but it's not just outside the scope of this proposal, it's actually complicated.
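The "lost the methods" concern can be reproduced in current Go with an explicit conversion. This is just a sketch; Celsius and render are made-up names, and the conversion int(c) stands in for what a hypothetical case ~int binding to T would do.

```go
package main

import "fmt"

// Celsius has underlying type int and a String method.
type Celsius int

func (c Celsius) String() string { return fmt.Sprintf("%dC", int(c)) }

// render formats the same value twice: once with its defined type
// (String method intact) and once converted to the underlying type
// (method set dropped).
func render(c Celsius) (asDefined, asUnderlying string) {
	return fmt.Sprint(c), fmt.Sprint(int(c))
}

func main() {
	a, b := render(Celsius(21))
	fmt.Println(a) // formatted through the String method
	fmt.Println(b) // formatted as a plain int
}
```

If the variable in the case body had type int, the second behaviour is what fmt would produce, which is the surprise described above.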

Answer 8 · 2023-01-06T05:48:18.000Z

@DeedleFake The alternative implementation is only an implementation issue, not a language issue. We shouldn't use that to change something about the language, like whether the value can be nil or some other zero value. In Go the zero value of interface types is nil. It would be odd to change that for the special case of interfaces that embed a union type element.

The reason pointer values matter is that given a value of the interface type, the current garbage collector implementation has to be able to very very quickly know which fields in that value are pointers. The current implementation does this by associating a bitmask of pointers with each type, such that a 1 in the bitmask means that the pointer-sized slot at that offset in the value always holds a pointer.

Answer 9 · 2023-01-06T05:50:24.000Z

@apparentlymart I think that everything you wrote is correct according to this proposal. Thanks.

Answer 10 · 2023-01-06T06:25:14.000Z

In Go the zero value of interface types is nil. It would be odd to change that for the special case of interfaces that embed a union type element.

It would be, but I think it would be worth it. And I don't think it would be so strange as to completely preclude eliminating the extra oddness that would come from union types always being nilable. In fact, I'd go so far as to say that if this way of implementing unions has to have them be nilable, then a different way of implementing them should be found.

The reason pointer values matter is that given a value of the interface type, the current garbage collector implementation has to be able to very very quickly know which fields in that value are pointers.

I was worried it was going to be the garbage collector... Ah well.

Answer 11 · 2023-01-06T09:12:25.000Z

A major problem is that type constraints work on static types while interfaces work on the dynamic types of values. This immediately rules out this approach to union types.

type Addable interface {
    int | float32
}

func Add[T Addable](x, y T) T {
    return x + y
}

This works because the static type of T can only be int or float32, which means the addition operation is defined for every type in the type set of T. However, if we allow Addable to be a sum type, then the type set of T becomes {int, float32, Addable}, which does not satisfy that property!!!

Answer 12 · 2023-01-06T17:01:46.000Z

@merykitty per my understanding of the proposal, I think for the dynamic form of what you wrote you'd be expected to write something like this:

type Addable interface {
    int | float32
}

func Add(x, y Addable) Addable {
    switch x := x.(type) {
    case int:
        return x + y.(int)
    case float32:
        return x + y.(float32)
    default:
        panic(fmt.Sprintf("unsupported Addable types %T + %T", x, y))
    }
}

Of course this would panic if used incorrectly, but I think that's a typical assumption for interface values since they inherently move the final type checking to runtime.

I would agree that the above seems pretty unfortunate, but I would also say that this feels like a better use-case for type parameters than for interface values and so the generic form you wrote is the better technique for this (admittedly contrived) goal.

Answer 13 · 2023-01-06T17:05:37.000Z

@merykitty No, in your example, Addable itself should not be able to instantiate Add. Addable does not implement itself (only int and float32 do).

Answer 14 · 2023-01-06T17:07:00.000Z

Also, note that the type set never includes interfaces. So Addable is never in its own type set.
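As a sanity check against Go as it exists today, here is a minimal sketch of the generic form being instantiated only with concrete types. The main function and its values are illustrative additions, not from the thread.

```go
package main

import "fmt"

// Addable is usable today only as a type parameter constraint.
type Addable interface {
	int | float32
}

// Add is well defined because both operands share the single type T.
func Add[T Addable](x, y T) T {
	return x + y
}

func main() {
	fmt.Println(Add(1, 2))            // T inferred as int
	fmt.Println(Add[float32](1.5, 2)) // T given explicitly

	// Add[Addable](...) is rejected by the current compiler, since
	// Addable cannot be used outside a constraint. Under the proposal
	// it would still not instantiate Add, because Addable is not in
	// its own type set.
}
```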

Answer 15 · 2023-01-06T17:44:21.000Z

Is something like that going to be allowed?

type IntOrStr interface {
	int | string
}

func DoSth[T IntOrStr](x T) {
	var a IntOrStr = x
	_ = a
}

Answer 16 · 2023-01-06T17:57:30.000Z

Let's say I have these definitions.

type I1 interface {
	int | any
}

type I2 interface {
	string | any
}

type I interface {
	I1 | I2
}

Would it be legal to have a variable of type I? Can I assign an I1 to it? What about string? any(int8)? int8?

Answer 17 · 2023-01-06T18:09:45.000Z

@mateusz834 Can't see why not.

@zephyrtronium

Would it be legal to have a variable of type I? Can I assign an I1 to it? What about string? any(int8)? int8?

I think the answer to all of these is "yes". For the cases where you assign an interface value, the dynamic type/value of the I variable would then become the dynamic type/value of the assigned interface. In particular, the dynamic type would never be an interface.

Answer 18 · 2023-01-06T19:05:24.000Z

FWIW my main issue with this proposal is that IMO union types should allow representing something like ~string | fmt.Stringer, but for well-known reasons this isn't possible right now and it's not clear it ever would be. One advantage of "real" sum types is that they have an easier time representing that kind of thing. Specifically, I don't think #54685 has that problem (though it's been a spell since I looked at that proposal in detail).

Answer 19 · 2023-01-07T04:36:34.000Z

I think this approach is elegant given that type sets on constraints already exist, and so for any union discriminated only by types this seems almost perfect.

I think there are three shortcomings of the proposal that would prevent it from being usable in many of the cases where I currently construct union-like structures.

1. prevailing nil

In particular, the zero value of an interface type with an embedded union would be nil, just as for any interface type. So this is a form of sum type in which there is always another possible option, namely nil. Sum types in most languages do not work this way, and this may be a reason to not add this functionality to Go.

This is mentioned in the proposal, and I think this constraint simplification is problematic. In Go default values are useful, but by making sum types nillable, we make their default value not useful. Of course maybe this is reasonable given that Go has no widely used "optional" value type beyond pointers.

To address this shortcoming, could we make interface types that contain type sets non-nullable by default, and require an explicit nil | in the type set list? For type sets that do not specify nil, the default value of the interface value would be the zero value of the first type listed.

2. no support for non-type discriminants

The proposal defines a discriminated union where the discriminant is always the types of each case in the union. This prevents applications from creating unions where the same type appears across multiple cases but with different semantics. This happens a lot in code where I write union-like types today, and I don't think I could use this proposal for most of my union cases without it.

Here's an example of a union-like structure from some code I have.

type ClaimPredicateType int32

const (
	ClaimPredicateTypeClaimPredicateUnconditional      ClaimPredicateType = 0
	ClaimPredicateTypeClaimPredicateAnd                ClaimPredicateType = 1
	ClaimPredicateTypeClaimPredicateOr                 ClaimPredicateType = 2
	ClaimPredicateTypeClaimPredicateNot                ClaimPredicateType = 3
	ClaimPredicateTypeClaimPredicateBeforeAbsoluteTime ClaimPredicateType = 4
	ClaimPredicateTypeClaimPredicateBeforeRelativeTime ClaimPredicateType = 5
)

type ClaimPredicate struct {
	Type          ClaimPredicateType
	AndPredicates *[]ClaimPredicate `xdrmaxsize:"2"`
	OrPredicates  *[]ClaimPredicate `xdrmaxsize:"2"`
	NotPredicate  **ClaimPredicate
	AbsBefore     *Int64
	RelBefore     *Int64
}

Ref: https://github.com/stellar/go/blob/b4ba6f8e6/xdr/xdr_generated.go#L5815-L5822

The proposal would allow only for writing the following case, which would fail to represent the complete union type:

type ClaimPredicate interface {
   []ClaimPredicate | ClaimPredicate | Int64
}

I have the same type in a few languages, and here's the same type in Rust:

pub enum ClaimPredicate {
    Unconditional,
    And(VecM<ClaimPredicate, 2>),
    Or(VecM<ClaimPredicate, 2>),
    Not(Option<Box<ClaimPredicate>>),
    BeforeAbsoluteTime(i64),
    BeforeRelativeTime(i64),
}

Ref: https://github.com/stellar/rs-stellar-xdr/blob/154e07ebb/src/curr/generated.rs#L6672-L6679

To address this shortcoming could the type set be a type list where each type in the list is also given a field name? This doesn't feel good, but it's the only way I see to address this inside the proposal in its current form. It's not clear to me how this would work in a switch statement as well. For example:

type ClaimPredicate interface {
   and                []ClaimPredicate |
   or                 []ClaimPredicate |
   not                ClaimPredicate |
   beforeAbsoluteTime Int64 |
   beforeRelativeTime Int64
}

3. no support for a void / no-type case

Sometimes discriminated unions have cases where no data is required. I don't think the proposal supports this. The example in point 2 above has one case like that, the Unconditional case. If such a thing was supported, it could be like:

type ClaimPredicate interface {
   unconditional      void |
   and                []ClaimPredicate |
   or                 []ClaimPredicate |
   not                ClaimPredicate |
   beforeAbsoluteTime Int64 |
   beforeRelativeTime Int64
}

Answer 20 · 2023-01-07T04:43:37.000Z

@ianlancetaylor Does the proposal as-is allow both type sets and functions in an interface? It would have a remarkable property not typically present in sum types where you could have a closed set of types along with the ability to have those types implement some common functions and be used as an interface.

Answer 21 · 2023-01-07T05:03:23.000Z

@leighmcculloch

To address this shortcoming, could we make interface types that contain type sets non-nullable by default, and require an explicit nil | in the type set list? For type sets that do not specify nil, the default value of the interface value would be the zero value of the first type listed.

For reference, this has been suggested a few times in #19412 and #41716, starting with #19412 (comment). Requiring nil variants versus allowing source code order to affect semantics is the classic tension of sum types proposals.

Sometimes discriminated unions have cases where no data is required. I don't think the proposal supports this.

The spelling of a type with no information beyond existence is usually struct{}, or more generally any type with exactly one value. void, i.e. the zero type, means something different: logically it would represent that your unconditional variant is impossible, not that it carries no additional information.
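A tiny sketch of the struct{} spelling in current Go. The type name Unconditional is borrowed from the example under discussion; the helper sizeOfUnit is made up for illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Unconditional carries no data: its only information is that it exists.
type Unconditional struct{}

// sizeOfUnit reports the storage needed for a value of the unit type.
func sizeOfUnit() uintptr {
	return unsafe.Sizeof(Unconditional{})
}

func main() {
	a, b := Unconditional{}, Unconditional{}
	// The type has exactly one value, so any two values compare equal.
	fmt.Println(sizeOfUnit(), a == b)
}
```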

Does the proposal as-is allow both type sets and functions in an interface? It would have a remarkable property not typically present in sum types where you could have a closed set of types along with the ability to have those types implement some common functions and be used as an interface.

Yes, since the proposal is just to allow values of general interfaces less ~T elements, methods would be fine and would dynamically dispatch to the concrete type. I agree that's a neat behavior. Unfortunately it does imply that methods can't be defined on a sum type itself; you'd have to wrap it in a struct or some other type.

Answer 22 · 2023-01-07T06:43:16.000Z

Thanks @zephyrtronium. Taking your feedback into account, and also realizing that it is easy to redefine types, then I think points (2) and (3) I raised are not issues. Type definitions can be used to give the same type different semantics for each case. For example:

type ClaimPredicateUnconditional struct{}
type ClaimPredicateAnd []ClaimPredicate
type ClaimPredicateOr []ClaimPredicate
type ClaimPredicateNot ClaimPredicate
type ClaimPredicateBeforeAbsoluteTime Int64
type ClaimPredicateBeforeRelativeTime Int64

type ClaimPredicate interface {
    ClaimPredicateUnconditional |
    ClaimPredicateAnd |
    ClaimPredicateOr |
    ClaimPredicateNot |
    ClaimPredicateBeforeAbsoluteTime |
    ClaimPredicateBeforeRelativeTime
}

In the main Go code base I work in we have 106 unions implemented as multi-field structs, which require a decent amount of care to use. I think this proposal would make using those unions easier to understand, probably on par in terms of effort to write. If tools like gopls went on to support features like pre-filling out the case statements of a switch based on the type sets, since it can know the full set, that would make writing code using them easier too.

The costs of this proposal feel minimal. Any code using the sum type would experience the type as an interface and have nothing new to learn over that of interfaces. This is I think the biggest benefit of this proposal.

Answer 23 · 2023-01-07T09:36:01.000Z

To me, nil seems to be the big question here?

On the one hand, interface types are nilable and their zero value is nil.

On the other hand, union interface constraints made only of non-nilable types prevent a T from being nil, and that behaviour seems useful here as well. Is it that big a can of worms to say these can't be nil?

Exhaustiveness in type switches could potentially be left to tools.

Answer 24 · 2023-01-07T10:12:59.000Z

@ncruces

Is it that big a can of worms to say these can't be nil?

And instead, they are what? The reason to use nil is that it's precedented for "there is no dynamic type to this interface". If you don't want to use nil, you'd at least have to say what the dynamic type of a union is.

More generally, there are essentially four choices around the zero value of a union type:

1. There is none; values of union type must be explicitly initialized. The downside is that the language as a whole very much assumes that every type has a zero value (e.g. make([]T, …), map-indexing of non-existing keys, receiving from a closed channel…) and, to a lesser degree, that it's represented by all zero bits.
2. The zero value is specified in the type definition. One downside is that we can't re-use the existing syntax; it needs at least to be amended by a way to specify the zero value.
3. The zero value is derived from the definition, most obviously "the zero value of the first case". The downside is that now the order of union terms matters, which is counter-intuitive and might not play well with existing assumptions.
4. The zero value is nil. The downside is that any union value has an additional case.
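To show how deeply the "every type has a zero value" assumption runs, here is a small sketch in current Go that produces a value of type T four different ways without ever initializing one. The function name zeros is made up, and fmt.Stringer is used only as a convenient interface type.

```go
package main

import "fmt"

// zeros collects values of T that the language hands out implicitly:
// a declared variable, a fresh slice element, a missing map key, and
// a receive from a closed channel.
func zeros[T any]() []T {
	var declared T
	s := make([]T, 1)
	m := map[string]T{}
	ch := make(chan T)
	close(ch)
	return []T{declared, s[0], m["missing"], <-ch}
}

func main() {
	// For an interface type, every one of these is nil.
	for _, v := range zeros[fmt.Stringer]() {
		fmt.Println(v == nil)
	}
}
```

A union type without a zero value would need an answer for each of these cases, which is the downside listed for the first choice.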

This proposal makes the last choice and it seems to me, that's a pretty foundational choice to any union/sum type proposal. So, from the proposal text:

In discussion here, please focus on the benefits and costs of this specific proposal. Discussion of sum types in general, or different proposals for sum types, should remain on #19412 or newer variants such as #54685. Thanks.

So we should, in this discussion, assume that the choice of zero value is fixed as nil and not try to come up with alternative designs. If we dislike a separate nil zero value, then that's simply a reason to reject this proposal:

In particular, the zero value of an interface type with an embedded union would be nil, just as for any interface type. So this is a form of sum type in which there is always another possible option, namely nil. Sum types in most languages do not work this way, and this may be a reason to not add this functionality to Go.

Answer 25 · 2023-01-07T11:20:47.000Z

OK. If we are to leave it at that, then yes, IMO, this is a reason to reject the proposal.

I still think it might be worth discussing here why that's the best choice. Detractors might be persuaded that this is in fact the best choice.

PS: I suppose I find your 3rd option best, and not counter intuitive; but it's a different proposal, and I won't discuss it here if it's considered off topic.

Answer 26 · 2023-01-07T12:58:17.000Z

In principle, I'm in favor of this proposal. Especially since it seems nicely orthogonal.

Just a few concerns that might require further thoughts as was mentioned:

~T: a quick idea off the top of my head would be for this to be a special anonymous interface analogous to basic interfaces where, instead of a method set, an underlying type T is specified. In that case, to retrieve the initial type, one would have to switch over it as usual. It might also allow conversion to T.
I think nil as the zero value is fine, provided a union interface value where the runtime.type pointer is nil cannot be assigned. It has some complexities wrt slices (make?), the recently added clear, and perhaps a few other things (or not, I don't know). But if it works, that will be quite a nice fit. :)

Especially since the implementation of an interface is inscrutable, even when aliased, an interface value's internal representation cannot be changed back to nil(?), so for now I'm optimistic.

Why I would appreciate this feature?

(just an example)
A library I wrote needed to limit a function parameter to any of the Go types that can be sent over to JavaScript world via wasm (bool, string, float64, []any, map[string]any).

It's manageable without union types but the API is not as nice as it could be as it requires plenty conversions.
(had to use the trick of defining a Value interface with an unexported method, to be implemented by type Bool bool, type Float float64, etc...)

Also related to marshalling.

Answer 27 · 2023-01-07T13:43:56.000Z

@ncruces

I still think it might be worth discussing here why that's the best choice.

I don't think that's the claim. It's just what's proposed here. This is not the only proposal of its kind.

Answer 28 · 2023-01-07T13:51:14.000Z

provided a union interface value where the runtime.type pointer is nil cannot be assigned.

That seems impossible, or close to.

type I interface { int | string }
func F(p *I) {
    var v I
    if someProgramHalts() {
        *p = v
    }
}

So, the only way to do that, AFAICT, would be to make such interfaces not assignable to their own type. Which means, they can't be passed around as arguments either. Or used in many other places.

I don't believe this is workable at all.

Answer 29 · 2023-01-07T13:55:52.000Z

Well that's the point.
If a variable of this type is declared but not assigned a proper value, it shouldn't be usable.

The one issue would be channels for example, one would have to find a palatable way to deal with channel closures.

One way could be to make nilability opt-in explicitly in those cases:

chan(int | string) // disallowed
chan(int | string | nil) // allowed and actually a supertype of the above, nil being sent on close

That would keep people from assigning the nilable supertype to the actual union type.
They would not exactly be the same type and a regular type assertion check would have to happen.

Anyway, this is just an idea that someone can work with and ponder, I'll leave it at that.

Answer 30 · 2023-01-07T17:50:08.000Z

I hadn't thought of it before writing #57644 (comment), but I think being unable to define methods on these sum types is a major downside. If I define a type like type Parameter interface { int | string } then I can't make a String or UnmarshalText method for it. (Most encoders can marshal it without issue since it would be like having an int or string in an any, but because there's no reflection interface in the proposal, there's no way to automatically unmarshal.) I could define it as this instead:

type Parameter interface {
	IntParam | StringParam
	String() string
	UnmarshalText([]byte) error
}

Then I need to define a separate type for each variant as well as the methods those types need, and I have to use IntParam and StringParam instead of just int and string. It seems like this is the code I would write already, except that I write a union instead of an unexported method. The only thing we've gained is that the implementing types show up in godoc.

Instead, I could write this:

type Parameter struct {
	F interface{ int | string }
}

func (p Parameter) String() string

func (p *Parameter) UnmarshalText(text []byte) error

Now we gain some of the advantages of typical sum types, but first the author has to know that this is a good approach, and then we have to use the struct box instead of the interface value, UnmarshalText needs two layers of indirection (unless we choose a less conservative algorithm for unboxed sum type representations), and we lose some of the nice properties of interfaces like implicit conversions for assignments. Maybe those penalties aren't that bad overall, but I can imagine situations where they would push me toward a different design. And again, this looks very similar to code I might write today.

All of this would be a non-issue if we could define methods on interfaces. That was already rejected in #39799.
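The struct-box approach described above can be approximated in today's Go. This is only a sketch: the field is declared as `any` because `interface{ int | string }` is not currently accepted as a field type, and the parsing rule in `UnmarshalText` (try an integer first, otherwise keep the text as a string) is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// Parameter boxes a value that is meant to be an int or a string.
// The field is `any` because today's Go rejects interface{ int | string }
// here; under the proposal the compiler could enforce the set.
type Parameter struct {
	F any
}

// String formats whichever variant is stored.
func (p Parameter) String() string {
	switch v := p.F.(type) {
	case int:
		return strconv.Itoa(v)
	case string:
		return v
	default:
		return "<unset>"
	}
}

// UnmarshalText stores an int when the text parses as one, else a string.
// This rule is an assumption made for the example.
func (p *Parameter) UnmarshalText(text []byte) error {
	if n, err := strconv.Atoi(string(text)); err == nil {
		p.F = n
		return nil
	}
	p.F = string(text)
	return nil
}

func main() {
	var p Parameter
	_ = p.UnmarshalText([]byte("42"))
	fmt.Printf("%T %s\n", p.F, p)
	_ = p.UnmarshalText([]byte("hello"))
	fmt.Printf("%T %s\n", p.F, p)
}
```

The methods live on the struct, not on the interface, which is the extra layer of boxing the comment refers to.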

Answer 31 · 2023-01-07T17:57:40.000Z

To me, nil seems to be the big question here?

I think the interface value being nillable is fine. I think if we wanted a way to make interface values not nil by default, there could be a separate proposal for that, and it would interact well with this. We don't need to solve that problem as part of this proposal.

Exhaustiveness in type switches could potentially be left to tools.

I think whether switch is exhaustive is an entirely independent proposal to sum types. It can be proposed independent of any type changes, and it doesn't need to be attached to this proposal.

Answer 32 · 2023-01-07T18:20:32.000Z

It would be nice to replace something like this

func (*VM) PushInt(i int)
func (*VM) PushString(s string)
func (*VM) PushEtc(etc *Etc)

with something like this

func (*VM) Push(v int | string | *Etc)

Answer 33 · 2023-01-07T20:55:36.000Z

func (*VM) Push(v int | string | *Etc)

I assume an anonymous interface would work:

func (*VM) Push(v interface { int | string | *Etc})

Answer 34 · 2023-01-07T20:59:16.000Z

A | B | C is shorthand for interface { A | B | C } in generics code, and I'm an advocate for the same rule applying here.
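For reference, the existing shorthand in type parameter lists looks like this. The function names are hypothetical; both declarations use the same constraint.

```go
package main

import "fmt"

// DoubleShort writes the union directly in the type parameter list.
func DoubleShort[T int | float64](v T) T { return v + v }

// DoubleLong spells the same constraint as an interface literal.
func DoubleLong[T interface{ int | float64 }](v T) T { return v + v }

func main() {
	fmt.Println(DoubleShort(21), DoubleLong(1.5))
}
```

Whether the same abbreviation should apply to ordinary parameter types is exactly what is being debated here; the sketch only shows the constraint form that is valid today.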

Answer 35 · 2023-01-07T23:39:19.000Z

A | B | C is shorthand for interface { A | B | C } in generics code, and I'm an advocate for the same rule applying here.

That sounds good, but it leads to this oddity:

func F1(int) { ... }
func F2(string) { ... }
func F3(int | string) { ... }

F1(nil) // Error.
F2(nil) // Error.
F3(nil) // Valid.

Answer 36 · 2023-01-08T00:07:56.000Z

A benefit of this proposal is that it simplifies code that emulates sum types with interfaces today, especially for consumers of those types.

Without this proposal:

type RGB struct {
	R byte
	G byte
	B byte
}

func (RGB) isSumType() {}

type CMYK struct {
	C byte
	M byte
	Y byte
	K byte
}

func (CMYK) isSumType() {}

type Color interface{ isSumType() }

func PrintColor(c Color) {
	switch v := c.(type) {
	case nil:
		fmt.Println("nil")
	case RGB:
		fmt.Println(v.R, v.G, v.B)
	case *RGB:
		if v == nil {
			fmt.Println("RGB(nil)")
		} else {
			fmt.Println(v.R, v.G, v.B)
		}
	case CMYK:
		fmt.Println(v.C, v.M, v.Y, v.K)
	case *CMYK:
		if v == nil {
			fmt.Println("CMYK(nil)")
		} else {
			fmt.Println(v.C, v.M, v.Y, v.K)
		}
	}
}

With this proposal there are fewer surprising cases, such as having to include both the type and its pointer type in the switch above.

type RGB struct {
	R byte
	G byte
	B byte
}

type CMYK struct {
	C byte
	M byte
	Y byte
	K byte
}

func PrintColor(c interface { RGB | CMYK }) {
	switch v := c.(type) {
	case nil:
		fmt.Println("nil")
	case RGB:
		fmt.Println(v.R, v.G, v.B)
	case CMYK:
		fmt.Println(v.C, v.M, v.Y, v.K)
	}
}

Answer 37 · 2023-01-08T01:15:03.000Z

@leighmcculloch The point of methods and interfaces is to avoid all this type switchery within a function. In your example the types RGB and CMYK could have a Print method, which can be part of an interface if needed. The concept of an "interface" is that types have something in common. This proposal counteracts this concept, because it lets types that have nothing in common fit through the same hole; that's why I don't like it very much. I want fewer type switches in code, not more.

Answer 38 · 2023-01-08T02:16:39.000Z

Putting sum types under the umbrella of interfaces is almost comedic. You almost always have to treat them with a type switch. If recipients need to inspect the things they receive and have to treat them differently, then it's not an interface, it's the opposite, it's a nuisance.

Type union elements when used as a constraint for type parameters, on the other hand, are ok to be called "interfaces", because we want to access their common operators (like +, * etc.), which is in the spirit of the "interface" concept (to treat things uniformly). And here we're not encouraged to type switch on them, because it's not supported (unless we convert it to 'any' first).
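The distinction drawn above can be seen in current Go. In this sketch (the names `Number`, `Sum`, and `Kind` are hypothetical), the shared operator is available directly, while inspecting the concrete type requires a conversion to `any` first.

```go
package main

import "fmt"

// Number is a constraint, so values of T may use the operators
// shared by every type in the union.
type Number interface {
	int | float64
}

// Sum relies on + being defined for all members of Number.
func Sum[T Number](a, b T) T { return a + b }

// Kind converts to any before switching on the dynamic type;
// a type switch directly on a type parameter value is not allowed.
func Kind[T Number](v T) string {
	switch any(v).(type) {
	case int:
		return "int"
	case float64:
		return "float64"
	}
	return "unknown"
}

func main() {
	fmt.Println(Sum(1, 2), Kind(1.5))
}
```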

Answer 39 · 2023-01-08T02:48:19.000Z

@gophun I don't see why you cannot treat them as normal interfaces with methods? The only difference between a normal interface and a union interface is that the former is open to inheritance while the dynamic type set of the latter is closed.

Answer 40 · 2023-01-09T07:17:38.000Z

FWIW, I find much of it hard to understand, and Go is supposed to be simple. I agree with @gophun that it would be a pity if a new language feature made ubiquitous type switches necessary – I consider them “last resort” and not a good general pattern.

But I must also admit that I still have not understood the actual use cases. I’ve never had large Go code bases to maintain. Maybe explaining the Target function in #57644 (comment) would make it clearer to me.

My own approach has been: If I need type flexibility, use interfaces if methods are shared and embedding if attributes are shared. And in the remaining edge cases, I use any as the argument type and, well, use a type switch. Of course some errors are then caught in the tests rather than the compilation step, but is this disadvantageous enough that we should make interfaces even more complicated?

Answer 41 · 2023-01-09T08:40:31.000Z

I think it's a feature that will be more useful for library writers who need to expose a given interface.

So basically, it should allow for improvements in coder UX.

I expect that the consumer of a library will often be oblivious to the internal type switching.

Another example of such usefulness is when defining a tree datastructure where nodes can be a handful of very specific types.

Answer 42 · 2023-01-09T08:48:45.000Z

Yes but I still want to understand the motivation better. (Besides, I also write or will write libraries.)

Let me ask something very specific: If you use a sum type instead of any in a function signature, you obviously gain the compile-time check that the given type is one of the types included in the union. I also read in this issue that with sum types, there is the possibility to enhance the Go compiler or linters to see whether your type switches are complete. What are further advantages of sum types over any?

Answer 43 · 2023-01-09T09:00:18.000Z

@bronger Some advantages off the top of my head:
1. Sum types are also interfaces and can have common behaviours expressed through methods
2. The type sets of sum types are known, allowing better layout and improved performance
3. Similar to normal interfaces, they express intent in the function signature, instead of relying on readers to study the implementation details and documentation

Answer 44 · 2023-01-09T09:06:57.000Z

1. Sum types are also interfaces and can have common behaviours expressed through methods

You mean that, in addition to type unions, an interface defines some common methods?

Answer 45 · 2023-01-09T09:11:31.000Z

@bronger WDYM? From the proposal

In all other ways an interface type with an embedded union would act exactly like an interface type.

Which means we can define a union as

type Vehicle interface {
    Car | Bicycle
    Go()
}

Answer 46 · 2023-01-09T09:17:24.000Z

But then, your point (1) is not an advantage because this is also possible with any.

Answer 47 · 2023-01-09T09:29:04.000Z

type Vehicle interface {
    Car | Bicycle
    Go()
}

This interface is unnecessarily specific. It should just be

type Vehicle interface {
    Go()
}

The other version unnecessarily limits the types an outsider can use, and you would need a type switch to discriminate between the two allowed types. The point of interfaces is to treat different things uniformly and to give an outsider the possibility to provide their own types that implement the interface. In this example the outsider can no longer add a Boat to implement the Vehicle interface.

Answer 48 · 2023-01-09T09:50:38.000Z

@gophun That's just an example to show that a sum type also has methods.

The point of interfaces is to treat different things uniformly

This does not in any way say how they are treated to show the uniformity to the outside world. A dynamic dispatch is as valid as an explicit type switch.

and to give an outsider the possibility to provide their own types that implement the interface

No you just made this up, there are interfaces out there that intentionally declare private methods so that other packages cannot implement them. An interface just declares a contract, and a contract can involve no unexpected implementation.

Answer 49 · 2023-01-09T09:53:53.000Z

The most commonly mentioned use-case for sum/union types are AST packages. For example, go/ast.Node is currently an interface, with a bunch of methods. But that definition is obviously wrong. For example, the type struct { ast.Node } also satisfies the interface and has all the necessary methods, but it's certainly not intended to be usable, from the point of view of the ast package.

This becomes a problem when you then pass this into ast.Walk, for example. Walk is implemented as a big type switch over all dynamic types which are expected to be possible Nodes. But the set of possible Nodes is infinite, so the (static) type of ast.Walk is really incorrect - there has to be additional type-checking at runtime.

There are several different ways to fix this:

1. Sum/union types. This just allows the ast package to enumerate all the valid Node types and be done with it. Nothing else needs to happen.
2. Add more methods to the interface. For example, the interface could also include a Walk(f func(Visitor, Node)) (or something) method. But then, what about go/format? Or go/types? Or third party Go tools? These also commonly accept an ast.Node (or similar) to do their thing and they are not at liberty to specify that a Node has to have additional methods. So they still have to do the type-switch and runtime type checking.
3. Make Node a struct, instead of an interface. It could look like type Node struct { comment *Comment; commentGroup *CommentGroup; /* … */ } with a field per possible case. As long as the ast package only creates Nodes with exactly one of these fields set, it can treat it as a sum. And a struct type can't be "subtyped". However, this has performance problems (there are a lot of possible Nodes, so you need to carry around a lot of pointers, almost all of which are nil). It's also, technically, still a bit prone to programmer error, as the ast package itself could contain a bug, creating a Node with more than one set field.

So union/sum types are not the only way to solve this. And it might still be questioned if this problem needs solving and if the cost is justified to solve it. But they are a relatively common solution to this kind of problem, in other languages.

Answer 50 · 2023-01-09T09:54:38.000Z

So,

type Vehicle interface {
    Car | Bicycle
    Go()
}

is for the use case that e.g. a function wants to call methods, and additionally do type-switching stuff with the argument?

Answer 51 · 2023-01-09T09:59:02.000Z

@merykitty No, in your example, Addable itself should not be able to instantiate Add. Addable does not implement itself (only int and float32 do).

@Merovius So what you mean is that an Addable does not satisfy itself during generics parameter resolution but it will do during runtime assignments? That seems a little confusing to me.

Answer 52 · 2023-01-09T11:19:56.000Z

@merykitty Maybe. It's already a situation we will be in with Go 1.20 and comparable. So there is precedent for these two things to be different. I also don't think it's an entirely natural idea for these to mean different things in different contexts - that is, an Addable variable is "an opaque box that can hold any of these types" while an Addable constraint is "the type argument must be any of these types".

So, yes, I think there is a certain amount of possible confusion here. But I'm not sure how confusing it'll be, how often it will be a problem and I'm not sure it's avoidable. Surprisingly, there are things which are confusing if you think about them, but if you don't, you just never notice. For example, I doubt most Go programmers could really explain why they can't use a bytes.Buffer as an io.Reader, even though they can call r.Read on it - but in practice, they manage to use it just fine.

Answer 53 · 2023-01-09T11:21:15.000Z

@merykitty
you can see it as Addable not satisfying itself in both cases (the constraint being that something should be either int or float32)

It should implement itself, however (the Addable type implements the same constraint, i.e. it enforces the same contract as itself).
Note that interface{int} also implements Addable as it merely enforces the contract more strictly: not only arguments are int or string, but we know for a fact that they have to be int.

Modulo nil, which I'm optimistic (or at least hopeful) can be solved.

This is similar to a ReaderWriter interface implementing the Reader interface. (subtyping)

Answer 54 · 2023-01-09T11:23:19.000Z

@merykitty FWIW as an analogy: It also doesn't seem like many people are confused that an io.Reader variable can't contain an io.Reader - that is, the dynamic type of an interface is never an interface itself. It's essentially the same situation, it's the same confusion, yet in actual practice no one really wonders why that is.

Answer 55 · 2023-01-10T01:47:22.000Z

If we had:

type Vehicle interface {
    Car | Bicycle
}

type Mover interface {
    Move()
}

would we say that Vehicle satisfies Mover if each element of Vehicle (Car.Move, and Bicycle.Move) do as well?

Answer 56 · 2023-01-10T02:03:04.000Z

@AndrewHarrisSPU As I interpret the proposal, you'd need to have a Move() function on Vehicle so that Vehicle types implement Mover.

type Vehicle interface {
    Car | Bicycle
    Move()
}

type Mover interface {
    Move()
}

Or you could do:

type Vehicle interface {
    Car | Bicycle
    Mover
}

type Mover interface {
    Move()
}

Answer 57 · 2023-01-10T03:14:20.000Z

@AndrewHarrisSPU:

If we had:
type Vehicle interface {
    Car | Bicycle
}

type Mover interface {
    Move()
}
would we say that Vehicle satisfies Mover if each element of Vehicle (Car.Move, and Bicycle.Move) do as well?

Worth noting that that does not work with constraints currently, either: https://go.dev/play/p/TLkZkYzOcdO

I think it makes sense for it not to work. It would be quite confusing and annoying to have to go searching through every type listed and figuring out the intersection of their available methods to see what you could do with it. Instead, just rely on the principle of defining interfaces where they're used and add the expected methods to the interface manually, which should then work.
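The current behaviour for constraints can be sketched as follows. The `Car` and `Bicycle` types and their `Move` methods are hypothetical; the commented-out function is the form that the compiler rejects today.

```go
package main

import "fmt"

type Car struct{}

func (Car) Move() string { return "car drives" }

type Bicycle struct{}

func (Bicycle) Move() string { return "bicycle rolls" }

// Vehicle lists only the union; Move is not available through it,
// even though both members happen to define it.
type Vehicle interface {
	Car | Bicycle
}

// MovingVehicle names the method explicitly, so generic code may call it.
type MovingVehicle interface {
	Car | Bicycle
	Move() string
}

// func Bad[V Vehicle](v V) string { return v.Move() } // rejected: v.Move undefined

// Drive compiles because Move is part of the constraint itself.
func Drive[V MovingVehicle](v V) string { return v.Move() }

func main() {
	fmt.Println(Drive(Car{}), Drive(Bicycle{}))
}
```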

Answer 58 · 2023-01-10T04:22:46.000Z

@DeedleFake

Worth noting that that does not work with constraints currently, either: https://go.dev/play/p/TLkZkYzOcdO

I think it makes sense for it not to work. It would be quite confusing and annoying to have to go searching through every type listed and figuring out the intersection of their available methods to see what you could do with it.

If we can define a truly disjoint, finite, non-nil-able (or at least nil is an explicit element) type set, we can't include interfaces, but do we need interfaces to reason about the behaviors that are defined on that type set? I'm thinking (maybe naively?) that a compiler can tractably compute various sets-of-method-sets from the type set here. In practice I think a compiler could emit some precise and useful information ("error: Vehicle union doesn't implement Brake(): jetpack doesn't implement Brake()").

Going off-track a bit, I think there are also cases where defining a method on an element in a union itself could be interesting - I could call SetAlpha() on the union of rgb and rgba and still maintain a valid union, but not on an rgb value in isolation. In this case the union could satisfy SetAlpha(), but not if we required SetAlpha() to be defined on rgb.

Answer 59 · 2023-01-10T06:37:03.000Z

I believe the need to explicitly list methods in interfaces containing union elements is an implementation restriction by the current Go compiler and should be lifted sooner or later. I don't see a good reason why it can't (though sometimes these things are surprisingly subtle - there are other implementation restrictions which I don't think can be lifted, or am at least skeptical about). Though I don't understand what "nil-able" has to do with it, you can call methods on nil values just fine.

Also, I agree with the criticism that it's a downside not to be able to define methods on union types, if they are defined like this proposal. Though a lot of the boilerplate can probably be reduced by struct embedding. As for the SetAlpha example, an alternative would be to have the method be RGBA() rgba, which could be implemented on both types and the usage would then be x = x.RGBA(), instead of x.SetAlpha(), which doesn't seem that bad.
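The RGBA() alternative works in today's Go with an ordinary interface. This is a sketch under assumptions: the field layout of `rgb` and `rgba` and the default alpha of 255 are invented for the example.

```go
package main

import "fmt"

type rgb struct{ R, G, B uint8 }
type rgba struct{ R, G, B, A uint8 }

// Color is the common contract; both variants can produce an rgba.
type Color interface {
	RGBA() rgba
}

// RGBA widens an rgb to an opaque rgba (alpha 255 is an assumed default).
func (c rgb) RGBA() rgba { return rgba{c.R, c.G, c.B, 255} }

// RGBA on an rgba returns the value unchanged.
func (c rgba) RGBA() rgba { return c }

func main() {
	var x Color = rgb{10, 20, 30}
	x = x.RGBA() // x now holds the rgba variant
	fmt.Printf("%T %v\n", x, x)
}
```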

Answer 60 · 2023-01-10T08:43:31.000Z

Though I don't understand what "nil-able" has to do with it, you can call methods on nil values just fine.

With a closed, finite type set, do we have to recycle nil as a catch-all? I think we could disallow declaring an instance of a sum type without declaring a variant, and ask implementors to explicitly provide empty/zero/undef variants - at least, I really enjoy this about sum types when I've used them in other languages. If a sum type exhibits a field that is unsafely nil, maybe that could be regarded as programming error that justifies a resulting panic just like it would otherwise.

I'm not sure it'd be insurmountable to do things more like the proposal suggests, and recapitulate the nuances of nils and interfaces, but it makes me nervous ... it looks manageable in small type switches but I think it could get nastier in practice - hard to reason about disjointness.

Answer 61 · 2023-01-10T16:21:07.000Z

Currently the language seems to rely on every type having some meaning for the value that is represented as all zero bytes in memory. For example:

type Example interface {
    int | string
}

m := make(map[string]Example)
v, ok := m["foo"]

Under the current proposal I would expect v to be nil because that is the zero value of Example. If a nil Example were forbidden then it isn't clear what v ought to be here.
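The map behaviour is the same for every interface type today. In this sketch, `Example` is declared as an empty interface to stand in for `interface{ int | string }`, which current Go does not accept as a map element type; `lookup` is a hypothetical helper.

```go
package main

import "fmt"

// Example stands in for interface{ int | string }. Any interface
// type behaves the same way for a missing map key.
type Example interface{}

// lookup returns the stored value and whether the key was present.
func lookup(m map[string]Example, key string) (Example, bool) {
	v, ok := m[key]
	return v, ok
}

func main() {
	m := make(map[string]Example)
	v, ok := lookup(m, "foo")
	fmt.Println(v == nil, ok)
}
```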

Personally, I feel okay with accepting nil interface values as an established part of the "texture" of Go and having these "sealed" interface types inherit that assumption, rather than introducing the one situation where there isn't a zero value and dealing with the effects on all other parts of the language that have been defined on the assumption of zero values, although I do agree that it'll mean that patterns from other languages with different type systems won't translate over exactly.

Each time I revisit this I find myself thinking that this feature perhaps deserves a more specific name than "sum types" to help make it clearer that this is just an application of the theoretical idea of sum types to some specific situations, and not something that is intended to cover all possible use-cases for sum types. I still quite like "sealed interfaces" because it seems more clearly a special kind of interface and so inherits most of what we're already accustomed to with interfaces (including nils and method sets) and focuses only on constraining the full set of implementers at the declaration site.

Answer 62 · 2023-01-10T16:26:00.000Z

Although I don't think it's a deal breaker, I think it is notable that declaring all of the implementers inside the interface block means that the package which defines the interface must import any packages which export types that will be included in the set.

This means that the package that exports a type set member would be unable to import the package containing the interface and so could not name the interface type to use it in its own code without creating an import cycle.

Most of the use-cases we discussed above aren't impacted by that problem so I think this proposal is still useful despite it, but I do think it's interesting to think about given that it seems to invert the usual way that interface implementation works, where it's the package that defines the implementer that is responsible for (implicitly) adding it to the type set of the interface.

Answer 63 · 2023-01-10T16:34:36.000Z

Although I don't think it's a deal breaker, I think it is notable that declaring all of the implementers inside the interface block means that the package which defines the interface must import any packages which export types that will be included in the set.

That's actually one of the points of the proposal. The idea is to create something analogous to C's unions or Rust's enums. For example, consider the case of scanning a stream of tokens. It makes sense to have predefined types for the various token types, such as

type Token interface {
  Number | String | Operator
}

type Number struct { /* ... */ }
type String struct { /* ... */ }
type Operator struct { /* ... */ }

// Not a great API, but it demonstrates the idea.
func Parse(r io.Reader) ([]Token, error) {
  // ...
}

There are a surprising number of situations where a value can be limited to one of a handful of possibilities that are all known in advance. This proposal is designed to improve the ergonomics around those situations. The current way that something like the above is usually handled is to define the token type as type Token any, but this is error prone because the types are less discoverable and it loses potential features that are only possible if the compiler is told what all the possibilities are, such as the potential ability to remove boxing mentioned in the proposal itself, linter enforcement of exhaustive type switches, and so on.

Answer 64 · 2023-01-10T16:43:05.000Z

@Merovius

I believe the need to explicitly list methods in interfaces containing union elements is an implementation restriction by the current Go compiler and should be lifted sooner or later. I don't see a good reason why it can't (though sometimes these things are surprisingly subtle - there are other implementation restrictions which I don't think can be lifted, or am at least skeptical about).

Regardless of whether it can be lifted, I think it should not be.

If you have A | B | C and those types happen to all have an M method, you can never add a type D without a method M to the union—even if M is irrelevant to the purpose and use of the union—because that would remove M from the union's method set.

To be more concrete, I imagine this would happen quite (most?) often when M = String() string.

Answer 65 · 2023-01-10T16:45:35.000Z

I think whether switch is exhaustive is an entirely independent proposal to sum types. It can be proposed independent of any type changes, and it doesn't need to be attached to this proposal.

It can't be added later, because it would break by then existing programs. A decision would have to be made together with this proposal.

Answer 66 · 2023-01-10T17:37:01.000Z

@jimmyfrasche You already can not add types to an exported union, without breaking compatibility, regardless of what methods the types involved have. I don't think allowing to call methods would change anything.

Answer 67 · 2023-01-10T17:42:04.000Z

FWIW "adding members to a union" is similar in effect on their type set to "removing a method from an interface" and "removing a member from a union" is similar to "adding a method to an interface". So, unless you are the only user of a union, you really can't do anything about it.

In fact, that's kind of why people want unions. They want a closed set of types. If you could change that set, it would no longer be closed.
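The compatibility concern already exists for constraints and can be sketched in current Go. The names below are hypothetical; `Number` plays the role of a union exported by an upstream package, and `Printable` a differently constrained union in downstream code.

```go
package main

import "fmt"

// Number plays the role of an exported union in an upstream package.
type Number interface {
	int | float64
}

// Printable is a differently constrained union in downstream code.
type Printable interface {
	int | float64 | string
}

// show accepts any member of Printable.
func show[T Printable](v T) string { return fmt.Sprint(v) }

// describe compiles only because every type in Number is also in
// Printable. Adding, say, uint to Number would make the call to show
// fail to type-check, because T would no longer satisfy Printable.
func describe[T Number](v T) string { return show(v) }

func main() {
	fmt.Println(describe(3), describe(1.5))
}
```

Widening the upstream union therefore breaks downstream callers, in the same way that removing a method from an ordinary interface does.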

Answer 68 · 2023-01-10T18:02:48.000Z

In fact, that's kind of why people want unions. They want a closed set of types. If you could change that set, it would no longer be closed.

It's also why I don't like it. Interfaces are a tool to grant freedom, to empower users to provide their own types by implementing them, even if the original author of a function didn't think of them. They mean to open the world for extension, not to close it off.

Answer 69 · 2023-01-10T18:25:54.000Z

Yupp. FWIW in #19412 I brought up the inability to ever modify a union as an argument against their inclusion into a language that is - at least in part - deliberately designed to allow for gradual evolution of APIs a bunch of times. I'm not sure I still totally buy it, as most type system features kind of work that way in one way or another.

But it is something that might be a bit easier with a first-class union/sum type, as you wouldn't run into this aspect of constraints already having variance. For example, if you required a type-switch over a union to always have a default case and made it impossible to assign them to other unions (even if they are subsets of each other), I think you could then add new cases to them backwards compatibly. So, at least in part, it's a point against this specific implementation (overloading union elements for constraints).

Answer 70 · 2023-01-10T18:40:49.000Z

@Merovius the situation is kind of different in that even if you are doing a v(N+1) you can't add it unless you can add the method (not an option if you want to add a primitive type) or be sure no one relied on the existence of the accidental method (and it's not even obvious that you need to look for this since it sneaks in implicitly). Unduly brittle for little gain when it'd make much more sense to be explicit.

Answer 71 · 2023-01-10T18:52:02.000Z

I don't think I understand. Why wouldn't you be able to do a v(N+1)? And why is "being sure no one relied on the existence of the accidental method" any harder than "being sure no one ever used your interface as a constraint and then called a differently constrained function with it"?

(In any case, this is probably off-topic; this proposal is not about allowing to call methods not explicitly mentioned in an interface with unions)

Answer 72 · 2023-01-10T19:28:22.000Z

Returning to the go/ast.Node example for a second, an advantage of a union type there would have been to consolidate the definition to one location. type Node interface { *AssignStmt | *BadDecl | ... }. Currently what implements an ast.Node is spread out a bit over a pretty big file (and is kept readable via discipline in how things are written). Only needing to consult one place in the code would have helped me read/use this and similar libraries in the past. This can be addressed by tooling, so it is not an overwhelming advantage. But overall I think this proposal would help with the readability of some packages.

I suspect we will not be putting 56 cases (# of ast.Node impls in ast) separated by '|' on the same line. So I would anticipate there will be a lot of trailing '|' for larger cases.

type Node interface {
    *AssignStmt |
      // more *Stmts
      *ArrayType |
      // more *Types
      ...
}

Still readable enough IMO, but worth taking into account.

(Not suggesting Node change from its current meaning of range of token.Pos. The token range definition has other existing uses and is a good example of where not to use a closed type set.)

Answer 73 · 2023-01-10T19:56:02.000Z

Why is this spread out over a big file and not listed in a comment?

Answer 74 · 2023-01-10T20:02:57.000Z

@timothy-king I think it would rather be written as

type Stmt interface {
    *AssignStmt | *BadStmt | *BlockStmt | … | *TypeSwitchStmt
}

type Expr interface {
    *BadExpr | *BinaryExpr | … | *UnaryExpr
}

type Decl interface {
    *BadDecl | *FuncDecl | … | *GenDecl
}

type Node interface {
    Decl | Expr | … | Stmt
}

There's still relatively long unions there, but it gets more manageable (and it might even be possible to break them up further).

Answer 75 · 2023-01-10T20:25:53.000Z

@Merovius I suspect Stmt and Decl would not be exported in this case. But your point that these could be broken up into a union of union types is well taken.

Answer 76 · 2023-01-10T21:31:49.000Z

@apparentlymart

If a nil example were forbidden then it isn't clear what v ought to be here.

Definitely this would require something heavy-handed, I have strong opinions here based only on speculation, but for the sake of speculation - there are places in Go (like having to make maps and chans) where the builtins get special cases, and I’d be interested in going to these lengths to eliminate a ubiquitous ‘nil’. There might be better ideas, disabling the walrus operator for sum types would be brutal and special but seems like one option.

Even if it’s in the machinery that somehow, somewhere a truly invalid instance might panic, I really think the only reasonable response to a ‘nil’ variant of a sum type is panic. Otherwise it’s very tempting to treat such an instance as a zero value of some other included variant, or a predicate to produce a valid value. Then, as a reader of that code, or a writer of code employing an unfamiliar sum type, I simply do not have the ability to immediately observe that ‘nil’ is a properly disjoint case.

Answer 77 · 2023-01-10T21:42:58.000Z

We've gone down the path of some types not having a zero value several times in the past, and it's never worked. Let's not go down that path again. Let's just assume that in Go types must have a zero value. Thanks.

And since the proposal here is for a particular kind of interface type, and since the zero value for all interface types is nil, that is what this proposal says also. We can certainly discuss a sum type that has a different zero value (there is a lot of discussion over at #19412). But it would be very strange to say that for some interface types the zero value is nil and for some other interface types the zero value is something else. That is a level of complexity that I don't think we are going to add to the language.

Answer 78 · 2023-01-10T21:45:27.000Z

@timothy-king Note that both Stmt and Decl are already existing and exported interface types in the ast package.

Answer 79 · 2023-01-10T21:50:10.000Z

@apparentlymart @AndrewHarrisSPU
One idea would be to simply disallow non-nilable unions as channel or map Types (and elsewhere)

type Example interface {
    int | string
}
type NilableExample interface {
    int | string | nil
}

m := make(map[string]Example) // compilation error: nil(type?) is not in the type set of Example

m := make(map[string]NilableExample) // OK

e, ok := m["something"]
// ...
v, ok := e.(Example) // regular type assertion to check that it's a legit Example and not nil.

Of course, the zero value of Example would still be nil. That doesn't change.
But such a value would only be created by variable declaration and not assignable where a non-nilable Example is expected.

So it would come down to having to be explicit about nil.

Answer 80 · 2023-01-10T21:55:57.000Z

One idea would be to simply disallow non-nilable unions as channel or map Types (and elsewhere)

Or slice types. Or fields. Or interface values.

The language doesn't like types without zero values. That can't really be helped.

Answer 81 · 2023-01-10T22:00:30.000Z

Not necessarily a big issue for slice types either, or fields. One could still define the field case with the explicitly nilable supertype.

~~Some operations have to be disabled or modified otherwise, for slices of non-zeroable union values, that's true:~~

~~make would require a 0 length~~
~~slices of union types which don't allow nil assignment would deal with clear differently or disallow it~~

(edit:
I don't even think that it's important to be able to do that.

If a variable of type T is not zeroable (which doesn't mean T has no zero value, btw, just that it cannot be assigned the zero value for that type, although var v T is the zero), one can simply define a slice of {T | nil}, which is explicit.

Because the current semantics of slices demand that each indexed slot can be empty or emptied.
Essentially, there are a few things that require optionality/ability to assign zero (to denote the lack of value) but that is easily built)

It's merely switching from nilable by default to nilable by construction which should be safer.

I don't think it would be much of a problem a priori. That's workable afaict. It's more an issue of proper value initialization, i.e. assignability.

For interfaces what do you have in mind as an issue?

Edit: I was randomly browsing and came across a similar treatment in Dart https://dart.dev/null-safety/understanding-null-safety
So it is possible. I still believe this would be more workable for Go unions since it reuses traditional mechanisms such as type assertions.
If subtyping was made more prevalent one day, that could be even further improved but it's not a necessity.

Answer 82 · 2023-01-10T22:06:06.000Z

FWIW the idea of non-nilable interfaces has exactly the same problems as the periodically recurring discussion of non-nilable pointers. It's the same problem. It's not going to happen.

Answer 83 · 2023-01-10T22:08:15.000Z

I'm not sure of what you mean.
I'm strictly talking about unions.

Basic interfaces would remain the same.
And I don't know what a non nilable pointer is.

Answer 84 · 2023-01-11T07:25:31.000Z

Edit: I was randomly browsing and came across a similar treatment in Dart https://dart.dev/null-safety/understanding-null-safety. So it is possible.

Not to point out the obvious, but Dart is not Go. This is about how other aspects of the design of Go assume that every type has a zero value. Obviously, a language that is not designed under this assumption doesn't have this problem and there are many languages without nil.

Answer 85 · 2023-01-11T07:31:03.000Z

Well I'm well aware obviously... , if you browse through it, there are a few sections that might be of interest such as the fact that they had to deal with initialization of variables. (to implement null safety after the fact!)

So it appears that some issues were still shared and they've found a way.

Just saying that it's possible, not that it will be done but before we shoot the idea down for unions, might be interesting to explore it.

Answer 86 · 2023-01-11T16:19:05.000Z

I used a map element as an example earlier, but note that even a type assertion -- an operation specifically for interfaces, and so one that would be weird to ban here -- relies on a zero value for the non-matching case:

v, ok := interfaceVal.(Type)

Although the type in a type assertion can be a non-interface type, an interface type is also valid in that position and is a common pattern for detecting if the dynamic type in the interface value also implements a more specific interface.

In that situation if the test fails then v is the zero value of the given interface type, which is always nil in today's Go.

This is just one more of many places where Go assumes there is a zero value of every type. I don't think it's feasible to simply ban a particular type from any situation where a zero value is required, because that assumption is all over the language.

Answer 87 · 2023-01-11T16:46:25.000Z

To be accurate, the zero value should still exist and would still be nil.

The issue is rather definite assignment analysis.
It is sensitive to branching.

As long as this kind of analysis can be made fast, in a modular fashion, and without false negatives/positives, it should probably be fine.

I think Go might be one language for which it might be possible. (there are others, historically). I don't know if the ssa backend might not be of help here.

That's something to study.

Answer 88 · 2023-01-11T17:43:09.000Z

@atdiar As soon as it is possible to create a zero value, it is literally impossible to guarantee that it's not getting assigned. That is, if I can create a nil T, it is impossible for a compiler to prove that any given T is not nil. Moreover, you've been told multiple times, by multiple people now, that this is infeasible and not going to happen. At this point, saying that there is "something to study" is pretty frustrating. There just is not. Take a "no".

Answer 89 · 2023-01-11T18:27:06.000Z

@atdiar

As long as this kind of analysis can be made fast, in a modular fashion, and without false negatives/positives, it should probably be fine.

Take this function

func AddNElems[T any, S ~[]T](slc S, n int) S {
	return append(slc, make([]T, n)...)
}

Can we prove that this function takes only slices of non-nilable elements?
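As a rough illustration of why the question matters, here is what the function does in today's Go when the element type is an interface. Shape and Square are hypothetical stand-ins for a union-based interface and one of its members; nothing here uses the proposed syntax.

```go
package main

import "fmt"

// AddNElems is the function from the comment above, unchanged.
func AddNElems[T any, S ~[]T](slc S, n int) S {
	return append(slc, make([]T, n)...)
}

// Shape is a hypothetical interface standing in for a sum type.
type Shape interface{ Area() float64 }

// Square is a hypothetical member of that type.
type Square struct{ Side float64 }

func (s Square) Area() float64 { return s.Side * s.Side }

func main() {
	shapes := []Shape{Square{Side: 2}}
	shapes = AddNElems(shapes, 2)
	for i, s := range shapes {
		// The two appended elements are zero values, i.e. nil interfaces.
		fmt.Println(i, s == nil)
	}
}
```

The generic body never mentions nil, yet it manufactures nil elements through make([]T, n). Any rule that forbids nil for some element types would have to reject or special-case code like this.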

Answer 90 · 2023-01-11T18:58:08.000Z

@DmitriyMV
The slice type constructor implicitly requires that T is nilable/zeroable (because a slice can be cleared). I don't see why such a check would be infeasible.

@Merovius
What makes you think that I have to be compelled to agree with you?
You don't even understand what I am saying.
I'm not even claiming that a variable shouldn't be assigned. I'm saying that a variable shouldn't be assigned the zero value in certain cases.
Before claiming that something is impossible, one might want to study. This is not the first time that you make such claims and exhibit an attitude that is slightly disrespectful. I hope it will be the last time.

Answer 91 · 2023-01-11T19:13:13.000Z

@atdiar

The slice type constructor implicitly requires that T is nilable/zeroable (because a slice can be cleared). I don't see why such a check would be infeasible.

So, you are saying AddNElems where T is any will not work with "non-nilable interfaces" - is that correct?

Answer 92 · 2023-01-11T19:19:06.000Z

Where T is constrained by any? Yes it won't work. (in your example, because T also is declared as a slice type)
If you have a non-nilable union type U, U wouldn't be usable in a slice. However interface{U | nil} should be usable as a type argument.

Answer 93 · 2023-01-11T19:31:15.000Z

Just a reminder to keep the discussion here respectful. You know who you are. Thanks.

Answer 94 · 2023-01-11T19:37:42.000Z

@atdiar

So, to be clear, you essentially want a separate class of types (meta types?) similar to existing, but with additional and very specific restrictions to available operations on this "class" of types.

I mean, the whole point of any is that any type satisfies it (and by extension, as a type constraint, it works for any type). With what you are proposing, the Go type system will be divided not only into two "type hierarchies" but also into two "type constraint hierarchies". Essentially, this means we will have two entirely separate type systems which are not interchangeable and have very different low-level semantics. Which, in turn, essentially means you will have two different languages in one - the first one assumes that the zero state is valid (starting from slices and ending with "reflect") and the second one demands initialization and disallows a set of operations like make([]T, n) and such.

The complexity of implementing this, in the end, is equal to creating a new language, so the question becomes - why bother with adding this to Go? What I'm trying to say is that no matter how much we dislike "zero value" in specific situations, it is one of the fundamentals the language is built on. We can adjust sum types to work with it, but we cannot break it or make a parallel mechanism for sum types - this will either be a fundamentally breaking change (the road Dart 2.0 took BTW) or an unsound type system. I don't think we want either.

Answer 95 · 2023-01-11T19:38:26.000Z

This is off-topic now, please create another proposal if you have ideas on adding non-zero variables to the language. Thanks.

Answer 96 · 2023-01-11T22:03:01.000Z

Understood. I'll create another prospective issue. To be clear one more time, this is still not "adding non-zero variables" , it is about definite (un)assignment.
Zero values are fine, but the runtime.Type of the zero value is not necessarily in the type set which is what I'm trying to address. Doesn't seem that off-topic to me but fair enough.
Cheers.

Answer 97 · 2023-01-16T09:41:00.000Z

I think one of the first things some Go developers would try to do with this feature is implement a generic Option type. Here's a brief exploration of that idea.

My first thought was that it would look like this:

type None struct{}
type Option[T any] interface{None | T}

But the first problem here is that a type parameter can't be used in a type list: the definition above results in a MisplacedTypeParam error.

The second problem is that, since an interface value can be nil, a variable of type Option[T] could be either nil, or None, or a value of type T. So it would be a sort of optional optional.

In fact to define an Optional int64 type this would be sufficient:

type OptionalInt64 interface{int64}

This would actually have many of the properties I would be looking for in an option type:

The zero value is, sensibly, nil ("None")
The int64 value would be stored directly (avoiding memory allocations)

I have sometimes used sql.Null* types such as sql.NullInt64 as generic optional types (even, sadly, in code that doesn't otherwise deal with sql) because they have the above advantages. In fact if this proposal was implemented with values stored directly, then OptionalInt64 would be stored as the equivalent of sql.NullInt64.

But OptionalInt64 would add some type safety: you couldn't treat an OptionalInt64 as an int64 without first guaranteeing it was not nil (right?).

A standard generic Option type was discussed in #48702. From what I can see the proposal was rejected because of unanswered questions, and not necessarily because a standard optional type was not desirable.

Answer 98 · 2023-01-16T13:24:55.000Z

Difficult question, but I think it may contribute to the “costs” of this proposal: How serious do you estimate the danger that people start using sum types instead of error types as return types? Would this be even feasible?

Answer 99 · 2023-01-16T14:13:12.000Z

@bronger I don't think this proposal is particularly useful for that - and insofar as it is, you could get basically the same effect without it.

One thing people want from a Result[T] type is that it can either be an error or a T. But with this proposal, there's always a third option: It could be nil. So, right off the bat this proposal gives weaker guarantees than what people really want.

Then you'd have to jump through a couple of hoops to make such a Result type actually safe. There are some obstacles:

T could implement error
Neither T nor error can be union terms (the former because it's a type parameter, the latter because it has methods)
You can only destructure a Result[T] using type-assertions, so the cases must be disjoint, to be type safe

So you'd get something like

type Error struct { E error }
type Success[T any] struct { Val T }
type Result[T any] interface { Error | Success[T] }

There's a lot of overhead in using this, over returning (T, error) - not just because you need to type-assert, but you also need to wrap and unwrap the individual structs.

And the value you get is saying that a Result[T] is either an Error or Success[T], fair enough. But what does that get you, except documentation? The compiler won't actually type-check that you correctly type-switch on it exhaustively. So, how is this really any better than just returning an any and documenting that it's either an Error or a Success (or constructing a different non-union interface for wrapping)?

I think the value of this proposal can only really come in enforcing constraints on inputs, really. There is some value in knowing that you aren't being given anything but one of these N things. The information that a function only returns a finite set of types isn't super useful without further infrastructure (like match statements and exhaustiveness checks and the like).
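Here is roughly what the non-union alternative looks like in today's Go, as a sketch only. Since a union interface can't be used as an ordinary type yet, Result is approximated with an unexported marker method; isResult, parse and describe are made-up names, and the marker does not tie Success[T] to the T in Result[T].

```go
package main

import (
	"fmt"
	"strconv"
)

type Error struct{ E error }
type Success[T any] struct{ Val T }

// Result stands in for interface{ Error | Success[T] } from the comment
// above. The marker method is an assumption of this sketch.
type Result[T any] interface{ isResult() }

func (Error) isResult()      {}
func (Success[T]) isResult() {}

// parse wraps strconv.Atoi in the Result shape.
func parse(s string) Result[int] {
	n, err := strconv.Atoi(s)
	if err != nil {
		return Error{E: err}
	}
	return Success[int]{Val: n}
}

// describe destructures a Result[int]. The compiler does not require the
// nil or default cases; leaving them out still compiles.
func describe(r Result[int]) string {
	switch v := r.(type) {
	case nil:
		return "nil result"
	case Error:
		return "error: " + v.E.Error()
	case Success[int]:
		return "ok: " + strconv.Itoa(v.Val)
	default:
		return "unexpected variant"
	}
}

func main() {
	fmt.Println(describe(parse("42")))
	fmt.Println(describe(parse("x")))
	fmt.Println(describe(nil))
}
```

The wrapping, the unwrapping, and the extra nil case are all visible here, and none of the cases in the type switch is enforced by the compiler.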

Answer 100 · 2023-01-16T14:26:24.000Z

One advantage, to me, is if it makes it easier to plug things that return Result[T] into things that accept Result[T] (or, possibly, ...Result[T]) than it is with (T, error). Same for Option[T] vs (T, bool).