microsoft/TypeScript

Support open-ended unions

andy-ms opened this issue · 17 comments

Suggestion

Ability to discriminate between union members when not all union members are known.

Use Cases

  • redux actions: See #2214 (comment)
  • This would enable us to discriminate on Node kinds without converting Node to a huge union (which slows down compilation too much). (Also, Node is effectively open-ended because we add new kinds fairly often.)

Examples

interface Shape {
    unique kind: string;
}

interface Square extends Shape {
    kind: "square";
    size: number;
}

interface Circle extends Shape {
    kind: "circle";
    radius: number;
}

// other shapes may exist

function area(s: Shape) {
    switch (s.kind) {
        case "square":
            return s.size * s.size;
        case "circle":
            return Math.PI * s.radius ** 2;
        default:
            return 0; // Or hand off to some other function that handles other Shape kinds
    }
}

Checklist

My suggestion meets these guidelines:

  • This wouldn't be a breaking change in existing TypeScript / JavaScript code
  • This wouldn't change the runtime behavior of existing JavaScript code
  • This could be implemented without emitting different JS based on the types of the expressions
  • This isn't a runtime feature (e.g. new expression-level syntax)

Workarounds

Cast the general type to a union of known type.

function area(sIn: Shape) {
    const s = sIn as Square | Circle;
    ...
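
A fuller sketch of this workaround, reusing the cases from the example above. The `default` branch and the sample values are assumptions added for illustration, and `unique` is dropped because it is proposed syntax:

```typescript
interface Shape {
    kind: string;
}

interface Square extends Shape {
    kind: "square";
    size: number;
}

interface Circle extends Shape {
    kind: "circle";
    radius: number;
}

function area(sIn: Shape): number {
    // Assert that only the known kinds exist so the switch can narrow.
    const s = sIn as Square | Circle;
    switch (s.kind) {
        case "square":
            return s.size * s.size;
        case "circle":
            return Math.PI * s.radius ** 2;
        default:
            // The checker treats `s` as `never` here, but other Shape
            // kinds still reach this branch at runtime.
            return 0;
    }
}

// Sample values (hypothetical).
const square: Square = { kind: "square", size: 3 };
const other: Shape = { kind: "triangle" };
```

The cost of the cast is that the checker no longer knows other kinds can show up, so nothing in the `default` branch is type-checked against the real input.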

Open questions here:

  • What (if anything) prevents the declaration of two things with the same kind?
  • How does this work in a "pull" typechecking model? To know what to do in the switch, we'd have to exhaustively check the entire program to find all possible declarations
  • Where do the perf wins come from? Isn't the type of s in the default block (if we imagine hundreds of other shapes) still an enormous union? Or does an open-ended union imply the existence of subtraction types?

I think area should be:

function area(s: Square | Circle | Shape) {...}

In Redux I specify which actions my reducer can handle. But I also end it with an | Action to denote that any other Action can be passed in, but it will basically be ignored. But since type is string and not a literal, the narrowing doesn't work under the specific cases. If I use Action<""> instead the narrowing works, but then I can't pass in unsupported actions. In my tests to cover unsupported actions I typically pass {type:""} just to prove nothing changes.

orta commented

Related somewhat: #33471

Here's a minimal repro for an issue in flow-sensitivity for open unions:

const x = null as any as {
    discriminator: string;
} | {
    discriminator: 'abc';
    extra: string;
};
if (x.discriminator === 'abc') {
    console.log(x.extra) // this should not be a type error
} 

Playground: https://www.typescriptlang.org/play/#code/MYewdgzgLgBAHjAvDMBXANumBDCOwCeOeA3gFAyUwAmAlhMAE60C2tY2UIjAXDNMzABzANxkAvjAA+MclRr0mrdp258A5NgBGwdWPkBTOFEbY+A9qIljaAMxgAKOADo6DZmw5dGSRMk066gCUshRUoJAg6AbO6CBCTs5GJthBEjBAA

@brandonbloom that's a request for negated types IMO. The type there doesn't disallow { discriminator: "abc" } as a value for x

@RyanCavanaugh I've done some digging into checker.ts and I think there may be a much simpler fix to my specific case. In the narrowTypeByDiscriminant function, there is a call to filterType which reduces to all the cases that could match, but when some cases are more specific than others, the redundant cases are not filtered out. Seems like adding a second filter pass there would fix it.

Upon thinking about this further, I think I understand what you're saying now. You want the discriminator to be "string but not 'abc'".

Something like this seems to come up quite a lot with string literals. We'll often get a request to support N well-known strings, but in reality, the system allows arbitrary strings. Usually the intent there is that tooling should give auto-complete hints, but the type-checker should not error.

This seems to be the case in #42134.

Additionally, I've spoken to @bterlson and @jonathandturner about this a couple of times now on the Azure SDK. The work-arounds are:

  • Using keyof { well: any, known: any, fields: any, [x: string]: any } where tooling picks up on well-known properties from keyof for completions, but the effective type is just string. (ugh)
  • string | SomeEnumType (less ugh), or just string with some documented enum that users can pass through.
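
A rough sketch of the second workaround. `KnownRegion` and `normalizeRegion` are hypothetical names, not anything from the Azure SDK, and a const object stands in for the enum:

```typescript
// Hypothetical well-known values; any other string is still accepted.
const KnownRegion = {
    EastUS: "eastus",
    WestEurope: "westeurope",
} as const;

type KnownRegion = (typeof KnownRegion)[keyof typeof KnownRegion];

// The union reduces to plain `string` for the checker, so the known
// values serve as documentation rather than as a constraint.
function normalizeRegion(region: string | KnownRegion): string {
    return region.trim().toLowerCase();
}
```

Callers can write `normalizeRegion(KnownRegion.EastUS)` for the documented values and still pass an arbitrary string.
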
tibbe commented

Can someone explain why this doesn't work today:

type Circle = {
    kind: 'circle';
    radius: number;
};

type Rectangle = {
    kind: 'rectangle';
    height: number;
    width: number;
};

type UnknownShape = {
    kind: Omit<string, 'circle' | 'rectangle'>;
};

type Shape = Circle | Rectangle | UnknownShape;

const f = (s: Shape) => {
    switch (s.kind) {
        case 'circle':
            console.log('circle:', s.radius);
            break;
        case 'rectangle':
            console.log('rectangle:', s.height, s.width);
            break;
        default:
            console.log('unknown:', s);
            break;
    }
};

f({kind: 'circle', radius: 2.0});

It seems to me that the type checker ought to be able to prove which shape we have in each case, as the kinds of the three different types are mutually exclusive.

@tibbe Omit<> is used to remove properties from a type. You try to remove the properties circle and rectangle from the type string, which makes no sense. You probably meant to use Exclude<>, which is used to remove types from a union. However, this won't work either, as the type string does not include these two string literal types.

It's not possible in TypeScript to describe the type "any string, except these two", so it's not possible to provide the type UnknownShape. You should simply leave it out altogether; it serves no purpose.
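
To make the Exclude<> point concrete, here is a small sketch (the type and variable names are made up for illustration):

```typescript
type KnownKind = 'circle' | 'rectangle' | 'triangle';

// Exclude distributes over a union and drops the matching members.
type Remaining = Exclude<KnownKind, 'circle' | 'rectangle'>; // 'triangle'

// `string` is not a union and does not extend the literals, so it is kept whole.
type StillString = Exclude<string, 'circle' | 'rectangle'>; // string

const remaining: Remaining = 'triangle';

// This compiles, which shows that 'circle' was not removed from `string`.
const stillAccepted: StillString = 'circle';
```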

tibbe commented

@MartinJohns thanks for the explanation.

The context here is parsing (in this particular case using the io-ts library). We need to be able to parse a JSON value into a series of known shapes, based on the kind field in the JSON, but also handle the case of an unknown shape (which can happen e.g. due to version skew between the backend and the frontend).

So we need to be able to create a union that is distinguished based on one field and have the last union member have an "unknown but distinct from the rest" value for the discriminator field.

I filed a bug for the concrete parsing problem against io-ts. It should give more context: gcanti/io-ts#665

@tibbe This would require "negated types", which are unlikely to come any time soon (i.e., not within the next few years): #29317

How I would deal with this: Still leave the UnknownShape out. It serves no purpose. When dealing with the data you narrow based on the kind field, and when it resolves to never you know you're dealing with an unknown shape.
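
A minimal sketch of that approach, assuming the Circle and Rectangle types from the earlier comment. The function name `describe` and what the `default` branch returns are illustrative choices, not from the thread:

```typescript
type Circle = {
    kind: 'circle';
    radius: number;
};

type Rectangle = {
    kind: 'rectangle';
    height: number;
    width: number;
};

// No UnknownShape member: the union lists only the known shapes.
type Shape = Circle | Rectangle;

function describe(s: Shape): string {
    switch (s.kind) {
        case 'circle':
            return `circle: ${s.radius}`;
        case 'rectangle':
            return `rectangle: ${s.height}x${s.width}`;
        default: {
            // `s` has narrowed to `never`, so statically nothing is left.
            // At runtime (e.g. version skew) an unknown shape lands here.
            const unknownShape: never = s;
            return `unknown: ${JSON.stringify(unknownShape)}`;
        }
    }
}
```

The trade-off is that the checker gives no help inside the `default` branch, since it believes the branch cannot be reached.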

#57943 got me to look at this issue again. I want to brain-dump some ideas before I forget about them, and maybe others can build on them or refine them.

  • We could introduce a new type intrinsic called Unhandled<T>.
  • To indicate that a developer should consider checks against a specific type, they might write T | Unhandled<A | B | C>.
  • When you have an Unhandled in a union, it does not subtype reduce.
  • The checker could define a set of constructs where Unhandled values are witnessed and must be checked semi-exhaustively.
  • Tooling could use Unhandled<"foo" | "bar"> to provide completions for strings (or other literals).
  • Unhandled<T> | Unhandled<U> reduces to Unhandled<T | U>, where T | U is not subtype-reduced.
  • To remove all Unhandleds, we could also provide a Handled<T> utility type. The resulting type would be subtype reduced.

Here's some immediate problems or downsides I would call out with this:

  1. Naming the utility "Unhandled" doesn't really communicate the idea of "this accepts all strings, but you might want auto-complete for this common set". Also, I could totally imagine people getting confused over Handled vs Unhandled.
  2. Knowing exactly what it means to handle a value is not exactly a universal concept. You can argue that a switch should exhaustively check all of Unhandled values in some way; but maybe you only care about 3 values and know the rest should be explicitly ignored. How do you express that to the type system?
  3. On the same note, if a library author says that a type is Unhandled, how do you opt out of caring? In the type system, there'd be Handled<T> to remove all Unhandled<T>s. But is it awkward to write x as Handled?
  4. Adding another kind of marker type is complex because you have to thread it through to be ignored or specially handled everywhere.
  5. If you have a parameter declared as x: Shape | Unhandled<Circle | Square>, what is the type of x.kind? Is it string | Unhandled<"circle" | "square">? It feels like yes?

@DanielRosenwasser I like where you're going with Unhandled, though I also find the naming a little confusing.

My bigger concern is about coupling together, in one construct, the documentation/autocomplete portion of this feature with the exhaustiveness checking requirement.

On one hand, it would be really great if a type could require that its cases be exhaustively handled everywhere that type is used (unless a particular use site opts-out, like with a cast to Handled). Right now, if I have a switch that's intended to be exhaustive, I can easily opt-in to exhaustiveness checking for that switch (with default: assertUnreachable(...)); but, if I forget to add that default case to any of my switch statements that should've been exhaustive, TS won't help me at all.
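
For readers who haven't seen it, the opt-in pattern mentioned above looks roughly like this. `Fruit` and `color` are made-up names, and `assertUnreachable` is a user-written helper, not something TS provides:

```typescript
type Fruit = 'apple' | 'banana';

// Accepts only `never`, so a call compiles only when the argument
// has been narrowed down to nothing.
function assertUnreachable(value: never): never {
    throw new Error(`Unhandled case: ${String(value)}`);
}

function color(fruit: Fruit): string {
    switch (fruit) {
        case 'apple':
            return 'red';
        case 'banana':
            return 'yellow';
        default:
            // Adding a new member to Fruit turns this line into a compile
            // error until a matching case is added above.
            return assertUnreachable(fruit);
    }
}
```

Leave out the `default` line and the switch silently stops being exhaustive when the union grows, which is the gap described above.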

On the other hand:

  • sometimes, one would want the documentation portion of this feature without the exhaustive check. #57943 is one such case: if a function's ErrType were defined as unknown | Unhandled<...>, it'd be important for Unhandled not to come with an exhaustiveness checking requirement, so that new errors could be added to an existing library's declarations without that being a breaking change.
  • conversely, sometimes one would want the exhaustiveness checking portion of this feature without the need for an open-ended union. E.g., I might have some closed "a" | "b" | "c" string literal union that I want to require code to always handle exhaustively (absent an opt-out for a particular switch statement).

--

To address the naming issue, my proposal would be an intrinsic type like:

type WithKnownCases<KnownCases extends BaseType, BaseType> = intrinsic;

  • WithKnownCases<'circle' | 'rectangle', string> would be treated just like string for assignability, but show 'circle' or 'rectangle' in autocomplete.
  • As cases are handled, the KnownCases would narrow.
  • WithKnownCases<A, X> | WithKnownCases<B, Y> could reduce to WithKnownCases<A | B, X | Y>

I think this would be a useful building block; it would solve a lot of use cases in this issue.
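
Until something like this exists, the assignability and autocomplete half can be roughly approximated in userland. The alias below is an assumption-laden stand-in, not the proposed intrinsic: it relies on the common `string & {}` trick, and it does not narrow the known cases as they are handled. `ShapeKind` and `describeKind` are hypothetical names:

```typescript
// Approximation only: intersecting the base type with `{}` keeps the
// literal members from being absorbed into it, so editors can still
// offer them as completions.
type WithKnownCases<KnownCases extends BaseType, BaseType> =
    | KnownCases
    | (BaseType & {});

type ShapeKind = WithKnownCases<'circle' | 'rectangle', string>;

function describeKind(kind: ShapeKind): string {
    switch (kind) {
        case 'circle':
        case 'rectangle':
            return `known: ${kind}`;
        default:
            // Any other string is accepted and ends up here.
            return `other: ${kind}`;
    }
}
```

Both `describeKind('circle')` and `describeKind('hexagon')` type-check, which matches the "treated just like string for assignability" bullet.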

However, the utility is still limited a bit without subtraction types. Consider:

type Circle = { kind: 'circle'; /* ... */ };

type Rectangle = { kind: 'rectangle'; /* ... */ };

type UnknownShape = { kind: string; };

type Shape = WithKnownCases<Circle | Rectangle, UnknownShape>;

function f(s: Shape) {
    switch (s.kind) {
        case 'circle':
            // What is `s`'s type here? 
            // Narrowing to `Circle` is probably what people want but, without being able to
            // use subtraction types to define `UnknownShape['kind']`, that's unsound.
            // So, presumably, `s` is instead `WithKnownCases<Circle, UnknownShape>`.
            // That's better than nothing, I guess.
            break;
        case 'rectangle':
            // ...
            break;
        default:
            break;
    }
};

Then, for opt-in exhaustiveness checking, analogous to what TS has today, there could be an intrinsic type:

type KnownCasesOf<T> = intrinsic; // KnownCasesOf<WithKnownCases<T, ...>> = T

Then...

function f(s: Shape) {
  switch (s.kind) {
    case 'circle':
    case 'rectangle':
      break;
    default:
      // `s` here is `WithKnownCases<never, UnknownShape>`
      assertUnreachable(s as KnownCasesOf<typeof s> satisfies never)
  }
};

For opt-out exhaustiveness checking, I'd have a separate intrinsic type:

type RequireExhaustiveHandling<T> = intrinsic;

RequireExhaustiveHandling can be used with a simple closed union or an open union. If used with an open union, it only requires exhaustive handling of the known cases.

To opt out, there'd be something like:

type AllowUnhandledCases<T> = intrinsic; // AllowUnhandledCases<RequireExhaustiveHandling<T>> = T

Actually, having RequireExhaustiveHandling would make opt-in exhaustiveness checking much clearer:

function f(s: RequireExhaustiveHandling<Shape>) {
  switch (s.kind) {
    case 'circle':
      break;
    // error here: `s` is narrowed to `RequireExhaustiveHandling<WithKnownCases<Rectangle, UnknownShape>>`
    // TS complains that case Rectangle is not handled
  }
};

T would be assignable to RequireExhaustiveHandling<T>, so a function can just declare its argument with RequireExhaustiveHandling to opt in for that function. This is actually much clearer than the whole assertUnreachable pattern, which isn't intuitive to new TypeScript users from what I've seen.

Brain dump on some parts of the logic that would have to be worked out here:

For Unhandled (called NoReduce below cuz I find that easier to think about):

  • Behavior under intersection: T & NoReduce<U>
    • Probably distributes: NoReduce<T & U>
  • Unification: e.g., chooseOne<T>(a: T, b: T): T with chooseOne(NoReduce<"a">, string)
    • Follows the behavior of unioning to return string | NoReduce<"a">?
  • When a non-marked type is a subtype of the NoReduce type, e.g., NoReduce<number> | 4 or NoReduce<4> | 4
    • Probably reduces but keeps the marker: NoReduce<number> | 4 => NoReduce<number>; NoReduce<4> | 4 => NoReduce<4>
    • So only strict supertypes are left un-reduced
  • Subtyping/assignability
    • Still trying to have mutual subtypes, e.g.:
      • (it: NoReduce<4>) => NoReduce<4> <=> (it: 4) => 4
      • f2(it: NoReduce<4>): void, f2(4) // legal

With the constrained version (WithKnownCases), the details are largely the same:

  • Under intersection: T & WithKnownCases<U, V> => WithKnownCases<T & U, T & V>
  • Unification: chooseOne(WithKnownCases<"a", string>, string) => WithKnownCases<"a", string> | string, which reduces to WithKnownCases<"a", string> per below
  • WithKnownCases<T, U> | V reduces to...
    • => WithKnownCases<T | V, U> if V is a strict subtype of U
      • WithKnownCases<4 | 5, number> | 6 => WithKnownCases<4 | 5 | 6, number>
    • => WithKnownCases<T, U | V> otherwise
      • WithKnownCases<4 | 5, number> | number => WithKnownCases<4 | 5, number>
  • Subtyping/assignability still trying to have mutual subtypes, e.g.:
    • (it: WithKnownCases<4, number>) => WithKnownCases<4, number> <=> (it: number) => number
    • f2(it: WithKnownCases<4, number>): void, f2(WithKnownCases<2, number>) // legal

I wonder if the idea I proposed here could be the basis for implementing "open-ended unions" more generally, on top of WithKnownCases? Basically, instead of trying to prevent types from reducing:

  • let every type have an optional, associated "documentation type" that's constrained to be a subtype of the type itself;
  • when types reduce/merge, merge their documentation types
  • apply the same narrowings that apply to a variable to its documentation type
  • show both the type and its documentation type in IDE popups/completions

WithKnownCases<T, U> becomes the way of writing "type U with documentation type T"