python/mypy

Document why having a union return type is often a problem

JukkaL opened this issue · 22 comments

Union return types get suggested pretty often, but they usually are not the right thing for library stubs.

The key reason is that every caller will have to use isinstance() to pick the union apart before its value can be used, right? And that's often unnecessary because there's some reason why the runtime value is always one or the other. Example, if pow() returned Union[int, float], then we couldn't use pow(2, 2) as a sequence index.

@gvanrossum

Example, if pow() returned Union[int, float], then we couldn't use pow(2, 2) as a sequence index.

I think it's an implementation detail of how the type checker works and what it considers a type error. I've worked with Python type hinting for quite a while (love it by the way), though using PyCharm rather than mypy (but am looking forward to trying out mypy soon).

This is how PyCharm 2016.4 handles the following Python 3.5 code:

from typing import Union

x = 1  # type: Union[int, float]


def f(y: int) -> None:
    pass

f(x)  # No type error, as 'Union[int, float]' is allowed to be input into 'int' parameter

# Test to show that bare 'float' will raise error.
f(0.5)  # Warning raised: Expected type 'int', got 'float' instead

PyCharm seems to prefer avoiding false positive type errors, which I suppose is more of a design choice. I bring this up because they recently fixed a type hinting issue by switching the inferred type hint to a Union.

I also heard PyCharm is planning to switch to using typeshed in a future release. Should type checkers other than mypy be considered when it impacts the typeshed project?

I don't know the internal plans for PyCharm, only a user, so I'll let @vlasovskikh speak to those details and correct me if needed.

typeshed is not specific to any tool. As PEP 484 doesn't spell out all details of how type checking should work, tools may make different interpretations and thus different annotations sometimes make sense for different tools. Currently mypy probably (?) is the biggest production user of typeshed stubs, so the stubs may have a tendency of being optimized towards mypy.

PEP 484 doesn't define the precise semantics of union types. Mypy does "strict" union checking which is similar to how union types behave in type theory: an operation is okay for a union type if it is valid for all items of the union. It seems that PyCharm uses "unsafe" (not established terminology) union type checking where an operation is okay if it is valid for some item of an union. @vlasovskikh please let me know if I understood this correctly.

The main motivation for strict unions in mypy is finding None-related type errors, and in particular checking against forgetting to check if an Optional[x] value is None. This is a common source of bugs in Python and other languages.

For a checker that does strict unions Any is the probably best return type annotation for a function whose return type depends on the value (not type) of an argument, since otherwise an isinstance check would be needed after many or most calls. For a checker that uses unsafe unions, a union return type is better. The latter allows the tool to perform some (even if not full) checking without changing existing code.

Here are ideas for resolving this general issue:

  • Leave return types Any as it will work with both tools, though with somewhat limited level of checking for PyCharm. Optionally, tools can special case some of these functions internally and infer a more precise type than provided by a stub. The stubs would generally use least common denominator functionality for things not specified in PEP 484.
  • Support tool-specific conditional definitions in stubs. For example, have something like if typing.TOOL == 'pycharm': ... so that PyCharm can use a more precise return type. This would likely require a PEP 484 update.
  • Update PEP 484 or typeshed documentation to specify how union types should be used, and just follow the convention in the stubs.

@JukkaL Thanks for the detailed explanation. Just wanted to highlight, as you've said, the different interpretations being used so that the community as a whole can continue to work together on this.

It seems that PyCharm uses "unsafe" (not established terminology) union type

While PyCharm's handing of Union's is less safe than mypy, I think it's important to remember that using Any for these return types brings with it other issues. So it's more of a trade off situation:

def fun(y: str) -> None:
    y.capitalize()
    ...

x = pow(2, 2)
fun(x)  # type error missed because pow annotated as returning Any

I think another reason I can imagine why PyCharm would favor Union's over Any is that PyCharm is an IDE, and having intelligent autocomplete is super helpful. Can't do that with Any.

If mypy deals better with Any returns instead of Union returns, would it be possible to have mypy interpret any -> Union[...] in typeshed as -> Any?

(Btw. pytype does "unsafe" Unions as well, for most operations)

@johnthagen Yes, using an Any return type is clearly a compromise. I also agree that for code completion it makes sense to try to infer a type even if it's not precise. Precise types are important for type checking, and strict checking of unions (in particular, Optional[x]) has been one of the most commonly requested mypy features.

@matthiaskramm I don't like the idea of ignoring some type annotations, as this could be very confusing for users. Also, often a union return type will be fine for mypy. For example, the types in the union may have some common functionality, and sometimes the caller is expected to do a check before using the value.

We are planning to teach mypy to special case the signatures of some functions (#1240), and then there would be more flexibility. It would be less of a usability problem for mypy to infer a more precise type for a return type than the one declared. For example, if pow is called integer arguments, and with a non-negative literal second argument, we could infer the return type to be int instead of float.

Also, PEP 484 suggests that Union[int, float] is redundant and can be replaced with just float, since int is valid when a float is expected (though the wording in the PEP is a little vague).

@JukkaL @johnthagen In PyCharm we switched from "strict" to "weak" unions a few years ago after some experiments with None checks. We infer unions as return types of functions. Strict inferred unions confused our users a lot since they required isinstsance checks. In order to be compliant with PEP 484 and Mypy we will go back to "strict" unions for type hints and "weak" unions for inferred types. In PyCharm we need "weak" unions instead of Any because we provide code completion based on this data.

Could mypy add a configuration flag to chose between strict and lenient union matching? A configuration flag seems like a better solution than to ask mypy users to reduce return type specificity by replacing Union with Any.

Is this really because Union return types (other than possibly the degenerate case of Optional) are a bad idea design-wise?

By which I mean that the return type should usually be inferrrable from the types (or Literal values) of the inputs?

For example, a function that upcases either strings or the ascii-range letters of a bytes object could accept and return and Union:

def upcase(s: Union[str, bytes]) -> Union[str, bytes]: ...

...but the better thing to do is use an overload...

@overload
def upcase(s: str) -> str: ...
@overload
def upcase(s: bytes) -> bytes: ...

def upcase(s): 
    if isinstance(s, str): 
        return s.upper() 
    elif isinstance(s, bytes): 
        return bytes(x - 0x20 if 0x61 <= x <= 0x7A else x for x in s)
    else: 
        raise TypeError("need str or bytes")

If this is the kinda thing we're getting at, I'd be glad to document this.

Hi jriddy. Apologies, but does python provide an @overload builtin? If so that would be fascinating, but glancing around I found a deferred PEP and PyPI project but as of Python 3.7 I'm not finding that decorator...

atagar@morrigan:~$ python3.7
>>> @overload
... def echo(msg: str) -> None:
...   print('str echo: %s' % msg)
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'overload' is not defined

Ah! Thank you JelleZijlstra, that's perfect. Very much agreed, making users that receive this error aware of typing.override should resolve this.

Thanks for the help!

Aren't constrained typevars also an option? e.g.

@overload
def upcase(s: str) -> str: ...
@overload
def upcase(s: bytes) -> bytes: ...

could also be written as

S = TypeVar('S', bytes, str)
def upcase(s: S) -> S

and that would express its exact behaviour.

And S already exists and is called AnyStr.

@xmo-odoo usually this comes up in more complicated situations that can't be handled with overloads. For example, the return value of pow() depends on the sign of the arguments, and the type system doesn't know whether a number is negative or positive.

usually this comes up in more complicated situations that can't be handled with overloads.

Certainly, I should have quoted the comment about overloads which I wanted to reply to, sorry about that.

For example, the return value of pow() depends on the sign of the arguments, and the type system doesn't know whether a number is negative or positive.

Yes I guess overloads could work with eventual range / refinement types whereas typevars wouldn't really help as well.

I just saw a lot of PRs and discussions so I'm unsure if this issue is resolved. Can someone confirm ??

I don't think this has been fixed. Skimming the mypy docs I do not see a section about this, so I think adding to the docs is still a good idea.

What is the recommendation to handle cases like this?

def my_sort(items: Sequence[int], also_return_indices: bool) -> Union[List[int], Tuple[List[int], List[int]]]
    sorted_items, indices = ... # bunch of code
    if also_return_indices:
        return sorted_items, indices
    return sorted_items

i.e. I cannot use @overload.

Use overload + Literal, should work for most cases.

(Otherwise your fallback is the same as everyone else's, use Any)

Use overload + Literal, should work for most cases.

(Otherwise your fallback is the same as everyone else's, use Any)

Do you have an example or link to docs? Are you saying I need to change also_return_indices to a Literal['true', 'false']?

It’s frustrating that annotating the return values of a function as -> bool | int (which is equivalent to -> Union[bool, int]) gets you a mypy error as if the expression has the int return type while the annotated variable has the bool return type (but the return type of the function is actually True, so a bool type). I have annotated the return type as being either a bool or an int, so what the ****?!

I know this is because the bool type is a subtype of the int type. But can’t mypy distinguish between those two types?

Annotating the return type of my function as only -> int (just to make mypy happy) is wrong as it can also return a bool, so True or False, and that would confuse my fellow Python developers.

Mishaps like this is what make me hate annotations still.