HypothesisWorks/hypothesis

Check for `annotated-types` constraints in `st.from_type(Annotated[T, ...])`

Closed this issue ยท 5 comments

At PyCon 2022 I helped start https://github.com/annotated-types/annotated-types - so it would be nice to support these new constraints once we finalize the upstream release! The key idea is that we can iterate through the constraints, and derive callables which we then use as filters on the strategy that we resolved for the annotated type.

Filter-rewriting will then make numeric bounds as efficient as is practically possible at runtime (though see #3134 / b4c4161 / #3479 for strings). For example:

>>> from_type(Annotated[int, Gt(10)])
# integers().filter(partial(gt, 10))
integers(min_value=11)

>>> from_type(Annotated[str, Predicate(str.isdigit)])
text().filter(str.isdigit)
# from_regex(r"\d+", fullmatch=True)

@adriangb @samuelcolvin FYI ๐Ÿ˜

A more detailed breakdown of the tasks involved:

  • Gt, Ge, Lt, Le:
    • translate into filter predicates with partial and the operator module
    • add filter-rewriting support for st.fractions() and st.decimals()
  • MultipleOf:
    • annoyingly difficult to get right, especially for floats! Also ambiguous: modulo-based or division-based semantics?
    • filtering is absurdly inefficient; probably needs special handling plumbed right through; at minimum apply this last
    • can we get away with warning people off for now?
  • Timezone:
    • resolve this to a timezones= strategy for st.times() or st.datetimes()
    • register a callback fn for time and datetime types, which accepts this arg, and plumb it through
    • explicit error if we see this constraint but the resolver func doesn't accept this arg
  • MinLen, MaxLen: as for timezones, but fall back to a warning and filter predicate rather than an error
  • Predicate: use as predicate to .filter() in the obvious way

Other constraints: unpack GroupedMetadata into the component parts, ignore unknown constraint types and log a warning at debug verbosity. The new code will replace logic invoked from:

if is_annotated_type(thing): # pragma: no cover
# This requires Python 3.9+ or the typing_extensions package
annotated_strategy = find_annotated_strategy(thing)
if annotated_strategy is not None:
return annotated_strategy
args = thing.__args__
assert args, "it's impossible to make an annotated type with no args"
annotated_type = args[0]
return st.from_type(annotated_type)

Viicos commented

I've started an implementation of this, and I have a couple questions:

Gt, Ge, Lt, Le:

  • translate into filter predicates with partial and the operator module
  • add filter-rewriting support for st.fractions() and st.decimals()

I'm don't have much knowledge on how filter rewriting works, but is there a way to check that integers().filter(partial(gt, 10)) is correctly rewritten by Hypothesis to integers(min_value=11)? How about integers().filter(partial(gt, 10)).filter(partial(lt, 20))?
Edit: nevermind, seems to work when accessing the wrapped strategy of the lazy strategy.


How should we handle constraints that are incompatible with each other/with the annotated type? e.g. Annotated[int, Gt(1), Timezone(tz.utc)], Annotated[UUID, Gt(1)]? Should we ignore and let a potential error be raised at some point when the resulting strategy is used?

Zac-HD commented

How to check that filter-rewriting works: trust that https://github.com/HypothesisWorks/hypothesis/blob/master/hypothesis-python/tests/cover/test_filter_rewriting.py would surface any problems ๐Ÿ˜

How should we handle constraints that are incompatible with each other/with the annotated type? e.g. Annotated[int, Gt(1), Lt(1)], Annotated[str, Gt(1)]? Should we ignore and let a potential error be raised at some point when the resulting strategy is used?

Hmm, this is somewhat tricky - the decision about whether to return st.nothing() or raise an InvalidArgument exception is made on a case-by-case basis. We prefer to raise an error when we think there's something the user could reasonably do to avoid it; or use nothing() when that would be impractical.

I think in this case I'd just translate to filters, as the easy-to-implement approach. That means:

  • Annotated[str, Gt(1)] will raise a TypeError due to the comparison when an example is drawn. It's somewhat unfortunate that the error message won't mention that this came from an annotated type, but I think not worth the code required to improve on it.
  • Annotated[int, Gt(1), Lt(1)] will return nothing(), because that's equivalent to a strategy where all possible values are filtered out. In this case, I think we can add a few lines of code like if s.is_empty: raise InvalidArgument(f"There are no valid values for type {t!r}")
Viicos commented

Ok I see, I'll soon open a draft PR so that progress can be tracked.

  • register a callback fn for time and datetime types, which accepts this arg, and plumb it through

By register a callback you mean using this function?

def register(type_, fallback=None, *, module=typing):
if isinstance(type_, str):
# Use the name of generic types which are not available on all
# versions, and the function just won't be added to the registry


Ideally I will try to build strategies in a "smart" way, and avoid using filters when no filter rewriting is available (e.g. text(min_size=20) instead of text().filter(lambda el: len(el) >= 20); applies to list/bytes/probably others). Even Gt/Ge/Lt/Le can apply to types where filter rewriting could be implemented but currently isn't (e.g. datetime), so I need to build the corresponding strategies in a smart way.

What's challenging is that I could manually build the strategies with the correct kwargs (e.g. min_size=..., max_size=...) for each known type, but that would be rewriting what from_type is currently doing with the extra logic from constraints.

Zac-HD commented

Let's start simple, and only include filters in the initial version. If sometimes that means users get Unsatisfiable rather than data which violates their annotation constraints, I think that's still an improvement!

After we've shipped the logically correct version, we can look at improving performance in common cases.