Check for `annotated-types` constraints in `st.from_type(Annotated[T, ...])`
At PyCon 2022 I helped start https://github.com/annotated-types/annotated-types - so it would be nice to support these new constraints once we finalize the upstream release! The key idea is that we can iterate through the constraints, and derive callables which we then use as filters on the strategy that we resolved for the annotated type.
Filter-rewriting will then make numeric bounds as efficient as is practically possible at runtime (though see #3134 / b4c4161 / #3479 for strings). For example:
>>> from_type(Annotated[int, Gt(10)])
# integers().filter(partial(lt, 10)), i.e. 10 < x
integers(min_value=11)
>>> from_type(Annotated[str, Predicate(str.isdigit)])
text().filter(str.isdigit)
# from_regex(r"\d+", fullmatch=True)
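The translation from bound constraints to filter predicates could be sketched roughly as follows. This is a stdlib-only illustration: the `Gt`, `Ge`, `Lt`, and `Le` dataclasses are stand-ins assumed to mirror the field names used by `annotated-types`, and `predicates_for` is a hypothetical helper, not part of Hypothesis. Note that `partial` binds the left operand, so "greater than 10" is `partial(operator.lt, 10)`, i.e. `10 < x`.

```python
import operator
from dataclasses import dataclass
from functools import partial
from typing import Annotated, Any, get_args


# Stand-ins for the annotated-types constraint classes (assumption: same field names).
@dataclass(frozen=True)
class Gt:
    gt: Any


@dataclass(frozen=True)
class Ge:
    ge: Any


@dataclass(frozen=True)
class Lt:
    lt: Any


@dataclass(frozen=True)
class Le:
    le: Any


def predicates_for(annotated_type):
    """Hypothetical helper: split Annotated[T, ...] into T and a list of predicates."""
    base, *metadata = get_args(annotated_type)
    predicates = []
    for constraint in metadata:
        # partial() binds the LEFT operand, so Gt(10) becomes "10 < x".
        if isinstance(constraint, Gt):
            predicates.append(partial(operator.lt, constraint.gt))
        elif isinstance(constraint, Ge):
            predicates.append(partial(operator.le, constraint.ge))
        elif isinstance(constraint, Lt):
            predicates.append(partial(operator.gt, constraint.lt))
        elif isinstance(constraint, Le):
            predicates.append(partial(operator.ge, constraint.le))
        # Any other metadata is skipped in this sketch.
    return base, predicates


base, predicates = predicates_for(Annotated[int, Gt(10), Le(20)])
# Keep only the candidates that satisfy every derived predicate.
allowed = [x for x in range(5, 25) if all(p(x) for p in predicates)]
```

In the real feature each predicate would be passed to `.filter()` on the strategy resolved for `base`, leaving filter-rewriting to turn the bounds into `min_value`/`max_value` where it can.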
@adriangb @samuelcolvin FYI
A more detailed breakdown of the tasks involved:
- `Gt`, `Ge`, `Lt`, `Le`:
  - translate into filter predicates with `partial` and the `operator` module
  - add filter-rewriting support for `st.fractions()` and `st.decimals()`
- `MultipleOf`:
  - annoyingly difficult to get right, especially for floats! Also ambiguous: modulo-based or division-based semantics?
  - filtering is absurdly inefficient; probably needs special handling plumbed right through; at minimum apply this last
  - can we get away with warning people off for now?
- `Timezone`:
  - resolve this to a `timezones=` strategy for `st.times()` or `st.datetimes()`
  - register a callback fn for time and datetime types, which accepts this arg, and plumb it through
  - explicit error if we see this constraint but the resolver func doesn't accept this arg
- `MinLen`, `MaxLen`: as for timezones, but fall back to a warning and filter predicate rather than an error
- `Predicate`: use as predicate to `.filter()` in the obvious way
- Other constraints: unpack `GroupedMetadata` into the component parts, ignore unknown constraint types and log a warning at `debug` verbosity.

The new code will replace logic invoked from:
hypothesis/hypothesis-python/src/hypothesis/strategies/_internal/types.py
Lines 369 to 377 in 47b35ce
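To make the `MultipleOf` ambiguity concrete, here is a small sketch of the two candidate semantics. The function names are hypothetical; the point is only that the modulo-based and division-based checks agree for integers but can disagree for floats because of rounding.

```python
def is_multiple_by_modulo(value, multiple_of):
    # Modulo-based semantics: the remainder must be exactly zero.
    return value % multiple_of == 0


def is_multiple_by_division(value, multiple_of):
    # Division-based semantics: the quotient must be a whole number.
    quotient = value / multiple_of
    return quotient == int(quotient)


# Integers: both interpretations agree.
ints_agree = is_multiple_by_modulo(12, 4) and is_multiple_by_division(12, 4)

# Floats: 0.1 is not exactly representable, so the remainder of 0.5 % 0.1
# is nonzero, while 0.5 / 0.1 rounds to a whole number.
float_modulo = is_multiple_by_modulo(0.5, 0.1)
float_division = is_multiple_by_division(0.5, 0.1)
```

Whichever semantics is chosen, a filter built this way rejects almost every float drawn, which is why the list above suggests applying it last or handling it specially.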
I've started an implementation of this, and I have a couple questions:
> - `Gt`, `Ge`, `Lt`, `Le`:
>   - translate into filter predicates with `partial` and the `operator` module
>   - add filter-rewriting support for `st.fractions()` and `st.decimals()`

I don't have much knowledge of how filter rewriting works, but is there a way to check that `integers().filter(partial(lt, 10))` (i.e. `10 < x`) is correctly rewritten by Hypothesis to `integers(min_value=11)`? How about `integers().filter(partial(lt, 10)).filter(partial(gt, 20))`?

Edit: never mind, it seems to work when accessing the wrapped strategy of the lazy strategy.

How should we handle constraints that are incompatible with each other/with the annotated type? e.g. `Annotated[int, Gt(1), Timezone(tz.utc)]`, `Annotated[UUID, Gt(1)]`? Should we ignore them and let a potential error be raised at some point when the resulting strategy is used?
How to check that filter-rewriting works: trust that https://github.com/HypothesisWorks/hypothesis/blob/master/hypothesis-python/tests/cover/test_filter_rewriting.py would surface any problems.
> How should we handle constraints that are incompatible with each other/with the annotated type? e.g. `Annotated[int, Gt(1), Lt(1)]`, `Annotated[str, Gt(1)]`? Should we ignore and let a potential error be raised at some point when the resulting strategy is used?

Hmm, this is somewhat tricky - the decision about whether to return `st.nothing()` or raise an `InvalidArgument` exception is made on a case-by-case basis. We prefer to raise an error when we think there's something the user could reasonably do to avoid it; or use `nothing()` when that would be impractical.
I think in this case I'd just translate to filters, as the easy-to-implement approach. That means:
- `Annotated[str, Gt(1)]` will raise a `TypeError` due to the comparison when an example is drawn. It's somewhat unfortunate that the error message won't mention that this came from an annotated type, but I think not worth the code required to improve on it.
- `Annotated[int, Gt(1), Lt(1)]` will return `nothing()`, because that's equivalent to a strategy where all possible values are filtered out. In this case, I think we can add a few lines of code like `if s.is_empty: raise InvalidArgument(f"There are no valid values for type {t!r}")`
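As a rough illustration of the "no valid values" case, the sketch below intersects integer bounds and raises when the range is empty. It is stdlib-only and makes several assumptions: `InvalidArgument` is a local stand-in for the Hypothesis error of the same name, `Gt` and `Lt` are stand-ins for the `annotated-types` classes, and `check_integer_bounds` is a hypothetical helper. In Hypothesis itself the check would more likely be made on the resolved strategy, as in the `s.is_empty` snippet above.

```python
from dataclasses import dataclass
from typing import Annotated, get_args


class InvalidArgument(Exception):
    """Local stand-in for Hypothesis's InvalidArgument error."""


@dataclass(frozen=True)
class Gt:
    gt: int


@dataclass(frozen=True)
class Lt:
    lt: int


def check_integer_bounds(annotated_type):
    """Hypothetical check: return (min_value, max_value) or raise if no integer fits."""
    _base, *metadata = get_args(annotated_type)
    min_value = max_value = None
    for constraint in metadata:
        if isinstance(constraint, Gt):
            # Strict lower bound on integers: the smallest allowed value is gt + 1.
            candidate = constraint.gt + 1
            min_value = candidate if min_value is None else max(min_value, candidate)
        elif isinstance(constraint, Lt):
            # Strict upper bound on integers: the largest allowed value is lt - 1.
            candidate = constraint.lt - 1
            max_value = candidate if max_value is None else min(max_value, candidate)
    if min_value is not None and max_value is not None and min_value > max_value:
        raise InvalidArgument(f"There are no valid values for type {annotated_type!r}")
    return min_value, max_value


bounds = check_integer_bounds(Annotated[int, Gt(1), Lt(5)])
```

Calling `check_integer_bounds(Annotated[int, Gt(1), Lt(1)])` raises the stand-in `InvalidArgument`, which matches the "something the user could reasonably fix" rule described above.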
Ok I see, I'll soon open a draft PR so that progress can be tracked.
> - register a callback fn for time and datetime types, which accepts this arg, and plumb it through

By "register a callback", do you mean using this function?
hypothesis/hypothesis-python/src/hypothesis/strategies/_internal/types.py
Lines 703 to 706 in 058420a
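Whichever registration mechanism is meant, the "accepts this arg, otherwise explicit error" behaviour from the task list might look something like the sketch below. Everything here is hypothetical: `call_resolver`, `resolve_datetime`, and `resolve_uuid` are illustrative names, and the return values are plain tuples rather than strategies so that the example runs with the standard library alone.

```python
import inspect


def call_resolver(resolver, thing, **constraint_kwargs):
    """Hypothetical dispatcher: pass constraint-derived kwargs only if accepted."""
    params = inspect.signature(resolver).parameters
    accepts_any = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    unsupported = sorted(
        name for name in constraint_kwargs
        if name not in params and not accepts_any
    )
    if unsupported:
        # Explicit error rather than silently dropping the constraint.
        raise TypeError(
            f"{resolver.__name__} does not accept {unsupported}, "
            f"which the constraints on {thing!r} require"
        )
    return resolver(thing, **constraint_kwargs)


def resolve_datetime(thing, timezones=None):
    # Illustrative resolver that understands a timezones argument.
    return ("datetimes", timezones)


def resolve_uuid(thing):
    # Illustrative resolver with no constraint-aware arguments.
    return ("uuids",)


ok = call_resolver(resolve_datetime, "datetime", timezones="utc")
```

For `MinLen`/`MaxLen`, the same check could downgrade the error to a warning and fall back to a filter predicate, as the task list suggests.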
Ideally I will try to build strategies in a "smart" way, and avoid using filters when no filter rewriting is available (e.g. `text(min_size=20)` instead of `text().filter(lambda el: len(el) >= 20)`; applies to list/bytes/probably others). Even `Gt`/`Ge`/`Lt`/`Le` can apply to types where filter rewriting could be implemented but currently isn't (e.g. `datetime`), so I need to build the corresponding strategies in a smart way.

What's challenging is that I could manually build the strategies with the correct kwargs (e.g. `min_size=..., max_size=...`) for each known type, but that would be rewriting what `from_type` is currently doing with the extra logic from constraints.
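The trade-off between kwargs and filters can be sketched as a small planning step. This is an assumption-laden illustration: `MinLen` and `MaxLen` are stand-ins for the `annotated-types` classes, `SIZED_TYPES` is a hypothetical table of base types whose strategies take `min_size`/`max_size`, and `plan_length_constraints` is not an existing Hypothesis function.

```python
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class MinLen:
    min_length: int


@dataclass(frozen=True)
class MaxLen:
    max_length: int


# Hypothetical table of base types whose strategy accepts min_size/max_size.
SIZED_TYPES = {str, bytes, list}


def plan_length_constraints(base, constraints):
    """Return (kwargs, predicates): kwargs where supported, filters otherwise."""
    kwargs, predicates = {}, []
    for c in constraints:
        if isinstance(c, MinLen):
            if base in SIZED_TYPES:
                kwargs["min_size"] = max(kwargs.get("min_size", 0), c.min_length)
            else:
                # Bind n as a default so each lambda keeps its own bound.
                predicates.append(lambda v, n=c.min_length: len(v) >= n)
        elif isinstance(c, MaxLen):
            if base in SIZED_TYPES:
                current = kwargs.get("max_size", c.max_length)
                kwargs["max_size"] = min(current, c.max_length)
            else:
                predicates.append(lambda v, n=c.max_length: len(v) <= n)
    if predicates:
        warnings.warn(f"Falling back to filters for length constraints on {base!r}")
    return kwargs, predicates


text_plan = plan_length_constraints(str, [MinLen(20)])
```

Here `text_plan` carries `min_size=20` as a keyword argument and no predicates, while an unlisted base type would receive `len`-based filter predicates plus a warning.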
Let's start simple, and only include filters in the initial version. If sometimes that means users get `Unsatisfiable` rather than data which violates their annotation constraints, I think that's still an improvement!
After we've shipped the logically correct version, we can look at improving performance in common cases.