python-jsonschema/hypothesis-jsonschema

Possible improvements on not supported & invalid regexes

Stranger6667 opened this issue · 3 comments

In web APIs, users often write regular expressions in the syntax supported by their backend, which is sometimes incompatible (in some areas) with the syntax supported by JSON Schema.

For example, this AWS API uses character classes that are supported by Java, such as \p{Alpha}. This syntax is not supported by the Python stdlib re module, and hypothesis-jsonschema currently uses st.nothing() for such cases. In the simplest case, this leads to Unsatisfiable, as there are no values in this strategy.
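To illustrate the incompatibility, here is a minimal sketch that uses only the stdlib re module. The helper name python_re_error is hypothetical and is not part of hypothesis-jsonschema; it only shows that re.compile rejects the Java-style class while accepting an ordinary character class.

```python
import re
from typing import Optional


def python_re_error(pattern: str) -> Optional[str]:
    """Return the error message if Python's `re` rejects `pattern`, else None.

    Hypothetical helper for illustration only.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# Java-style Unicode property class: rejected by Python's `re`.
print(python_re_error(r"\p{Alpha}"))
# Ordinary character class: accepted, so this prints None.
print(python_re_error(r"[A-Za-z]+"))
```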

But consider an array:

{
    "items": {
        "pattern": "\\p{Alpha}",
        "type": "string"
    },
    "maxItems": 50,
    "minItems": 0,
    "type": "array"
}

Even though we don't support generating strings for the regular expression in the schema above, we can still generate an empty array, which matches the schema. The same could be applied to optional properties, etc.
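The reasoning can be sketched as two small checks. Both function names are hypothetical, and the sketch deliberately looks only at minItems, properties, and required; a real implementation would also have to consider other keywords (for example contains) that can rule out an empty array.

```python
def empty_array_is_valid(schema: dict) -> bool:
    """True if [] satisfies the array schema, ignoring keywords other than minItems.

    `items` only constrains elements that are present, so it never
    rejects an empty array.
    """
    return schema.get("minItems", 0) == 0


def optional_properties(schema: dict) -> set:
    """Names of object properties that may be omitted from a generated value."""
    return set(schema.get("properties", {})) - set(schema.get("required", []))


array_schema = {
    "items": {"pattern": "\\p{Alpha}", "type": "string"},
    "maxItems": 50,
    "minItems": 0,
    "type": "array",
}
print(empty_array_is_valid(array_schema))

object_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string", "pattern": "\\p{Alpha}"},
    },
    "required": ["id"],
}
# "name" is optional, so an object without it can still match the schema.
print(optional_properties(object_schema))
```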

The current error output for the schema above:

hypothesis.errors.InvalidArgument: Cannot create a collection of max_size=50, because no elements can be drawn from the element strategy nothing()

From the user's perspective, it would be nice to expose some information about why this happens (the unsupported regex syntax). For cases where we can still generate data without those items/properties, it could be a warning; for cases where we can't (e.g., if there is minItems: 1), a better error message would be great.
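One possible shape for this behaviour, as a sketch only: the function and exception names below are hypothetical, the message wording is made up, and none of this is the actual hypothesis-jsonschema implementation. It warns when an empty array is still a valid result and raises a descriptive error when minItems makes that impossible.

```python
import re
import warnings


class UnsupportedPatternError(ValueError):
    """Hypothetical error for schemas that cannot be satisfied due to a regex."""


def check_items_pattern(schema: dict) -> bool:
    """Return True if array items can be generated, False if only [] can.

    Raises UnsupportedPatternError when the pattern is unsupported and the
    schema requires at least one item.
    """
    pattern = schema.get("items", {}).get("pattern")
    if pattern is None:
        return True
    try:
        re.compile(pattern)
        return True
    except re.error as exc:
        reason = f"pattern {pattern!r} is not supported by Python's re module ({exc})"

    min_items = schema.get("minItems", 0)
    if min_items >= 1:
        # No valid value exists that we can generate: explain why.
        raise UnsupportedPatternError(
            f"Cannot generate an array with minItems={min_items}: {reason}"
        )
    # An empty array still matches, so only warn.
    warnings.warn(f"Only empty arrays will be generated: {reason}", stacklevel=2)
    return False


schema = {
    "items": {"pattern": "\\p{Alpha}", "type": "string"},
    "maxItems": 50,
    "minItems": 0,
    "type": "array",
}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    only_empty = not check_items_pattern(schema)
print(only_empty, len(caught))
```

With minItems set to 1 the same call raises UnsupportedPatternError instead of warning.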

What do you think?

P.S. I am pretty sure that I saw a different InvalidArgument error that was also connected to drawing from nothing(). I will post an update once I find it.

I think this is two separate issues:

  1. Emit a warning when we encounter a regex pattern which is not valid in Python, and
  2. Work around Hypothesis' usual check that the list's element strategy is non-empty if max_size is set

Both are fixable, of course 🙂

Fixed by 175213f and 851d4c7 respectively, and released as 0.19 😄

Amazing! Thank you! :)