dottxt-ai/outlines

Implement JSON schema field constraints

rlouf opened this issue · 6 comments

rlouf commented

We can specify constraints for the different fields in the JSON schema specification. Only maxLength for strings is currently implemented. Remaining:

Strings

  • minLength
  • pattern

and the default formats that can be specified via the format keyword:

  • date-time
  • time
  • date
  • duration
  • email
  • idn-email
  • hostname
  • idn-hostname
  • ipv4
  • ipv6
  • uuid
  • uri
  • uri-reference
  • iri
  • iri-reference
  • uri-template
  • regex
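
For illustration, here is a minimal sketch of how the string constraints above could be compiled to regular expressions. The helper `string_regex` and the `FORMAT_REGEX` table are hypothetical names, not part of the Outlines API, and the patterns are deliberately simplified: strings may not contain escape sequences, and the `date` pattern does not check the number of days in each month.

```python
import re
from typing import Optional


def string_regex(min_length: int = 0, max_length: Optional[int] = None) -> str:
    """Regex for a JSON string whose length lies in [min_length, max_length].

    Assumption: the string contains no double quotes or backslashes, so one
    regex character corresponds to one character of the string.
    """
    upper = "" if max_length is None else str(max_length)
    return rf'"[^"\\]{{{min_length},{upper}}}"'


# Hypothetical lookup table for two of the `format` values listed above.
FORMAT_REGEX = {
    "uuid": r'"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"',
    "date": r'"[0-9]{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01])"',
}

if __name__ == "__main__":
    print(bool(re.fullmatch(string_regex(2, 4), '"abc"')))
    print(bool(re.fullmatch(FORMAT_REGEX["date"], '"2023-09-30"')))
```

A user-supplied `pattern` could presumably be wrapped in quotes the same way, provided its anchors are handled.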

Numeric types

  • multipleOf
  • minimum
  • exclusiveMinimum
  • maximum
  • exclusiveMaximum

Arrays

  • minItems
  • maxItems
  • uniqueItems (may only be applicable dynamically)
  • Set length
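
As a rough sketch, `minItems` and `maxItems` can be expressed with a bounded repetition over the item pattern. `array_regex` and `INTEGER` are hypothetical names used only for this example, and the sketch assumes no whitespace between tokens. `uniqueItems` is not covered, since it cannot be expressed this way in general.

```python
import re
from typing import Optional

# Simplified pattern for a JSON integer (assumption: no exponent notation).
INTEGER = r"-?(?:0|[1-9][0-9]*)"


def array_regex(item: str, min_items: int = 0, max_items: Optional[int] = None) -> str:
    """Regex for a JSON array with between min_items and max_items items."""
    if max_items is not None and max_items < min_items:
        raise ValueError("max_items must be >= min_items")
    if max_items == 0:
        return r"\[\]"
    # One leading item, then between (min - 1) and (max - 1) ",item" repeats.
    lo = max(min_items - 1, 0)
    hi = "" if max_items is None else max_items - 1
    body = f"(?:{item})(?:,(?:{item})){{{lo},{hi}}}"
    if min_items == 0:
        # The empty array is also allowed.
        body = f"(?:{body})?"
    return r"\[" + body + r"\]"


if __name__ == "__main__":
    pattern = array_regex(INTEGER, min_items=2, max_items=3)
    print(bool(re.fullmatch(pattern, "[1,2]")))
    print(bool(re.fullmatch(pattern, "[1]")))
```

Setting `min_items` equal to `max_items` gives a fixed-length array.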

Tuples

See https://json-schema.org/understanding-json-schema/reference/array#tupleValidation

Required fields

We should handle optional fields as well, i.e. those not specified in the required field of the schema.
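
One possible way to handle optional fields in a regex is to wrap each optional member in an optional group. The sketch below uses a hypothetical helper `object_regex` and makes simplifying assumptions: properties appear in schema order, there is no whitespace, and the first property is required so that comma placement stays trivial.

```python
import re
from typing import Dict, Set


def object_regex(properties: Dict[str, str], required: Set[str]) -> str:
    """Regex for a JSON object whose non-required members may be omitted.

    `properties` maps each property name to a regex for its value.
    """
    names = list(properties)
    if not names or names[0] not in required:
        raise ValueError("this sketch assumes the first property is required")
    parts = []
    for index, name in enumerate(names):
        member = f'"{re.escape(name)}":(?:{properties[name]})'
        if index > 0:
            # Every member after the first carries its own leading comma.
            member = "," + member
        if name not in required:
            member = f"(?:{member})?"
        parts.append(member)
    return r"\{" + "".join(parts) + r"\}"


if __name__ == "__main__":
    pattern = object_regex({"name": r'"[a-z]+"', "age": r"[0-9]+"}, {"name"})
    print(bool(re.fullmatch(pattern, '{"name":"bob"}')))
    print(bool(re.fullmatch(pattern, '{"name":"bob","age":42}')))
```

Dropping the first-property assumption requires more careful comma handling, for example by enumerating which member comes first.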

Here are some examples of integer range constraints expressed as regular expressions: https://stackoverflow.com/a/34680927/3006474, https://3widgets.com/
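
To make the idea concrete, here is a small sketch that builds a regex for an inclusive range of non-negative integers, which is what `minimum` and `maximum` would need for integer fields. It is not the method from the links above, and `int_range_regex` is a hypothetical name. The output is correct but not minimal, and negative numbers are left out.

```python
import re


def _same_length(lo: str, hi: str) -> str:
    """Regex for digit strings s with len(s) == len(lo) and lo <= s <= hi."""
    if not lo:
        return ""
    rest = len(lo) - 1
    if lo[0] == hi[0]:
        # Shared leading digit: constrain only the remaining digits.
        return lo[0] + _same_length(lo[1:], hi[1:])
    tail = f"[0-9]{{{rest}}}" if rest else ""
    # Lowest leading digit: the remaining digits must be >= lo[1:].
    parts = [lo[0] + _same_length(lo[1:], "9" * rest)]
    # Leading digits strictly in between: the remaining digits are free.
    if int(hi[0]) - int(lo[0]) > 1:
        parts.append(f"[{int(lo[0]) + 1}-{int(hi[0]) - 1}]" + tail)
    # Highest leading digit: the remaining digits must be <= hi[1:].
    parts.append(hi[0] + _same_length("0" * rest, hi[1:]))
    return "(?:" + "|".join(parts) + ")"


def int_range_regex(lo: int, hi: int) -> str:
    """Regex matching the decimal form of every integer in [lo, hi]."""
    if not 0 <= lo <= hi:
        raise ValueError("this sketch requires 0 <= lo <= hi")
    alternatives = []
    # Handle each digit count separately so leading zeros never match.
    for n in range(len(str(lo)), len(str(hi)) + 1):
        low = max(lo, 10 ** (n - 1) if n > 1 else 0)
        high = min(hi, 10 ** n - 1)
        alternatives.append(_same_length(str(low), str(high)))
    return "(?:" + "|".join(alternatives) + ")"


if __name__ == "__main__":
    pattern = int_range_regex(7, 255)
    print(all(re.fullmatch(pattern, str(i)) for i in range(7, 256)))
```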

What is the status of this? Are length constraints on tuples/lists also implemented?

fire commented

To be honest I switched to ggml’s ebnf for grammar constraints.

> To be honest I switched to ggml’s ebnf for grammar constraints.

How does an EBNF-specified grammar provide these constraints?

While trying to validate meta-schemas, I ran into $recursiveRef and $dynamicRef. Any thoughts on how to implement recursive schemas? I'd like to give it a try.

$recursiveRef found in:
https://json-schema.org/draft/2019-09/schema
https://json-schema.org/draft/2019-09/meta/core

$dynamicRef found in:
https://json-schema.org/draft/2020-12/schema
https://json-schema.org/draft/2020-12/meta/core

> While trying to validate meta-schemas, I ran into $recursiveRef and $dynamicRef. Any thoughts on how to implement recursive schemas? I'd like to give it a try.

Regarding recursion, there's an existing issue for that (albeit framed via the Pydantic interface): #330. Feel free to continue the discussion there.