json-schema-org/json-schema-spec

Issue: subset schemas don't work with `unevaluatedProperties` across `oneOf` branches


I can't figure out how to set up mutual exclusion of subset schemas elegantly.

I thought unevaluatedProperties was the solution so that I can easily reject properties of the superset schema that don't appear in the subset schema, but I can't find the right pattern to set it up right.

Subset and Superset schemas

Let's define subset schema as such:

A schema is a subset schema of another schema if every instance that is valid against the first schema is also valid against the second schema.

Similarly,

A schema is a superset schema of another schema if the other schema is a subset schema of it.

Rationale

I have a real-world schema that allows two kinds of values, but the first option happens to be a subset of the second option. I'd like to make it so that if the subset schema validates, the superset schema will not validate.

{
  "foo": (Either subset schema or superset schema)
}

Attempt 1 - Use not - PASS

oneOf:
- required: ['a']
  properties:
    a: {}
  not:
    required: ['b']
    properties:
      b: {}
- required: ['a', 'b']
  properties:
    a: {}
    b: {}

This one works as expected: {"a": true, "b": true} validates only against the second of the oneOf options. But I have to nest the meat of the second option inside a not in the first. This is ugly, but effective. When I compose my schemas, I don't want to reference all possible superset schemas in the subset schema.

Attempt 2 - Use anyOf - PASS

anyOf:
- required: ['a']
  properties:
    a: {}
- required: ['a', 'b']
  properties:
    a: {}
    b: {}

This approach gives up on the "typing" quality of the oneOf where consumers of the instance can learn about the "type" of the schema by inspecting which branch validates. Consumers need more sophistication since both validate for {"a": true, "b": true}.

Attempt 3 - Use unevaluatedProperties at the top level - FAIL

unevaluatedProperties: false
oneOf:
- required: ['a']
  properties:
    a: {}
- required: ['a', 'b']
  properties:
    a: {}
    b: {}

For the instance {"a": true, "b": true}, validation fails on the oneOf because both branches validate. A top-level unevaluatedProperties does not help here: it only inspects the annotations collected from its sibling subschemas and has no influence on which oneOf branches match.

Attempt 4 - push unevaluatedProperties to child schemas - FAIL

If I push unevaluatedProperties: false down into the branches, I get the same issue as with additionalProperties: false: the schemas cannot be extended if a parent "tacks on" some more properties.

oneOf:
- unevaluatedProperties: false
  required: ['a']
  properties:
    a: {}
- unevaluatedProperties: false
  required: ['a', 'b']
  properties:
    a: {}
    b: {}
properties:
  c: {}

If I try this JSON, the second oneOf branch fails because "c" is unevaluated within that branch:

{"a": true, "b": true, "c": true}

Attempt 5 - Introduce new in-place applicator firstOf - PASS

Let's define a new in-place applicator, firstOf:

This keyword's value MUST be a non-empty array. Each item of the array MUST be a valid JSON Schema.

An instance validates successfully against this keyword if it validates successfully against at least one schema defined by this keyword's value.

Annotations are only collected from the first valid sub-schema.

Consumers are to ignore any valid subschemas after the first valid subschema.

unevaluatedProperties: false
firstOf:
- required: ['a']
  properties:
    a: {}
- required: ['a', 'b']
  properties:
    a: {}
    b: {}

Both branches validate this instance, {"a": true, "b": true}, but only annotations from the first branch are collected. This causes unevaluatedProperties to disregard annotations from the second branch and thereby reject property "b" as "unevaluated".
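
To make the proposed semantics concrete, here is a minimal Python sketch of how firstOf might interact with a sibling unevaluatedProperties: false. Everything in it is an assumption for illustration: `evaluate`, `first_of`, and `validate` are hypothetical helpers, only `required` and `properties` are modelled, and property subschemas are assumed to accept any value (as with `{}` above). It is not how any real validator is implemented.

```python
def evaluate(schema, instance):
    """Return (valid, evaluated_property_names) for a tiny keyword subset.

    Only `required` and `properties` are handled; every property
    subschema is assumed to accept anything (like `{}`).
    """
    if any(name not in instance for name in schema.get("required", [])):
        return False, set()
    evaluated = {name for name in schema.get("properties", {}) if name in instance}
    return True, evaluated


def first_of(branches, instance):
    """Hypothetical firstOf: keep annotations from the first valid branch only."""
    for index, branch in enumerate(branches):
        valid, evaluated = evaluate(branch, instance)
        if valid:
            return True, index, evaluated
    return False, None, set()


def validate(branches, instance):
    """firstOf combined with a sibling `unevaluatedProperties: false`."""
    valid, index, evaluated = first_of(branches, instance)
    if not valid:
        return False, None
    unevaluated = set(instance) - evaluated
    return (not unevaluated), index


BRANCHES = [
    {"required": ["a"], "properties": {"a": {}}},
    {"required": ["a", "b"], "properties": {"a": {}, "b": {}}},
]

print(validate(BRANCHES, {"a": True}))             # (True, 0)
print(validate(BRANCHES, {"a": True, "b": True}))  # (False, 0)
```

In this sketch the second instance selects branch 0 and is then rejected, because "b" was never annotated as evaluated by that branch.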

It sounds to me like you're trying to model a polymorphic system, which has historically been a problem for JSON Schema. Notably, JSON Schema is a constraint system, not a data modelling system. As such, it has a really hard time representing polymorphic types.

What I've typically seen done is encode a "type" property in your data model. This can then be used along with a const keyword to isolate which subschema should apply.

{
  "type": "object",
  "properties": {
    "a": true,
    "b": true,
    "c": true
  },
  "required": ["type"],
  "oneOf": [
    {
      "properties": {
        "type": { "const": "a" }
      },
      "required": ["a"]
    },
    {
      "properties": {
        "type": { "const": "b" }
      },
      "required": ["b"],
      "not": { "required": ["c"] }
    },
    {
      "properties": {
        "type": { "const": "ab" }
      },
      "required": ["a", "b"]
    }
  ]
}

Note

This operates very similarly to OpenAPI's discriminator concept, where type would be your discriminator. (OpenAPI's keyword actually allows the user to define the name of the property to use as a discriminator, in this case "type", and then there's additional subschema selection processing that happens. The above is functionally equivalent and only uses JSON Schema.)

This would validate

  • {"type": "a", "a": 1}
  • {"type": "a", "a": 1, "c": 3}
  • {"type": "b", "b": 2}
  • {"type": "ab", "a": 1, "b": 2}
  • {"type": "a", "a": 1, "b": 2, "c": 3}

and would fail

  • {"type": "b", "b": 2, "c": 3}

Notice that to restrict c, you only have to include the {"not": {"required": ["c"]}} structure.
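
The pass/fail lists above can be checked with a small Python sketch. The `is_valid` function is a hypothetical, hand-rolled validator that covers only the keywords this schema uses (`type: object`, `required`, `properties`, `const`, `not`, `oneOf`); it is an illustration under those assumptions, not a real JSON Schema implementation.

```python
def is_valid(schema, instance):
    """Validate against a small keyword subset (see the assumptions above)."""
    if schema is True:
        return True
    if schema is False:
        return False
    if schema.get("type") == "object" and not isinstance(instance, dict):
        return False
    if "const" in schema and instance != schema["const"]:
        return False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for name, subschema in schema.get("properties", {}).items():
            if name in instance and not is_valid(subschema, instance[name]):
                return False
    if "not" in schema and is_valid(schema["not"], instance):
        return False
    if "oneOf" in schema:
        # oneOf requires exactly one matching branch.
        matches = sum(is_valid(branch, instance) for branch in schema["oneOf"])
        if matches != 1:
            return False
    return True


SCHEMA = {
    "type": "object",
    "properties": {"a": True, "b": True, "c": True},
    "required": ["type"],
    "oneOf": [
        {"properties": {"type": {"const": "a"}}, "required": ["a"]},
        {
            "properties": {"type": {"const": "b"}},
            "required": ["b"],
            "not": {"required": ["c"]},
        },
        {"properties": {"type": {"const": "ab"}}, "required": ["a", "b"]},
    ],
}

SHOULD_PASS = [
    {"type": "a", "a": 1},
    {"type": "a", "a": 1, "c": 3},
    {"type": "b", "b": 2},
    {"type": "ab", "a": 1, "b": 2},
    {"type": "a", "a": 1, "b": 2, "c": 3},
]
SHOULD_FAIL = [{"type": "b", "b": 2, "c": 3}]

for instance in SHOULD_PASS + SHOULD_FAIL:
    print(instance, is_valid(SCHEMA, instance))
```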


In regard to unevaluatedProperties, you can put one at the root to forbid any properties that aren't listed in this schema. So for instance, if you wanted to disallow foo.

Thanks @gregsdennis. Yeah, it would be nice if I could add a type property, but this particular problem is a brownfield project. I can't change the JSON instances, but I can change the schema to something that will validate them.

After chatting with @hudlow: firstOf could be expressed as syntactic sugar for an if/then chain.

with firstOf

firstOf:
- required: ['a']
  properties:
    a: {}
- required: ['a', 'b']
  properties:
    a: {}
    b: {}
- required: ['a', 'b', 'c']
  properties:
    a: {}
    b: {}
    c: {}

with if/else chain

unevaluatedProperties: false
if:
  required: ['a']
  properties:
    a: {}
# `else` rather than `then`: a later branch is tried only when the earlier one fails
else:
  if:
    required: ['a', 'b']
    properties:
      a: {}
      b: {}
  else:
    required: ['a', 'b', 'c']
    properties:
      a: {}
      b: {}
      c: {}

Can you please assign this to me? I want to work on this.

@rishabh1721, this was just a question. There's nothing to do. I'm going to close this because it appears to be resolved or at least isn't being actively discussed anymore.

@jdesrosiers oh, this still isn't resolved. Are these issues tracking all problems or just problems that people have active chatter about? How many months do I wait for a comment before I need to make more chatter?

I still don't have a good pattern for modeling subset schemas with oneOfs.

Do others not have this same problem?

If it's the decision of the team that the best we can do is to use a type parameter, then so be it.

But for others who might stumble upon this issue, I want to be clear that I do not have a good pattern.

I suggested the firstOf applicator as a possible syntactic sugar over an if/then chain. As this has closed without comment, I presume folks think it's not worth doing.

Sorry, @smikulcik, I skimmed the issue too briefly and missed that there was a keyword proposal included there. I'll reopen the issue, but given the lack of responses so far, I'm not sure I see it getting enough interest to move forward.

How many months do I wait for a comment before I need to make more chatter?

Keep in mind that everyone here is a volunteer working on this in their spare time. We contribute when we can and there's certainly no one going through stale issues on a regular basis. Our community has a convention of giving people two weeks to get around to making a comment or review or whatever. If it's been more than two weeks, feel free to ping someone for their attention. It's possible that no one responds because no one else is interested. It happens to me all the time. Generally things only get added or changed if it's something that comes up repeatedly and often.

We're starting a new process for introducing new features. It's so new, we only have one written up so far. If you want to try to push firstOf forward, I suggest starting by creating a new issue specifically for proposing the keyword. Then you have to make the case that it's something that's needed and find people to say they also want that feature or find other cases of people asking for something similar. Since it's your proposal, it would likely be you doing the work to push it forward. We're all volunteers here.

Another option is to use the vocabulary system to define your custom keyword. Then you can use it whether or not it's accepted as a main spec feature.


Since I reviewed the current discussion in detail this time, I'll also share my view so far.

Currently, I don't really see the problem. You say about the anyOf solution,

This approach gives up on the "typing" quality of the oneOf where consumers of the instance can learn about the "type" of the schema by inspecting which branch validates.

I get that the point is to be able to determine the type of the instance by which branch validates, but I think if that's your goal, you want to know about all of the schemas that match. If you have a function that takes a Mammal you should be able to pass it a Cat that extends Mammal. Because the type is a Cat doesn't mean it's not a Mammal. So, if the goal is to provide information about what type something is, I don't think firstOf is what's really needed and anyOf seems to be sufficient.

If your approach is to inspect the validation trace to see which branch validates, I still don't think you need anything other than anyOf. Just order your anyOf the same way you would order your firstOf. When you inspect which branch validated and you have multiple, select the first one. Or, if you don't want to worry about ordering, I'm pretty sure you could select the match with the most properties and that should give you the most specific type.
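
As a rough illustration of both selection strategies, here is a minimal Python sketch. It assumes the consumer can obtain the list of anyOf branches and test each one; the `matches` helper is a hypothetical stand-in that models only `required`, and "most specific" is approximated by the number of declared `properties`.

```python
def matches(schema, instance):
    """True if every name in `required` is present (the only keyword modelled)."""
    return all(name in instance for name in schema.get("required", []))


def matching_branches(branches, instance):
    """Indexes of all anyOf branches the instance validates against."""
    return [i for i, branch in enumerate(branches) if matches(branch, instance)]


def first_match(branches, instance):
    """Strategy 1: rely on branch order and take the first match."""
    found = matching_branches(branches, instance)
    return found[0] if found else None


def most_specific_match(branches, instance):
    """Strategy 2: ignore order and take the match declaring the most properties."""
    found = matching_branches(branches, instance)
    if not found:
        return None
    return max(found, key=lambda i: len(branches[i].get("properties", {})))


BRANCHES = [
    {"required": ["a"], "properties": {"a": {}}},
    {"required": ["a", "b"], "properties": {"a": {}, "b": {}}},
]

instance = {"a": True, "b": True}
print(matching_branches(BRANCHES, instance))    # [0, 1]
print(first_match(BRANCHES, instance))          # 0
print(most_specific_match(BRANCHES, instance))  # 1
```

With the subset schema listed first, the first-match strategy reports the subset branch, while the most-properties strategy reports the superset branch; which one is wanted depends on how the consumer interprets the "type".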

That brings me to my biggest concern about this, which is that JSON Schema is designed for validating data. Inspecting the validation trace to enable additional use cases is something we want to support, but introducing a new keyword that has the exact same validation behavior as an existing keyword just to get a slightly different validation trace is a tough sell, especially when there seems to be a reasonable way to get the data you need already (i.e. choose the first match).

To me, this problem doesn't seem like one that generalizes to being about types. It's more about dealing with a poorly designed data structure. If that's the case, I would expect the schema to be a little more difficult to describe. I would think that's normal, not something that needs to be fixed.

Something that could help in this specific example is that your schema is a little more complex than it needs to be. You can use required without properties, so you can get the same result with a simpler schema.

oneOf:
- required: ['a']
  properties:
    a: {}
  not: { required: ['b'] } # This can be a one-liner that solves the problem.
#   properties:            # Unnecessary
#     b: {}                # Unnecessary
- required: ['a', 'b']
  properties:
    a: {}
    b: {}

I get that it's annoying to have to even include the not, but it's not that bad and I think it's a reasonable thing to expect to have to do to deal with a poorly designed structure like this.