json-schema-org/json-schema-spec

Validation: if/then/else

epoberezkin opened this issue · 75 comments

This can be seen as an extension of #31 and #64. I see it as syntactic sugar for the existing boolean/compound keywords.

The validation process is very simple:

  • if the instance is valid against the "if" schema, then the "then" schema should be validated and its outcome determines the instance validation.
  • if the instance is invalid against the "if" schema, then the "else" schema should be validated and its outcome determines the instance validation.
  • if "if" is not present, then the schema itself is invalid (metaschema validation will fail).
  • if neither "then" nor "else" is present, the same as above.
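The rules above can be sketched in Python. This is a minimal model, not any validator's real API: each subschema is represented as a plain predicate `instance -> bool`, and the helper names `check_conditional` and `validate_conditional` are hypothetical. The bullets do not say what happens when only one of "then"/"else" is present; the sketch assumes the missing branch behaves like the always-valid schema `true`.

```python
from typing import Any, Callable, Mapping

# A subschema is modelled as a predicate: "is this instance valid against it?"
Predicate = Callable[[Any], bool]


def always_valid(_: Any) -> bool:
    """Stands in for the boolean schema `true`."""
    return True


def check_conditional(schema: Mapping[str, Predicate]) -> None:
    """The two metaschema rules from the proposal."""
    if "if" not in schema:
        raise ValueError('"if" is required')
    if "then" not in schema and "else" not in schema:
        raise ValueError('at least one of "then" or "else" is required')


def validate_conditional(schema: Mapping[str, Predicate], instance: Any) -> bool:
    check_conditional(schema)
    if schema["if"](instance):
        # Assumption: an absent "then" behaves like `true`.
        return schema.get("then", always_valid)(instance)
    # Assumption: an absent "else" behaves like `true`.
    return schema.get("else", always_valid)(instance)


# Example: if it is an integer it must be non-negative, otherwise it must be a string.
example = {
    "if": lambda x: isinstance(x, int),
    "then": lambda x: x >= 0,
    "else": lambda x: isinstance(x, str),
}
print(validate_conditional(example, 5))    # True
print(validate_conditional(example, -1))   # False
print(validate_conditional(example, "a"))  # True
print(validate_conditional(example, 1.5))  # False
```

Note that only one of "then"/"else" is ever called for a given instance, which is the short-circuit reading of the rules.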

As I've written in another issue, the schema with if/then/else:

{
  "if": {"$ref": "condition" },
  "then": {"$ref": "schema1"},
  "else": {"$ref": "schema2"}
}

is a boolean operation, and it is equivalent to the following schema, which is already possible:

{
  "anyOf": [
    { "allOf": [ {"$ref": "condition" }, {"$ref": "schema1"} ] },
    { "allOf": [ {"not": {"$ref": "condition" } }, {"$ref": "schema2"} ] }
  ]
}

So if/then/else is as declarative as existing keywords, but it provides a more convenient, clearer and more efficient alternative ("condition" is never validated twice) for a fairly common validation scenario.
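The claimed equivalence can be checked exhaustively. The sketch below treats each subschema's outcome on one fixed instance as a boolean, which is only valid because validation is assumed to be free of side effects. The second half illustrates the efficiency point with a hypothetical condition (`x > 0`) and hypothetical `schema1`/`schema2`; a real validator might cache or reorder evaluation, so the call counts describe a naive short-circuiting evaluator only.

```python
from itertools import product


def if_then_else(c: bool, s1: bool, s2: bool) -> bool:
    # The outcome of "then" or "else", chosen by the outcome of "if".
    return s1 if c else s2


def any_of_all_of(c: bool, s1: bool, s2: bool) -> bool:
    # anyOf [ allOf [condition, schema1], allOf [not condition, schema2] ]
    return (c and s1) or ((not c) and s2)


# Exhaustive check over all 8 combinations of subschema outcomes.
for c, s1, s2 in product([False, True], repeat=3):
    assert if_then_else(c, s1, s2) == any_of_all_of(c, s1, s2)


def count_condition_calls(instance: int, form: str) -> tuple:
    """How many times a naive, short-circuiting evaluator runs "condition"."""
    calls = 0

    def condition(x: int) -> bool:
        nonlocal calls
        calls += 1
        return x > 0

    def schema1(x: int) -> bool:
        return x < 10

    def schema2(x: int) -> bool:
        return x == 0

    if form == "if/then/else":
        result = schema1(instance) if condition(instance) else schema2(instance)
    else:  # the anyOf/allOf/not form
        result = (condition(instance) and schema1(instance)) or (
            (not condition(instance)) and schema2(instance)
        )
    return result, calls


print(count_condition_calls(0, "if/then/else"))  # (True, 1)
print(count_condition_calls(0, "anyOf"))         # (True, 2)
```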

Using if/then/else the problem in #31/#64 is also solved.

so if/then/else is as declarative as existing keywords

While Wikipedia is not always the best resource, this article is well-cited and the following definition is what I usually hear for declarative programming:

In computer science, declarative programming is a programming paradigm, a style of building the structure and elements of computer programs, that expresses the logic of a computation without describing its control flow.

So if/then/else is by definition not declarative. The distinction between declarative and imperative is control flow, and if/then/else is control flow. Imperative vs. declarative has nothing to do with whether the outcome is the same; all programming styles can produce the same output. It's about how things are processed.

It is certainly worth discussing if we want to add this imperative construct to JSON Schema, but it is the very definition of an imperative construct.

I am not sure why implication is more imperative than and/or/negation; they are all operations from boolean algebra. The fact that the evaluation of implication can be short-circuited (i.e. when "if" is false, "then" doesn't need to be evaluated) doesn't make it more imperative than "and" and "or", which can also be short-circuited.

Also, there are languages that use if/then/else as an expression, not as a control flow statement. If you don't like the keywords, others can be used.

Please see this article: https://en.wikipedia.org/wiki/Material_conditional

I am talking about the boolean operation p => q (ignoring else here for simplicity, as it is really additional sugar), which has the following truth table:

p      q      p=>q
False  False  True
False  True   True
True   False  False
True   True   True
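For reference, the standard material-conditional truth table can be generated mechanically. This is a small illustrative sketch; `implies` is a hypothetical helper name, not part of any JSON Schema API.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    # Material conditional: false only when p is true and q is false.
    return (not p) or q


rows = [(p, q, implies(p, q)) for p, q in product([False, True], repeat=2)]
print(f"{'p':<6} {'q':<6} p=>q")
for p, q, r in rows:
    print(f"{p!s:<6} {q!s:<6} {r}")
```

The rows come out in the order False/False, False/True, True/False, True/True, and only the third row is false.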

What makes you think it is more imperative than and/or/xor/not, which we already have?

that expresses the logic of a computation without describing its control flow.

That's exactly what the table above is: the logic without control flow.

Because there is a widely accepted plain-english definition that says so. "imperative programming" and "declarative programming" have well-defined meanings, which are not about whether things can be expressed in terms of logical predicates. Go find a credible definition of "declarative programming" that includes an if statement and I'll discuss that definition with you. But making up your own approaches to these terms undercuts your primary argument.

As I said, I don't mind what the keywords are. Let's use "ante" (from "antecedent", the term for "p" in p=>q) and "cons" ("consequent").

@handrews I can refer you back to the same article you quote: https://en.wikipedia.org/wiki/Declarative_programming#Constraint_programming

Subparadigms
Declarative programming is an umbrella term that includes a number of better-known programming paradigms.
Functional programming, and in particular purely functional programming, attempts to minimize or eliminate side effects, and is therefore considered declarative.

Many functional languages support an if/then/else construct because its result is pure: it has no side effects and does not depend on the order of execution. It's determined by a truth table, the same as allOf/anyOf/oneOf (and/or/xor).

Also see https://en.wikipedia.org/wiki/List_of_programming_languages_by_type#Declarative_languages

Many languages in this list support conditionals.

Very few systems are purely one style or another. And maybe JSON Schema ends up not being purely declarative in the sense I am advocating. Ultimately, whether if-then-else is "declarative" or not (by whoever's definition) is less important than whether you can convince more people here that it is the right direction for JSON Schema.

An alternative for the same would be:

{
  "conditional": [
    {"$ref": "condition" },
    {"$ref": "schema1"},
    {"$ref": "schema2"}
  ]
}

There should be exactly 2 or 3 items in this array; more or fewer should make the schema invalid.

See also #168 (comment)

I think it may be preferable, as it doesn't look imperative at all to me.

I would prefer this suggestion to the switch keyword, at the very least. I kept having problems with switch behaving in unexpected ways, usually because of the way "then" and "continue" interact. This feels much simpler, and harder to trip yourself over.

@HotelDon I agree. I never actually used continue. And you can achieve everything the switch gives you (without continue) by combining the above with the existing keywords.

To be honest, I don't really care what paradigm it falls under. I'm not even sure you can classify JSON Schema under any paradigm, but that's a different debate ("we" don't even all agree whether it is code or not anyway).

Things I care about when looking at adding new functionality:

Is there a use case?
Yup, I can think of a few.

Does it make JSON Schema easier to use?
Yup, the example clearly shows.

Does it cause problems for implementors?
I can't answer that one, but considering that implementors may already be intelligently assessing anyOf to construct the same logic structure, by looking for the structure shown in the example, I'd guess probably not.

I'd also much prefer this to a switch since, as mentioned, switch could be unclear or confusing.

@handrews unless you can think of a reason why this would be a BAD thing, then I'm for it.

By the way, would you prefer conditional or if/then/else? I think I like conditional more, not because it's less imperative looking but because it's a single keyword.

As I said before:

Ultimately, whether if-then-else is "declarative" or not (by whoever's definition) is less important than whether you can convince more people here that it is the right direction for JSON Schema.

People seem to be for this, so I'll go with that. It's not about my personal preference.

We all seem to agree that switch was too complicated. And yes, too imperative, while if/then/else falls into this "it depends on how you think about it" zone. Having given it a rest overnight, I do see @epoberezkin's point of view here even if it is not my preferred way to consider it.

By the way, would you prefer conditional or if/then/else? I think I like conditional more, not because it's less imperative looking but because it's a single keyword.

I prefer if/then/else because it is just more obvious. I suspect it's because I just woke up but I had to think for a second to realize that the 2nd and 3rd schemas in the "conditional" list behave as "then" and "else" respectively.


There is one property I would like to make sure we preserve, which gets to the real-world implications of declarative vs imperative: I want to make sure that it is always safe to evaluate all schema branches. Right now I believe that is true because in general JSON Schema validation is purely functional, without state or side effects. As long as that remains true, it should be safe to (for instance) pass each subschema off to separate threads or co-routines and decide what to do with the results later.

If it's always safe to evaluate both the "then" and the "else" no matter how the "if" validates, then this really is identical to using the existing boolean keywords but more clear. It should also always be safe to only evaluate either the "then" or the "else" depending on the result of the "if" validation. This is the same as saying that you can either short-circuit "allOf", etc. or you can check all branches even if you know that since one fails the overall result will be a failure. It's up to the implementation to decide.

Does this property make sense? Do others agree it is desirable?
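The property described above can be sketched in Python, assuming subschemas are modelled as pure predicates (hypothetical names throughout, not a real validator). One version evaluates only the branch selected by "if"; the other evaluates all three subschemas concurrently with `concurrent.futures.ThreadPoolExecutor` and decides afterwards. Both agree as long as every predicate is pure and, like JSON Schema keywords, simply passes instances of types it does not apply to.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

Predicate = Callable[[Any], bool]


def short_circuit(cond: Predicate, then_s: Predicate, else_s: Predicate, instance: Any) -> bool:
    # Evaluate only the branch selected by "if".
    return then_s(instance) if cond(instance) else else_s(instance)


def evaluate_all(cond: Predicate, then_s: Predicate, else_s: Predicate, instance: Any) -> bool:
    # Evaluate "if", "then" and "else" concurrently and decide afterwards.
    # Safe only because the predicates are pure: no shared state, no side effects.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(s, instance) for s in (cond, then_s, else_s)]
        c, t, e = (f.result() for f in futures)
    return t if c else e


def is_string(x: Any) -> bool:  # like {"type": "string"}
    return isinstance(x, str)


def min_length_1(x: Any) -> bool:  # like {"minLength": 1}; non-strings pass
    return not isinstance(x, str) or len(x) >= 1


def minimum_0(x: Any) -> bool:  # like {"minimum": 0}; non-numbers pass
    return not isinstance(x, (int, float)) or x >= 0


for value in ["", "a", -1, 3, None]:
    lazy = short_circuit(is_string, min_length_1, minimum_0, value)
    eager = evaluate_all(is_string, min_length_1, minimum_0, value)
    assert lazy == eager
    print(repr(value), lazy)
```

If a branch could raise or mutate shared state when run on an instance it was not "meant" for, the eager strategy would no longer be safe, which is why the purity property matters.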

If it's always safe to evaluate both the "then" and the "else" no matter how the "if" validates, then this really is identical to using the existing boolean keywords but more clear. It should also always be safe to only evaluate either the "then" or the "else" depending on the result of the "if" validation.

I fully agree. That's what I was trying to explain (badly) with my truth tables.

I prefer if/then/else because it is just more obvious. I suspect it's because I just woke up but I had to think for a second to realize that the 2nd and 3rd schemas in the "conditional" list behave as "then" and "else" respectively.

I agree with that as well. The list-based conditional in Clojure makes me think for a second too, while the explicit if/then/else in Haskell is immediately clear. So I am OK with if/then/else.

@handrews thank you

@handrews if you like I can try making a PR with the description of the keyword for the document.

@epoberezkin Given that @awwright hasn't weighed in yet, I'd like to keep this out of Draft 6 unless there's full agreement. But I would definitely encourage you to write a PR for Draft 7. Does that seem reasonable? I'm trying (without much success) to keep a focus on resolving Draft 6 issues so we can publish that draft rather than adding more things to the pile.

Yes, agreed.

I would prefer a containing keyword for these, so you can have multiple ifs for a single entry. Something like this, maybe?

"conditional": [
    {
        "if": {"$ref": "#/definitions/condition1"},
        "then": {"$ref": "#/definitions/schema1"},
        "else": {"$ref": "#/definitions/schema2"}
    },
    {
        "if": {"$ref": "#/definitions/condition2"},
        "then": {"$ref": "#/definitions/schema3"},
        "else": {"$ref": "#/definitions/schema4"}
    }
]

The order of the array shouldn't affect whether data passes validation, so schema writers can focus on ordering it so that it makes sense at first glance.

If you only need one conditional statement, then you could skip the array entirely:

"conditional": {
    "if": {"$ref": "#/definitions/condition1"},
    "then": {"$ref": "#/definitions/schema1"},
    "else": {"$ref": "#/definitions/schema2"}
}

@HotelDon is there any reason that "allOf" wouldn't work for this?

"allOf": [
    {
        "if": {"$ref": "#/definitions/condition1"},
        "then": {"$ref": "#/definitions/schema1"},
        "else": {"$ref": "#/definitions/schema2"}
    },
    {
        "if": {"$ref": "#/definitions/condition2"},
        "then": {"$ref": "#/definitions/schema3"},
        "else": {"$ref": "#/definitions/schema4"}
    }
]
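Because "allOf" simply conjoins its results, the order of the conditionals in the array cannot change the outcome, which is the order-independence asked for above. A minimal Python sketch, with subschemas modelled as predicates and two hypothetical conditionals standing in for condition1/schema1 and condition2/schema3 (these toy predicates assume object instances):

```python
from itertools import permutations
from typing import Any, Callable

Schema = Callable[[Any], bool]


def if_then_else(cond: Schema, then_s: Schema, else_s: Schema) -> Schema:
    return lambda x: then_s(x) if cond(x) else else_s(x)


def all_of(*schemas: Schema) -> Schema:
    return lambda x: all(s(x) for s in schemas)


def anything(_: Any) -> bool:  # the boolean schema `true`
    return True


# Hypothetical conditionals for illustration only.
circle_needs_radius = if_then_else(
    lambda x: x.get("kind") == "circle", lambda x: "radius" in x, anything
)
label_is_string = if_then_else(
    lambda x: "label" in x, lambda x: isinstance(x["label"], str), anything
)

instances = [
    {"kind": "circle", "radius": 1},
    {"kind": "circle"},
    {"kind": "square", "label": 3},
    {"kind": "square", "label": "ok"},
]

# Every ordering of the allOf array gives the same verdict for every instance.
for inst in instances:
    orders = permutations([circle_needs_radius, label_is_string])
    verdicts = {all_of(*order)(inst) for order in orders}
    assert len(verdicts) == 1
    print(inst, verdicts.pop())
```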

@handrews No, I just keep forgetting that "allOf" exists - I tend to use "oneOf" a lot more than "allOf", so it slips my mind a lot.

I would still argue for a single keyword that encompasses "if", "then" and "else", to make it more obvious at first glance what those entries are doing. So like this:

"conditional": {
    "if": {"$ref": "#/definitions/condition1"},
    "then": {"$ref": "#/definitions/schema1"},
    "else": {"$ref": "#/definitions/schema2"}
}

Or this:

"allOf": [
    {
        "conditional": {
            "if": {"$ref": "#/definitions/condition1"},
            "then": {"$ref": "#/definitions/schema1"},
            "else": {"$ref": "#/definitions/schema2"}
        }
    },
    {
        "conditional": {
            "if": {"$ref": "#/definitions/condition2"},
            "then": {"$ref": "#/definitions/schema3"},
            "else": {"$ref": "#/definitions/schema4"}
        }
    }
]

I realize, however, it's a pretty weak argument for adding a small amount of cruft in exchange for a small amount of clarity, so unless someone else feels strongly about it, the three keyword version is fine with me.

@HotelDon I don't think it adds much clarity. So far JSON-schema avoided adding two level keywords, and that is one of the reasons I disliked ideas like switch and patternGroups: they have keywords inside keywords. But those inner things aren't really keywords, as they can't be used on their own. So what are they?

if/then/else on its own gives a smaller and more convenient building block that can be used to emulate switch without continue:

"anyOf": [
    {
        "if": {"$ref": "#/definitions/condition1"},
        "then": {"$ref": "#/definitions/schema1"},
        "else": false
    },
    {
        "if": {"$ref": "#/definitions/condition2"},
        "then": {"$ref": "#/definitions/schema2"},
        "else": false
    },
    {
        "if": { "not": { "anyOf": [
            {"$ref": "#/definitions/condition1"},
            {"$ref": "#/definitions/condition2"}
        ] } },
        "then": { "$ref": "#/definitions/defaultSchema" },
        "else": false
    }
] 

The boolean form of schemas makes it easy and elegant to make an if/then fail when "if" fails. That's one for the cookbook :)
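The switch emulation above can be modelled in Python, assuming subschemas are predicates and using hypothetical stand-ins for condition1/condition2, schema1/schema2 and defaultSchema (they assume object instances). `false_schema` plays the role of the boolean schema `false` in each "else".

```python
from typing import Any, Callable

Schema = Callable[[Any], bool]


def if_then_else(cond: Schema, then_s: Schema, else_s: Schema) -> Schema:
    return lambda x: then_s(x) if cond(x) else else_s(x)


def any_of(*schemas: Schema) -> Schema:
    return lambda x: any(s(x) for s in schemas)


def not_(schema: Schema) -> Schema:
    return lambda x: not schema(x)


def false_schema(_: Any) -> bool:  # the boolean schema `false`
    return False


# Hypothetical conditions and schemas for illustration only.
def condition1(x: dict) -> bool:
    return x.get("shape") == "circle"


def schema1(x: dict) -> bool:
    return "radius" in x


def condition2(x: dict) -> bool:
    return x.get("shape") == "rect"


def schema2(x: dict) -> bool:
    return "width" in x and "height" in x


def default_schema(x: dict) -> bool:
    return "name" in x


switch = any_of(
    if_then_else(condition1, schema1, false_schema),
    if_then_else(condition2, schema2, false_schema),
    if_then_else(not_(any_of(condition1, condition2)), default_schema, false_schema),
)

print(switch({"shape": "circle", "radius": 1}))  # True
print(switch({"shape": "circle", "name": "x"}))  # False
```

The second call shows how this differs from a fall-through switch: a matched condition whose schema fails does not fall back to the default, because the default branch's "if" excludes both conditions.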

So I think flat is better...

So far JSON-schema avoided adding two level keywords

Yeah, I had noticed that a few hours ago when I was comparing the formatting for existing keywords. I wouldn't rule it out entirely for other future keywords, but I can see how it'd be better to keep this flat.

If we want to fit with the current convention of keywords, a "conditional" keyword would probably be the best bet:

{
"conditional":  {
         "if": {},
        "then": {},
        "else": {}
    }
}

Maybe an array could provide "switch" like functionality, where only the first match is picked:

{
"conditional":  [
   {
         "if": {},
        "then": {}
    },
   {
         "if": {},
        "then": {}
    }
]
}

I also want to get an idea of how frequently this is actually necessary... how often would this actually simplify schemas and their output? Are there any use cases anyone is aware of?

@awwright the thing that got us to the point of consensus at all was dropping the switch functionality, so re-introducing it is counter-productive.

how often would this actually simplify schemas and their output? Are there any use cases anyone is aware of?

Very frequently. I and/or teams I have worked with have done this sort of thing with "oneOf" many times, and while I argued that "oneOf" was sufficient, if-then-else is undoubtedly more intuitive.

JSF had some long discussions on this sort of thing, relating to rules and UI Policies, earlier this year, and the example below was the solution we agreed to prefer, although when/is/with was a late change from prop/operator/expected.

Probably the main difference is that we chose not to provide an else case (relying on the default or user entry instead), and the actions can be applied to multiple fields at once rather than per field. We also included the ability to run a function to determine the truthiness of the conditional. The function option is not the function itself but a tag indicating which function to use, which keeps it language/implementation agnostic.

Example
"ui": {
  "policy": [
    {
      "summary": ["BR", 4, "Show fields when ..."],
      "if": {
        "allOf": [
          {
            "anyOf": [
              {
                "when": "schemaform1/my/model/person/age",
                "is": "equal",
                "with": "36"
              },
              {
                "when": "schemaform1/my/model/person/name",
                "not": "container",
                "with": "someName"
              }
            ]
          },
          {
            "function": "someFunction",
            "params": [
              "schemaform1/my/model/person/age",
              "schemaform1/my/model/person/name"
            ],
            "is": "equal",
            "with": "true"
          }
        ]
      },
      "action": [
        { "update": "model/product/fieldA",
          "set": { "required": true, "readonly": false, "visible": true } },
        { "update": [
            "model/product/fieldB",
            "model/product/fieldC",
            "model/product/fieldD"
          ],
          "set": { "required": false, "readonly": false, "visible": true, "enabled": true }
        }
      ]
    }
  ]
}

UI Policies would be for updating boolean criteria as described in the original issue post, not for conditional evaluation of a field's entire schema. Letting them act on multiple fields at once, rather than adding the same rule to each field, can save evaluation/watch resources or complexity in front-end code.

The addition of data.policies to go with ui.policies which allow for the same behaviour but only apply on import/save of data into the backend is also something I would be a fan of.

P.S. The point of ui as a root is to group UI behaviour definitions (ui.policies, ui.actions, ui.scripts), alongside data.policies and data.rules for actions applicable to the back end.

@Anthropic I can see why you went that way with UI policies, particularly being able to reference a function for better integration with front-end code. But I want to go back to the declarative thing here. As @epoberezkin pointed out, there is a declarative notion of if-then through propositional logic. And while I was going for a narrower concept of declarative, if we enforce that it is always safe to evaluate all three parts of an if-then-else, then it is in fact reasonably declarative.

But putting in operators and especially function references is getting into substantially different complex territory. I think schemas define enough constraints that we should not need additional operators (perhaps I am missing some aspect, though).

I'm not trying to say that you shouldn't do this for UI schemas- they are a different use case and are more likely to lean towards code-like behavior. But I'm hesitant to take this direction in the main validation vocabulary. Does that make sense?

*I said "predicate logic" at some point but the correct term is "propositional logic"

@handrews yes I get what you are saying. I pretty much agree. I just thought I'd mention it in case anyone had any concerns/thoughts about how they will fit together. But also I believe I misread the original issue.

I'm a little late to this discussion, but here's my two cents.

Like others, I was against this because I saw it as applying imperative programming concepts to JSON Schema which is not an imperative language. However, I was convinced when it was pointed out that this functionality is nothing more than syntactic sugar for something JSON Schema can already do. I hadn't thought about this particular issue that way and it was a very convincing argument. My only concern is that it might encourage schema writers (especially the unskilled ones) to think imperatively rather than declaratively.

I prefer the single-keyword version that was proposed:

{
  "conditional": [
    {"$ref": "condition" },
    {"$ref": "schema1"},
    {"$ref": "schema2"}
  ]
}

However, I recognize that it is less accessible to the masses. That might be a good thing, though. Making it look a little less like a typical if-else expression might discourage people from thinking imperatively.

I still don't like the conditional array, but I could get behind the simpler conditional object that @awwright proposed (but not the switch-like list form):

{
    "conditional": {
        "if": {...},
        "then": {...},
        "else": {...}
    }
}

As is obvious from earlier comments, I worry about imperative thinking as well. But if we're introducing this for clarity, we should go for clarity.

@handrews @awwright I believe that conditional with sub-keywords (for the lack of better term) is the worst option of all 3 as it introduces a completely new design approach - exactly these sub-keywords that JSON schema has managed to avoid so far, and I hope will avoid in the future, unless they are really needed.

So conditional with array of 3 items that @jdesrosiers likes is ok, as it follows the design of conditional in some functional languages and people will get used to it. It also makes people not to think imperatively.

Flat if/then/else is potentially easier to understand, and it allows syntactically simpler constructions when you want to combine multiple conditionals (compared to "conditional" with sub-keywords). Instead of #180 (comment), you would have to write:

"anyOf": [
    {
        "conditional": {
            "if": {"$ref": "#/definitions/condition1"},
            "then": {"$ref": "#/definitions/schema1"},
            "else": false
        }
    },
    {
        "conditional": {
            "if": {"$ref": "#/definitions/condition2"},
            "then": {"$ref": "#/definitions/schema2"},
            "else": false
        }
    },
    {
        "conditional": {
            "if": { "not": { "anyOf": [
                {"$ref": "#/definitions/condition1"},
                {"$ref": "#/definitions/condition2"}
            ] } },
            "then": { "$ref": "#/definitions/defaultSchema" },
            "else": false
        }
    }
] 

Why don't we stick with conditional with an array of 3 items? It's consistent with the current design of using anyOf/allOf instead of or/and, as for the latter the expectation is to short-circuit, as it is for if/then/else. But it's a bit less obvious than if/then/else.

Or we can stick to if/then/else because it's less verbose. But it looks imperative :)

conditional with if/then/else inside is bad from 3 different points of view (sorry for repetition):

  • new design pattern
  • unnecessarily verbose
  • still looks imperative

It also makes people not to think imperatively.

You're going to have to walk me through this, @epoberezkin

new design pattern

Yes, it would be nested keywords, which is part of why I prefer just plain old "if", "then" and "else" ungrouped. Or "conditionalIf", "conditionalThen", "conditionalElse" if absolutely necessary.

But a far more troubling new design pattern would be an order-sensitive array. We do not have any such constructs in JSON Schema as it is, and they are inherently fragile. The only place where array order is significant is the tuple form of "items", where specifying exact position-based subschemas is the point.

Otherwise, none of the keywords that take an array ("anyOf", "allOf", "oneOf", "type", "required", or the string list form of "dependencies") are sensitive to the ordering within the array.

@epoberezkin , @jdesrosiers nearly everything else is negotiable within the general agreement so far on if/then/else, but order-sensitive arrays are where I draw the line. It's confusing, fragile, and against the current design of the system.

It also makes people not to think imperatively.

You're going to have to walk me through this, @epoberezkin

That's said by @jdesrosiers

I'm ok with flat if/then/else - the simpler the better

LOL oops my bad. OK @jdesrosiers I'm not entirely following the "not think imperatively" thing :-)

I get that it doesn't look like if/then/else but it doesn't look like much of anything so I feel like it's avoiding imperativeness by being confusing rather than by guiding thought in a different direction.

We could go back to @epoberezkin's earlier idea of "antecedent"/"consequent" if we want to separate the intuition from imperative thinking but still make the schema readable. Since they are less common terms, I would prefer to spell out "antecedent" and "consequent" rather than using "ante" and "cons". I guess "else" would be "alternative"? Propositional logic doesn't have elses :-P

Which raises a question- should we have "else"? Or should we stick with "antecedent" and "consequent" and use other keywords ("oneOf", "not", etc.) to make constructs more complex than just P->Q?

if p then q else r is equivalent to (p -> q) && (!p -> r), so not having else would make it much more verbose and repetitive. Many real-life use cases need else. That's why many languages have JavaScript-like constructs p ? q : r and don't have an implication expression (->) without an else part.
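A quick exhaustive check confirms that `if p then q else r` equals `(p -> q) && (!p -> r)`. It also shows why the second implication must use the else branch `r`: reusing `q` there collapses the expression to plain `q`, which differs whenever `p` is false and `q != r`. The helper names are illustrative only.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    return (not p) or q


def if_then_else(p: bool, q: bool, r: bool) -> bool:
    return q if p else r


cases = list(product([False, True], repeat=3))

# if p then q else r  ==  (p -> q) and (not p -> r), for every assignment.
assert all(
    if_then_else(p, q, r) == (implies(p, q) and implies(not p, r))
    for p, q, r in cases
)

# (p -> q) and (not p -> q) is just q, so it disagrees whenever p is false and q != r.
mismatches = [
    (p, q, r)
    for p, q, r in cases
    if if_then_else(p, q, r) != (implies(p, q) and implies(not p, q))
]
print(mismatches)  # [(False, False, True), (False, True, False)]
```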

Removing else feels like overcorrecting, and as @epoberezkin said, all it does is make schemas that need it more verbose and harder to maintain. And I am still against any solution that both requires an array and makes the order of the array matter. I don't think it's worth the extra confusion it causes just to discourage some theoretical bad behavior.

@HotelDon @epoberezkin cool. I floated the else removal just to kind of push at the boundaries here.

I think we're mostly agreeing that the solution that best balances readability and existing design constraints is either "if"/"then"/"else" or "conditionalIf", "conditionalThen", "conditionalElse".

Another related question is what to do with dependencies:

{
    "dependencies": {
        "foo": ["a", "b", "c"],
        "bar": {"patternProperties": {"^[a-z]": {"type": "number"}}}
    }
}

is equivalent to

{
    "allOf": [
        {
            "if": {"required": ["foo"]},
            "then": {"required": ["a", "b", "c"]}
        },
        {
            "if": {"required": ["bar"]},
            "then": {"patternProperties": {"^[a-z]": {"type": "number"}}}
        }
    ]
}
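The equivalence can be illustrated (not proved) in Python by modelling both forms with small hypothetical combinators, not a real library, and comparing them on a handful of sample instances. As in JSON Schema, "required" and "patternProperties" pass non-object instances, and the schema form of "dependencies" applies to the whole instance.

```python
import re
from typing import Any, Callable, Dict, List, Union

Schema = Callable[[Any], bool]


def required(names: List[str]) -> Schema:
    # Like JSON Schema, non-object instances are not constrained.
    return lambda x: not isinstance(x, dict) or all(n in x for n in names)


def pattern_properties(pattern: str, sub: Schema) -> Schema:
    rx = re.compile(pattern)
    return lambda x: not isinstance(x, dict) or all(
        sub(v) for k, v in x.items() if rx.search(k)
    )


def is_number(x: Any) -> bool:
    # JSON booleans are not numbers, although Python's bool subclasses int.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def if_then(cond: Schema, then: Schema) -> Schema:
    # No "else": instances that fail the condition are valid.
    return lambda x: then(x) if cond(x) else True


def all_of(*schemas: Schema) -> Schema:
    return lambda x: all(s(x) for s in schemas)


def dependencies(deps: Dict[str, Union[List[str], Schema]]) -> Schema:
    # Array form: listed properties are required; schema form: the schema applies.
    def check(x: Any) -> bool:
        if not isinstance(x, dict):
            return True
        for prop, dep in deps.items():
            if prop in x:
                ok = required(dep)(x) if isinstance(dep, list) else dep(x)
                if not ok:
                    return False
        return True
    return check


number_values = pattern_properties("^[a-z]", is_number)
via_dependencies = dependencies({"foo": ["a", "b", "c"], "bar": number_values})
via_if_then = all_of(
    if_then(required(["foo"]), required(["a", "b", "c"])),
    if_then(required(["bar"]), number_values),
)

samples = [
    {}, {"foo": 1, "a": 1, "b": 2, "c": 3}, {"foo": 1, "a": 1},
    {"bar": 1, "x": 2}, {"bar": 1, "x": "no"}, {"bar": 1, "X": "ok"}, "not an object",
]
for s in samples:
    assert via_dependencies(s) == via_if_then(s)
```

Note that `^[a-z]` also matches the key "bar" itself, so under both forms the value of "bar" must be a number too.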

Setting aside compatibility for a moment, do we think the dependencies cases are common enough to warrant their own shortcut syntax? Arguably if/then/else is more intuitive. I know that "dependencies" gave me a headache when I was first learning JSON Schema as it's a very special-purpose construct that does two slightly different things.

@handrews, the "not thinking imperatively" comment is based on two things. Yes, one reason is that it is different than what most people are used to. Taking someone a little out of their comfort zone can be effective in getting them to think differently.

As @epoberezkin pointed out the tuple syntax is common in functional languages. I've seen multiple examples including Lisp. I think using a pattern used in declarative languages does more to encourage declarative thinking.

But a far more troubling new design pattern would be an order-sensitive array.

I have to disagree. With respect to keyword ordering, I am with you 100%. But this is different. This is the keyword itself. It's not a schema, so it doesn't make sense to apply the same rules. It doesn't compare to keywords like allOf because it works on a tuple rather than an array. The only keyword that compares is the hyper-schema media keyword, which has binaryEncoding and type labels for its parts. That is the only place where I think you can argue precedent, and it would be a good argument for using the if, then, and else labels.

To be clear, I'm not against using the if, then, and else labels; I just like the tuple syntax a little more, and I don't think the arguments given for not considering it hold water. To me, the strongest argument against the tuple syntax is the precedent set by the media keyword.

@jdesrosiers I'm just thinking about trying to get multiple engineering teams in many offices worldwide comfortable with the tuple conditional and it's... not appealing.

Being able to point to Lisp's cond is helpful, but the lack of visual distinction between the test, then, and else schemas is problematic.

In Lisp or other similar languages, you mostly try to keep your cond terms concise (IIRC, it's been more than 20 years since I wrote any Lisp). Schemas are often very long and difficult to fit into a clear visual. It's one of the most common complaints I hear from engineers working with schemas, and one reason that I've usually had people work in YAML and convert to JSON- it's just much more compact.

But if you have a 10-line "if" schema, a 20-line "then" schema and a 15-line "else" schema (and if the then and else schemas are hyper-schemas, you'll have even longer schemas), then visually understanding what you're looking at becomes extremely challenging.

With allOf and friends, you can just tell by indentation; it might be a bit confusing as to exactly which alternative you're looking at, but they all behave the same way.

With conditionals, it matters a great deal which of the three you're examining, and I see the verbosity combined with the lack of clearly identifying syntax to be a major usability and maintenance problem.

(as an aside, I have no idea why media is like that and have wondered about it for some time)

@handrews You have some good points. I think your visual concerns can be addressed by using definitions/$ref. I doubt it would cause "major" problems. I don't think it would take people long to get used to it. But, without people actually using it, it's just my intuition and experience against yours :). We're all just guessing.

I define "major" as "likely to cause bugs". I'm not thrilled at the idea of having to move stuff out to "definitions" just to make an awkward keyword readable. While I certainly do split things up for readability, right now everything is relatively easy to understand inline, if you can just find the beginning of the schema, or the beginning of the list containing the schemas. In the list case, you do not need to keep track of how far you are in the list.

This would be the first keyword where that is not true.

While we are all just guessing to some degree, I have brought many individuals and teams on board with JSON Schema and I am not sold on the assertion that this will work. Where by "work" I mean "won't end up with a bunch of buggy schemas and/or confused people showing up at my desk."

I just find cond to be the worst of both worlds- an explicit conditional that is not easy to read.

and one reason that I've usually had people work in YAML and convert to JSON- it's just much more compact.

This is just a bit of an aside, but I would like to point out that I think JSON Schema is a better "YAML Schema" than any of the native schemas for YAML I've found, even if that was not an intentional design choice. I doubt it would ever be worth making a separate thing called "YAML Schema", but it might be helpful to mention as a side note on the JSON Schema website that it's technically compatible with anything that can directly translate to JSON. It DOES make everything look prettier (well, most of the time, anyways).

I think @handrews's argument about readability being better with if/then/else is quite a strong one; it would be very easy to get lost in a "tuple". The big difference from functional languages is that cond usually connects short names/expressions, whereas here they will definitely be bigger. Definitions don't really solve the problem, as their main use case is reusable entities rather than arbitrary schema fragments.

So I think that if/then/else is better too now.

Re dependencies, I don't mind it being dropped. Ever since I had switch, I have always preferred the if/required combination. To be honest, it never even occurred to me to use the dependencies keyword (because I needed if anyway to test for specific field values as well, so it was more consistent without dependencies).

I'm not convinced the sugar is sweet enough.

  1. The proposed solution will, in practice, essentially separate schema1 into two separate schemas: the condition schema and "schema1". This means "schema1" often can't be re-used elsewhere, because it's missing the keywords that the condition schema contains. To make "schema1" reusable I would most likely reach for the allOf keyword.

    To illustrate what I mean I "inlined" schema1 in the example:

    {
      "if": { "$ref": "condition" },
      "then": {
        "allOf": [
          { "$ref": "condition" },
          { "$ref": "original-schema1" }
        ]
      },
      "else": { "$ref": "schema2" }
    }
  2. People will nest conditionals, which can become very complicated very quickly.

@britishtea separating the condition and the alternatives is exactly what we want. "schema1" and "schema2" may well be re-usable independent of each other or of the condition.

{
    "definitions": {
        "identifiedByNumber": {
            "properties": {"identifier": {"type": "integer", "minimum": 1}}
        },
        "identifiedByName": {
            "properties": {"name": {"type": "string", "pattern": "^[A-Z][a-z]*"}}
        }
    },
    "type": "object",
    "properties": {
        "useName": {"type": "boolean", "default": false}
    },
    "if": {"properties": {"useName": {"const": true}}, "required": ["useName"]},
    "then": {"$ref": "#/definitions/identifiedByName"},
    "else": {"$ref": "#/definitions/identifiedByNumber"}
}
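To make the evaluation order concrete, here is a minimal sketch in plain JavaScript (no Ajv involved) of how a validator might apply if/then/else to the useName schema above. It supports only the handful of keywords that schema uses, and it puts "required": ["useName"] in the condition (an assumption on my part) so that an instance without useName takes the else branch, matching the declared default of false. Without it, "properties" passes vacuously when useName is missing and the then branch would apply.

```javascript
// Minimal sketch: supports only type, properties, required, const, minimum,
// pattern, if/then/else and local "#/definitions/..." $refs.
const nameOrNumberSchema = {
  definitions: {
    identifiedByNumber: {
      properties: { identifier: { type: "integer", minimum: 1 } }
    },
    identifiedByName: {
      properties: { name: { type: "string", pattern: "^[A-Z][a-z]*" } }
    }
  },
  type: "object",
  properties: { useName: { type: "boolean", default: false } },
  // Assumption: "required" added so a missing useName falls through to "else".
  if: { properties: { useName: { const: true } }, required: ["useName"] },
  then: { $ref: "#/definitions/identifiedByName" },
  else: { $ref: "#/definitions/identifiedByNumber" }
};

function validateSketch(s, data, root) {
  if (s.$ref) {
    const name = s.$ref.replace("#/definitions/", "");
    return validateSketch(root.definitions[name], data, root);
  }
  const isObj = data !== null && typeof data === "object" && !Array.isArray(data);
  if (s.type === "object" && !isObj) return false;
  if (s.type === "integer" && !Number.isInteger(data)) return false;
  if (s.type === "string" && typeof data !== "string") return false;
  if (s.type === "boolean" && typeof data !== "boolean") return false;
  if ("const" in s && data !== s.const) return false;
  if (typeof data === "number" && s.minimum !== undefined && data < s.minimum) return false;
  if (typeof data === "string" && s.pattern && !new RegExp(s.pattern).test(data)) return false;
  if (isObj && s.required && !s.required.every(k => k in data)) return false;
  if (isObj && s.properties) {
    for (const [k, sub] of Object.entries(s.properties)) {
      if (k in data && !validateSketch(sub, data[k], root)) return false;
    }
  }
  // The "if" result only selects a branch; it never fails the instance itself.
  if (s.if) {
    const branch = validateSketch(s.if, data, root) ? s.then : s.else;
    if (branch && !validateSketch(branch, data, root)) return false;
  }
  return true;
}
```

For example, `validateSketch(nameOrNumberSchema, { identifier: 3 }, nameOrNumberSchema)` returns true: useName is absent, so the condition fails and only identifiedByNumber is checked.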

I may want to use names or numeric identifiers elsewhere. I may use them unconditionally. I may use them depending on some other condition. If I want to use the whole condition in multiple places, then I'd organize it like this:

{
    "definitions": {
        "identifiedByNumber": {
            "properties": {"identifier": {"type": "integer", "minimum": 1}}
        },
        "identifiedByName": {
            "properties": {"name": {"type": "string", "pattern": "^[A-Z][a-z]*"}}
        },
        "nameOrNumber": {
            "type": "object",
            "properties": {
                "useName": {"type": "boolean", "default": false}
            },
            "if": {"properties": {"useName": {"const": true}}, "required": ["useName"]},
            "then": {"$ref": "#/definitions/identifiedByName"},
            "else": {"$ref": "#/definitions/identifiedByNumber"}
        }
    },
    "properties": {
        "firstThing": {"$ref": "#/definitions/nameOrNumber"},
        "secondThing": {"$ref": "#/definitions/nameOrNumber"}
    }
}

People will nest conditionals, which can become very complicated very quickly.

People already do this with "allOf"/"anyOf"/"oneOf"/"not" and it's much, much harder to read than nested conditionals.

It's not JSON Schema's job to make complex schemas impossible. This change makes complex schemas easier than they were. If a project wants to ban complex conditionals, that's a totally reasonable thing to do, but that's something that people designing schemas should decide on. It is not a limitation that should be encoded into the spec.

"schema1" and "schema2" may well be re-usable independent of each other or of the condition.

Just to clarify, I don't think this is the common case.

People already do this with "allOf"/"anyOf"/"oneOf"/"not" and it's much, much harder to read than nested conditionals.

It's not JSON Schema's job to make complex schemas impossible. This change makes complex schemas easier than they were.

To be blunt, all this change does is allow you to omit the negation of a condition in a limited set of situations. If you're lucky enough that the condition is not part of your "then" and "else" clauses, you can drop the "allOf" keywords too. While that is certainly easier and more convenient in those cases, is it worth introducing a whole new concept, considering it offers no new functionality?

Issues #31 and #64 (as mentioned in the opening comment) raise valid issues, but I don't think this proposal addresses them very well. I think a more generic solution can be found and should be preferred.

@britishtea if you think there is a better solution please propose one. There has been a lot of discussion of this topic over the years since around the time of draft 4 (in the issues you mention and elsewhere), and this is the first proposal to really get broad buy-in.

The request is one that comes up over and over- I spent quite a bit of effort trying to hold the line at "oneOf" / "anyOf" but the demand is just too high to ignore.

It's great to see a new (at least to me) name in the conversation, and I hope you stick around. But given the history here and the amount of work put in to reach this much agreement, you're going to need to come up with an alternate proposal and sell it if you want to see something different. We're not rushing this in so you have a bit of time to think on it.

The proposed solution will, in practice, essentially separate schema1 into two separate schemas: the condition schema and "schema1". This means "schema1" often can't be re-used elsewhere, because it's missing the keywords that the condition schema contains.

@britishtea in practice, the condition is usually simpler: it often tests for a specific value or for the presence of a field.

Also I think you are missing a fundamental difference: "condition" schema (the schema in "if") does not affect the result of the validation directly, it's only used as a predicate to choose one or another schema to validate against.

You are essentially saying that instead of using:

{
  "if": {"$ref": "condition" },
  "then": {"$ref": "schema1"},
  "else": {"$ref": "schema2"}
}

you would always prefer using

{
  "anyOf": [
    { "allOf": [ {"$ref": "condition" }, {"$ref": "schema1"} ] },
    { "allOf": [ {"not": {"$ref": "condition" } }, {"$ref": "schema2"} ] }
  ]
}

in cases where you need such logic? And that the latter is easier to understand than the former?

Also bear in mind that the second is not only quite obscure but also less efficient, unless your validator is smart enough to optimise such cases (which I've never encountered).
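The equivalence and the efficiency point can be checked mechanically. A minimal sketch, assuming "condition", "schema1" and "schema2" are treated as opaque predicates that simply pass or fail, and assuming a naive validator that does not short-circuit (as one collecting all errors would): it enumerates every outcome and counts how often the condition is evaluated.

```javascript
// Wraps a fixed result in a predicate that counts its own invocations.
function makeCounted(result) {
  const fn = () => {
    fn.calls += 1;
    return result;
  };
  fn.calls = 0;
  return fn;
}

// if/then/else: evaluate the condition once, then exactly one branch.
function ifThenElse(cond, thenS, elseS) {
  return cond() ? thenS() : elseS();
}

// anyOf [ allOf[cond, s1], allOf[not cond, s2] ], evaluated without
// short-circuiting, so the condition is checked once per branch.
function anyOfForm(cond, s1, s2) {
  const first = [cond(), s1()].every(Boolean);
  const second = [!cond(), s2()].every(Boolean);
  return first || second;
}

// Tries all 8 combinations of condition/schema1/schema2 outcomes.
function compareForms() {
  const rows = [];
  for (const c of [true, false]) {
    for (const a of [true, false]) {
      for (const b of [true, false]) {
        const c1 = makeCounted(c);
        const c2 = makeCounted(c);
        rows.push({
          c, a, b,
          ite: ifThenElse(c1, () => a, () => b),
          any: anyOfForm(c2, () => a, () => b),
          iteCalls: c1.calls,
          anyCalls: c2.calls
        });
      }
    }
  }
  return rows;
}
```

In every row the two forms agree on the result, but the anyOf form evaluates the condition twice while if/then/else evaluates it once.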

You could of course use this:

{
  "anyOf": [
    {"$ref": "schema1+condition"},
    {"$ref": "schema2+not+condition"}
  ]
}

Maybe that's why you are talking about splitting anything. This approach creates quite a few problems with

  • managing additionalProperties, particularly when you want to remove them, which is a very common use case
  • when you want to modify data as part of validation process, e.g. do type coercion, which is also quite common
  • error reporting (you will get errors from both branches because the validator would not know which one is more likely to be valid). I have been asked many times: "why can't you report errors only from the correct branch?" The problem here is that the validator has no way of knowing which branch is the correct one without some complex and unreliable heuristics.
  • performance: "schema1+condition" has to be validated in its entirety, before the validator can try schema2 (and people prefer collecting all possible errors rather than short-circuiting validation)
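The error-reporting point can be illustrated with a small sketch. Here each "schema" is a hypothetical function that returns a list of error messages (an empty list means valid); the cat/snake rules are made up for illustration and are not how any particular validator is structured.

```javascript
// Hypothetical stand-in schemas: each returns an array of error messages.
const catCondition = d => (d.type === "cat" ? [] : ['type should be "cat"']);
const catRules = d => (typeof d.fur_color === "string" ? [] : ["fur_color should be string"]);
const snakeRules = d => (Number.isInteger(d.overall_length) ? [] : ["overall_length should be integer"]);

// Combinators mirroring allOf and not for error-returning schemas.
const allOfErrs = (...schemas) => d => schemas.flatMap(s => s(d));
const notErrs = s => d => (s(d).length === 0 ? ["should not be valid"] : []);

// anyOf: the validator cannot tell which branch was "meant", so when every
// branch fails it has to report the errors from all of them.
function anyOfErrors(data, branches) {
  const all = [];
  for (const branch of branches) {
    const errs = branch(data);
    if (errs.length === 0) return [];
    all.push(...errs);
  }
  return all;
}

// if/then/else: the condition only picks a branch, and only that branch's
// errors are reported; the condition's own errors are discarded.
function ifThenElseErrors(data, cond, thenS, elseS) {
  return cond(data).length === 0 ? thenS(data) : elseS(data);
}
```

For `{ type: "cat", fur_color: 5 }`, the anyOf form built from `allOfErrs(catCondition, catRules)` and `allOfErrs(notErrs(catCondition), snakeRules)` reports three errors, two of them from the snake branch, while the conditional reports only the fur_color error.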

Conditionals not only make the intent semantically clearer from the data/schema design point of view, but also address the above problems effectively. And lots of Ajv users use the much more complex switch because of that. So if/then/else seems very necessary, and a much simpler alternative to the previously proposed switch.

Just to clarify, I don't think this is the common case.

It is quite a common use case, for example when you have an array of heterogeneous items, each of which should be valid according to one of two (or several) schemas based on some simple condition in the item.

People will nest conditionals, which can become very complicated very quickly.

That is correct, although as @handrews pointed out it is already the case with existing keywords. That's the reason why in addition to if/then/else we are discussing "select" (#64 (comment)) that would allow choosing the schema to validate against based on a value of some property (like switch in JavaScript). It is not more generic, but just a different approach and from my experience both are needed. I am going to implement "select" as a custom keyword for Ajv, I will post it here once it's ready. "switch" and "if/then/else" are already available in ajv or ajv-keywords, people use them.
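The idea behind "select" can be modelled as a table lookup. This is only a sketch of the semantics as described in this thread, not Ajv's implementation: schemas are modelled as predicate functions, the names are hypothetical, and the behaviour when the property is absent or has no matching case (use a fallback if given, otherwise pass) is an assumption.

```javascript
// Reads one property of the instance and uses its value as the key into a
// table of schemas. Assumption: unmatched values use the fallback if one is
// given, and otherwise pass.
function selectValidate(data, propName, cases, fallback) {
  const key = data[propName];
  if (Object.prototype.hasOwnProperty.call(cases, key)) {
    return cases[key](data);
  }
  return fallback ? fallback(data) : true;
}

// Hypothetical cases, keyed by the value of the "type" property.
const petCases = {
  cat: d => typeof d.fur_color === "string",
  snake: d => Number.isInteger(d.overall_length) && d.overall_length >= 1
};

// Fallback that rejects unknown pet types (like the boolean schema false).
const unknownPet = () => false;
```

The design difference from a chain of conditions is that only one lookup happens, no matter how many cases there are.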

Feel free to ask any other questions you may have.

@epoberezkin I understand it's simply a predicate to select a schema to validate against. In my experience the predicate will more often than not "overlap" (at least) "schema1".

Having said that, I can see the use case for having a separate predicate (i.e. a predicate that does not overlap) as well, though I worry about how easily they could get "out of sync".

You are essentially saying that instead of using:

[...]

you would always prefer using

[...]

I don't exactly prefer the latter schema if given the choice between the two, but this is not a choice between two options ;) An option that would allow matching against multiple predicates would have my preference. Perhaps "switch" is that option :)

managing additionalProperties, particularly when you want to remove them, which is a very common use case

Could you expand on this or point me to a discussion?

I think a more generic solution can be found and should be preferred.

@britishtea That was switch... and everybody here hated it, maybe because of the fall-through clause ("continue") it included, and I don't know anybody who actually uses fall-through. Without fall-through it would look like:

{
  "switch": [
    {
      "if": {"$ref": "condition1"},
      "then": {"$ref": "schema1"}
    },
    {
      "if": {"$ref": "condition2"},
      "then": {"$ref": "schema2"}
    },
    {
      "then": {"$ref": "schema3"}
    }
  ]
}

That with if/then/else is equivalent to:

{
  "if": {"$ref": "condition1" },
  "then": {"$ref": "schema1"},
  "else": {
    "if": {"$ref": "condition2"},
    "then": {"$ref": "schema2"},
    "else": {"$ref": "schema3"}
  }
}
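The claimed equivalence can be brute-forced. A minimal sketch, assuming each condition and schema is reduced to a boolean outcome, and assuming a switch with no matching case passes: it models switch without fall-through as "first matching case wins", then compares it with the nested if/then/else over all 32 combinations.

```javascript
// switch without fall-through: the first case whose "if" passes decides the
// result via its "then"; a case without "if" always matches.
function evalSwitch(cases) {
  for (const { if: cond, then } of cases) {
    if (cond === undefined || cond) return then;
  }
  return true; // assumption: no matching case means the instance passes
}

// The nested if/then/else form.
function evalNested(c1, s1, c2, s2, s3) {
  return c1 ? s1 : (c2 ? s2 : s3);
}

// Enumerates every combination of the five outcomes via a 5-bit mask.
function allCombinations() {
  const out = [];
  for (let mask = 0; mask < 32; mask++) {
    const [c1, s1, c2, s2, s3] = [0, 1, 2, 3, 4].map(i => Boolean(mask & (1 << i)));
    out.push({
      sw: evalSwitch([{ if: c1, then: s1 }, { if: c2, then: s2 }, { then: s3 }]),
      nested: evalNested(c1, s1, c2, s2, s3)
    });
  }
  return out;
}
```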

I don't have a strong opinion about which one is better, with many conditions switch can be seen as more concise. "To avoid nesting" was the main argument against if/then/else in the original switch proposal.

The main objection to switch was the fall-through, as I wrote above, but without fall-through it is indeed more generic and equivalent to multiple ifs. Would switch be your preference @britishtea?

My main argument against switch is that it introduces sub-keywords that cannot be used outside of it. I am not a fan of that, and we have no precedent for it in the spec yet.

I think the only common use case with multiple conditions will be effectively addressed by "select". In cases where you need more generic conditions (than comparing with a constant), you almost never have more than two conditions, and if/then/else probably looks better as long as nesting is not very deep (definitely better when you have one condition, which is the most common case).

managing additionalProperties, particularly when you want to remove them, which is a very common use case
Could you expand on this or point me to a discussion?

Removing additionalProperties is not part of the spec but a feature some validators have and many people use. The challenge is that when you have multiple schemas inside anyOf, the validator has no way of knowing which schema to use when removing additional properties. I had many questions on the subject, and they only stopped when I posted an FAQ entry in Ajv.

If instead of anyOf you use conditional, the problem is resolved, as you specifically tell validator which one is "the correct" schema to use, so it helps both filtering and error reporting.

Same problem with applying defaults by the way. Essentially any validation logic that modifies data in any way breaks on anyOf/oneOf and works well with conditional.
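A small sketch of why data-modifying features need a selected branch. It mimics the idea behind "remove additional properties" options, not any particular validator's implementation: each branch schema is reduced to its list of allowed property names, and the condition is a hypothetical predicate on the "type" field.

```javascript
// Keeps only the properties listed in "allowed" (a copy; input unchanged).
function removeAdditional(data, allowed) {
  const out = {};
  for (const key of Object.keys(data)) {
    if (allowed.includes(key)) out[key] = data[key];
  }
  return out;
}

// Hypothetical branch schemas, reduced to their "properties" keys.
const catProps = ["type", "fur_color"];
const snakeProps = ["type", "overall_length"];
const isCatPet = d => d.type === "cat";

// With if/then/else the validator knows which branch applies, so it knows
// which property list to filter against.
function filterWithConditional(data) {
  return removeAdditional(data, isCatPet(data) ? catProps : snakeProps);
}

// With anyOf there is no selected branch: filtering against each branch
// gives different results, and nothing says which one to keep.
function filterCandidatesWithAnyOf(data) {
  return [catProps, snakeProps].map(props => removeAdditional(data, props));
}
```

Applying defaults has the same shape: the defaults to insert depend on which branch is considered "the" schema for the instance.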

@britishtea Also, without fall-through, the switch above is equivalent to:

{
  "anyOf": [
    {
      "if": {"$ref": "condition1"},
      "then": {"$ref": "schema1"},
      "else": false
    },
    {
      "allOf": [
        {"not": {"$ref": "condition1"}},
        {
          "if": {"$ref": "condition2"},
          "then": {"$ref": "schema2"},
          "else": {"$ref": "schema3"}
        }
      ]
    }
  ]
}

So you kind of have a more generic construct with multiple conditions already with the existing building blocks, there is no reason to introduce another one.

EDIT: I definitely prefer this to switch because we have normal keywords here and no sub-keywords, yet we have the same power as switch (without fall-through that everybody hated).

@epoberezkin @britishtea I also prefer "anyOf" + "if"/"then"/"else" over any form of "switch".

@britishtea in my experience, while I personally find "allOf"/"anyOf"/"oneOf"/"not" sufficient to manage conditionals, it is a steep learning curve for most engineers. In the organization where I used JSON Schema with multiple teams in the past, a few people got really good at it, and everyone else asked those people to design conditional solutions whenever they were non-trivial. That ended up working just fine, which is why I pushed back on this proposal at first in favor of "select". But I have to admit that it was a notable challenge during adoption, and "select" got more difficult the more I looked into it.

@handrews I've implemented select/selectCases/selectDefault keywords that we discussed here and also in #64 (comment) in ajv-keywords.

@epoberezkin @handrews I like it, my only question is, considering the schema in its data definition role (not just for validation), shouldn't all data-based options ideally be in properties? Data types can be re-defined within the select, but I always felt uneasy about *Of acting to define model properties not found in properties, and now this would also do that based on examples. I feel it inhibits the ability to accurately generate a UI without needing to process validation keywords as if they are a ui-schema or part of the model definition. I'd prefer to see all items in the select required in properties and referenced.

For Model and UI purposes I would have thought it makes more sense for them to be required in properties, makes it easier to reference them as keys.

Would like to hear your perspectives on that.

@epoberezkin love your documentation on it, by the way: very clear and easier to follow than the linked comment thread 👍

@Anthropic these questions are probably out of scope for this issue. I just posted it here because we've discussed it, but I don't think select and the $data it requires will make it into the next draft, draft-07.

The current focus here is to publish draft-06, then there are some other things that are higher priority probably.

@epoberezkin have you gotten any interesting feedback on if/then/else or select/selectCases/selectDefault since adding them to Ajv?

I'd like to move ahead with a PR here. In the absence of compelling feedback, I'd suggest we go with if/then/else and if there is still interest in select/selectCases/selectDefault track that elsewhere. But if select/selectCases/selectDefault has proven more useful then let's see if we still want if/then/else at all.

@handrews People use and ask questions about if/then/else. It is more generic so I think it should be added before select.

select requires $data support and in most real cases it can be implemented via several ifs (very verbose, but without $data), so I'd leave select until the next time regardless whether we add $data now or not.
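To show what "several ifs" means in practice, here is a hypothetical helper (not part of Ajv or the spec) that expands a select-like table (property name, value to schema, default schema) into plain if/then/else keywords, nested so that the first matching value wins. It assumes string-valued keys, since object keys in JavaScript are strings.

```javascript
// Expands a select-like table into nested if/then/else schemas.
// Assumption: case keys are string values of the selected property.
function selectToIfThenElse(propName, cases, defaultSchema) {
  let schema = defaultSchema === undefined ? true : defaultSchema;
  // Build from the last case backwards so the first case ends up outermost.
  for (const [value, caseSchema] of Object.entries(cases).reverse()) {
    schema = {
      if: {
        properties: { [propName]: { const: value } },
        required: [propName]
      },
      then: caseSchema,
      else: schema
    };
  }
  return schema;
}
```

Running `JSON.stringify(selectToIfThenElse("type", { cat: { $ref: "#/definitions/cat_pet" }, snake: { $ref: "#/definitions/snake_pet" } }, false), null, 2)` shows how verbose the result gets, but it needs no $data.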

Great, I'll do a PR for if/then/else. Thanks for all your work adding support for these ideas, having real usage feedback is tremendously helpful.

@Anthropic regarding your

considering the schema in its data definition role (not just for validation)

For those who might not know, between that comment and now we started a project for proposing new JSON Schema vocabularies, including both a UI generation vocabulary and a code generation/data definition one (I think those go together? lmk if I'm confused).

I can't find where this has come up before, but I'd rather allow such new vocabularies to impose restrictions on how they are used with the validation vocabulary, and continue to support useful validation concepts even if they are difficult to impossible to use for data definition.

I think now that we are looking at these as separate vocabularies, it will be easier to explain imposing some restrictions such as "data definition implementations need not support if/then/else, not, *Of, dependencies, etc." We already sort of did this with the most recent Hyper-Schema revision, which excludes links defined under a "not" or within non-validating *Of branches from use. They're syntactically valid, but implementations MUST NOT attempt to do anything with them. There just aren't sensible semantics for those cases. I'll update that part of Hyper-Schema to also cover if/then/else in the PR.

dlax commented

Sorry for joining these discussions late. I was wondering whether having the if keyword within a (sub-)schema wouldn't make sense and solve some use cases in combination with oneOf or anyOf. For instance:

{
  "oneOf": [
     {
        "if": { "type": "object" },
        "properties": {
           "foo": {"type": "string"}
        }
     },
     {
        "if": { "type": "array" },
        "items": {"type": "string"}
     },
     { "$ref": "defaultCaseSchema" }
  ]
}

This way, we wouldn't need then and else keywords and this would naturally allow elif control. Has this been considered? Does it make sense?

@dlax It makes sense, but seems less intuitive than if/then/else. And the main point of if/then/else is to offer something intuitive. Everything it does can be done already. And implementing else-if-like (but unordered) logic just looks like this:

{
  "oneOf": [
     {
        "if": { "type": "object" },
        "then": {
          "properties": {
             "foo": {"type": "string"}
          }
        }
     },
     {
        "if": { "type": "array" },
        "then": {
          "items": {"type": "string"}
        }
     },
     { "$ref": "defaultCaseSchema" }
  ]
}

Which is really not awful.

Also, philosophically, nearly all keywords operate independently, such that:

{
  "x": "a",
  "y": "b"
}

is equivalent to

{
  "allOf": [
    {"x": "a"},
    {"y": "b"}
  ]
}

The exceptions are the additional* keywords and now if/then/else. Requiring the implementation of all keywords to be dependent on if breaks the paradigm too much. Being able to write little independent functions for the vast majority of keywords is one of JSON Schema's strengths.
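A minimal sketch of the "little independent functions" style, assuming only a handful of string/integer keywords (the additional* keywords are not modelled): each keyword check sees only its own value and the instance, so splitting a schema into allOf parts cannot change the result, while if/then/else has to read its three keywords together.

```javascript
// Each keyword is checked by its own function: (keywordValue, instance) => boolean.
const keywordChecks = {
  type: (v, d) => (v === "string" ? typeof d === "string" : v === "integer" ? Number.isInteger(d) : true),
  minLength: (v, d) => typeof d !== "string" || d.length >= v,
  maxLength: (v, d) => typeof d !== "string" || d.length <= v
};

function validateKeywords(schema, data) {
  if (typeof schema === "boolean") return schema; // boolean schemas
  // allOf: a schema is the conjunction of its parts.
  if (schema.allOf && !schema.allOf.every(s => validateKeywords(s, data))) return false;
  // The exception: "if", "then" and "else" must be evaluated together.
  if ("if" in schema) {
    const branch = validateKeywords(schema.if, data) ? schema.then : schema.else;
    if (branch !== undefined && !validateKeywords(branch, data)) return false;
  }
  // Every other keyword is checked independently of its siblings.
  return Object.entries(schema)
    .filter(([k]) => k in keywordChecks)
    .every(([k, v]) => keywordChecks[k](v, data));
}
```

With this, `{"type": "string", "minLength": 2}` and `{"allOf": [{"type": "string"}, {"minLength": 2}]}` always give the same answer, which would no longer hold if every keyword had to consult a sibling "if".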

dlax commented

Requiring the implementation of all keywords to be dependent on if breaks the paradigm too much. Being able to write little independent functions for the vast majority of keywords is one of JSON Schema's strengths.

Makes sense, thanks!

Merged PR #375

I'm trying out the syntax from this comment but I can't get it to work:

var Ajv = require('ajv');

var schema = {
	"properties": {
		"foo": {
			"type": "integer"
		},
		"pets": {
			"type": "array",
			"items": {
				"oneOf": [
					{
						"if": { "properties": { "type": { "const": "cat" } } },
						"then": { "$ref": "#/definitions/cat_pet" }
					},
					{
						"if": { "properties": { "type": { "const": "snake" } } },
						"then": { "$ref": "#/definitions/snake_pet" }
					}
				]
			}
		}
	},
	"definitions": {
		"cat_pet": {
			"type": "object",
			"properties": {
				"type": { "type": "string", "const": "cat" },
				"fur_color": { "type": "string", "enum": ["black", "white", "orange"], "default": "black" }
			}
		},
		"snake_pet": {
			"type": "object",
			"properties": {
				"type": { "type": "string", "const": "snake" },
				"overall_length": { "type": "integer", "minimum": 1 }
			}
		}
	}
};

var data = {
	"foo": 123,
	"pets": [
		{
			"type": "cat",
			"fur_color": "white"

		},
		{
			"type": "snake",
			"overall_length": 42
		}
	]
};

var ajv = new Ajv();
require('ajv-keywords')(ajv, 'if');
var validate = ajv.compile(schema);
var valid = validate(data);
console.log(data);
if (!valid) console.log(validate.errors);

{ foo: 123,
  pets: 
   [ { type: 'cat', fur_color: 'white' },
     { type: 'snake', overall_length: 42 } ] }
[ { keyword: 'oneOf',
    dataPath: '.pets[0]',
    schemaPath: '#/properties/pets/items/oneOf',
    params: {},
    message: 'should match exactly one schema in oneOf' } ]

@AndreKR When "if" fails and there is no "else", the subschema passes. So in your case both subschemas in oneOf pass. What you're missing is "else": false in both subschemas inside "oneOf".
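The failure can be reproduced without Ajv. A minimal sketch that models the "pets" item schema from the question with plain predicates (the helper names are hypothetical): a branch without "else" passes whenever its "if" fails, so a cat also satisfies the snake branch and "exactly one" is violated.

```javascript
// An if/then/else branch; a missing "else" behaves like the schema true.
function conditionalBranch(cond, thenS, elseS) {
  return d => (cond(d) ? thenS(d) : elseS === undefined ? true : elseS(d));
}

const typeIs = t => d => d.type === t;
const catPet = d => typeof d.fur_color === "string";
const snakePet = d => Number.isInteger(d.overall_length) && d.overall_length >= 1;
const alwaysFail = () => false; // the boolean schema `false`

// oneOf: exactly one branch must pass.
function exactlyOne(branches, d) {
  return branches.filter(b => b(d)).length === 1;
}

const withoutElse = [
  conditionalBranch(typeIs("cat"), catPet),
  conditionalBranch(typeIs("snake"), snakePet)
];
const withElseFalse = [
  conditionalBranch(typeIs("cat"), catPet, alwaysFail),
  conditionalBranch(typeIs("snake"), snakePet, alwaysFail)
];
```

With `withoutElse`, `{ type: "cat", fur_color: "white" }` passes both branches, so oneOf fails; with `"else": false` only the matching branch passes.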

@handrews that's actually the case where "select" with "$data" would have been nicer. So maybe let's try to get it in 08?

I also saw something about discriminator; is that still a thing? I didn't use it here because my condition schema will get a second field.

I don't think it ever was a thing... The problem with "discriminator" (the way it is defined in openapi) is that it does implicit mapping directly from property value to the schema key inside definitions, not relying on any existing conventions in JSON-Schema (e.g., such as $ref).

The problem it solves is real, but I'd rather we agree on a solution that is both more flexible (e.g. it allows mapping a sub-property, and mapping to any sub-schema, possibly in a different file) and aligned with the rest of the spec.

@handrews that's actually the case where "select" with "$data" would have been nicer. So maybe let's try to get it in 08?

Probably draft-09 will be the earliest $data will get considered given the level of controversial topics already attached to draft-08.

Please see #1082 for related work in this area.