🧹 Clarification: Behavior for duplicate `$id`s or `$anchor`s
gregsdennis opened this issue · 10 comments
Specification section
Core, probably section 8.2 somewhere. (EDIT: Also section 9.1.2 as noted below.)
What is unclear?
A question came up in Slack about the appropriate behavior when duplicate `$id` values are found in subschemas. For example:
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://test.example/a",
  "type": "object",
  "properties": {
    "f1": {
      "$id": "https://test.example/b",
      "type": "string"
    },
    "f2": {
      "$id": "https://test.example/b",
      "type": "number"
    },
    "f3": {
      "$ref": "https://test.example/b"
    }
  }
}
```

The ID `https://test.example/b` is declared twice.
This could also apply if multiple $anchor values were duplicated within the same resource.
Proposal
This should be a "refuse to process" error.
Currently the behavior is undefined, which means an implementation can do what it wants.
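As a rough illustration of what "refuse to process" could look like (a sketch only; the names and traversal are simplified, and a real implementation would also resolve relative IRIs and only walk actual schema locations):

```python
def index_ids(schema, registry=None):
    """Record every "$id" declared in a schema document.

    Raises ValueError ("refuse to process") if the same identifier is
    declared by more than one subschema.
    """
    if registry is None:
        registry = {}
    if isinstance(schema, dict):
        uri = schema.get("$id")
        if uri is not None:
            if uri in registry:
                raise ValueError(f"duplicate $id: {uri} is declared more than once")
            registry[uri] = schema
        for value in schema.values():
            index_ids(value, registry)
    elif isinstance(schema, list):
        for item in schema:
            index_ids(item, registry)
    return registry
```

Run over the example above, this would raise on the second `https://test.example/b`.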
Do you think this work might require an Architectural Decision Record (ADR)? (significant or noteworthy)
No
Having the same identifier for multiple schemas is nonsense. It's like having JSON with duplicate keys (`{ "a": 1, "a": 2 }`: what is the value of `a`?). Just like JSON, the behavior in this situation is undefined. Defining a correct behavior for something nonsensical would be a bad design choice IMO. It artificially constrains implementations for no good reason. This could lead to less efficient implementations because they are trying to conform to an arbitrary constraint.
> It artificially constrains implementations for no good reason.
We have a very good reason, and you stated it: it's nonsense. A duplicate ID cannot be resolved in a reliable way. Leaving this behavior undefined means that implementations can do what they want, which is not interoperable.
The only way to ensure interoperability is to specify some behavior. Since duplicate IDs are nonsense, an implementation should error.
> It's like having JSON with duplicate keys (`{ "a": 1, "a": 2 }`: what is the value of `a`?). Just like JSON, the behavior in this situation is undefined.
Unlike duplicate keys, though, we have the ability to define behavior for duplicate IDs. We can't enforce behavior for duplicate keys because that's handled further down the stack, by the JSON parser.
> This could lead to less efficient implementations because they are trying to conform to an arbitrary constraint.
I would like to see examples where such enforcement would cause this sort of problem. I'm not interested in optimizing on hypotheticals.
An implementation should tell a user when they've done something wrong. Duplicating IDs is definitely wrong.
I might be missing something here, but the spec does seem to have some wording already on this matter:
> When multiple schemas try to identify as the same URI, validators SHOULD raise an error condition
Probably not a great thing that these are in different sections. These should be aligned, if not consolidated.
I also think this should be elevated to a MUST. To @jdesrosiers's point, though, it doesn't make sense to require a scan for duplicate IDs. I think a happy medium might be "error when encountered".
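To sketch what "error when encountered" could mean in practice (illustrative names only, not any implementation's actual API): identifiers are registered lazily as evaluation reaches them, so no up-front scan is needed, and a collision only surfaces if the duplicated IRI is actually registered or resolved.

```python
class LazyRegistry:
    """Registers identifiers only as evaluation encounters them."""

    def __init__(self):
        self._by_uri = {}

    def register(self, uri, subschema):
        # Called the first time evaluation reaches a subschema declaring `uri`.
        existing = self._by_uri.get(uri)
        if existing is not None and existing is not subschema:
            # Two distinct schemas claim the same IRI: error when encountered.
            raise ValueError(f"duplicate identifier encountered: {uri}")
        self._by_uri[uri] = subschema

    def resolve(self, uri):
        # An unregistered IRI is an ordinary unresolved-reference error instead.
        try:
            return self._by_uri[uri]
        except KeyError:
            raise LookupError(f"cannot resolve reference: {uri}") from None
```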
> When multiple schemas try to identify as the same URI, validators SHOULD raise an error condition
I think SHOULD is appropriate because what it means to "raise an error condition" is not defined well enough that it would provide any interoperability benefit. The reason it exists is mostly just to point out that contradictory definitions are nonsensical. This ought to be self-evident: If you have two different definitions for a given URI, then you're faced with the fact that one of them must be "wrong"—and this calls into question your whole application.
There's also the possibility implementations might have a good justification to handle this scenario differently. For example, an implementation might have some knowledge that one schema is an "older" version of another, and so it knows how to pick the "newer" version. SHOULD already means "you have to do it this way... unless you have a really good reason to do otherwise", and I can't think of any interoperability reasons to prohibit something like this.
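To make that concrete with a hypothetical (the `prefer` policy below is invented for this sketch and is not defined by the spec): an implementation with a documented reason to deviate from the SHOULD could take a conflict-resolution callback and only error in the default case.

```python
class ConfigurableRegistry:
    """Errors on duplicate identifiers by default, but allows a documented override."""

    def __init__(self, prefer=None):
        self._by_uri = {}
        self._prefer = prefer  # e.g. a policy that picks the "newer" schema

    def register(self, uri, subschema):
        existing = self._by_uri.get(uri)
        if existing is None or existing == subschema:
            self._by_uri[uri] = subschema
        elif self._prefer is not None:
            self._by_uri[uri] = self._prefer(uri, existing, subschema)
        else:
            raise ValueError(f"multiple schemas identify as {uri}")
```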
Thanks @awwright. I'm convinced by that.
I think the action here is to reference the requirement in 9.1.2 from 8.2, if even just as a note.
Looking through Core 8.2.1, it looks like this was already addressed by Henry in 2022 (bdbf918), before the conversion to markdown:
> **Duplicate schema identifiers** {#duplicate-iris}
>
> A schema MAY (and likely will) have multiple IRIs, but there is no way for an IRI to identify more than one schema. When multiple schemas attempt to identify as the same IRI through the use of `$id`, `$anchor`, `$dynamicAnchor`, or any other mechanism, implementations SHOULD raise an error condition. Otherwise the result is undefined, and even if documented will not be interoperable.
(There's also a note in 8.2.1 that links to this section.)
I think that resolves this. Unless I hear otherwise, I'll close this in a few days.
This is a breaking change. I think that means it should have a TSC discussion and vote to decide if we're ok with that. Henry added that before we decided to take breaking changes very seriously. If we don't want to allow this breaking change, it might be necessary to back out Henry's change.
I disagree that this is a change at all.
Henry created #1271 and referenced two sections of the specification that listed the same behavior at slightly different requirement levels (MAY and SHOULD, respectively). The new text that Henry added uses the stronger of the two requirements, but the requirement itself hasn't changed.
Furthermore, Henry subsequently created #1272, where this change was thoroughly discussed by you and @notEthan, and ultimately approved by you.
(To be clear, I think this issue can be closed as what I raised has already been addressed.)
Sorry, I misread the text you quoted. I thought the change was making it a MUST.