Implementors Feedback on current Alternative Schemas Draft Proposal
philsturgeon opened this issue · 5 comments
A lot of people came together to make a draft implementation, that went through several iterations, and it looks good on paper. I want to thank everyone for their hard work in making this happen. Unfortunately when you sit down to try actually implementing this it becomes clear we have made something really hard to implement, which handles a myriad of corner cases, but in doing so might now be failing to cover the mainstream requirements.
First, a quick recap on the problems I wanted this proposal to solve, then the summary gets into some potential alternatives to the current alternative schemas draft.
Context
One of the problems I most commonly see users and tooling vendors struggling with is that they think they are getting JSON Schema, but they are really getting some awkward subset, sideset, superset, known jovially as OpenAPI-flavoured JSON Schema. It has missing bits, extra bits, and functionally different bits.
If this was only a problem for beginners it would be bad enough, but the experts are getting stuck too. When I was pushing OpenAPI at WeWork, pretty much everyone would get stuck on this, to a point where I had to make workarounds and abstractions in the tooling. It is one of the most common gotchas coming up on the APIs You Won't Hate Slack group, and my two articles on the topic are some of my highest ranked in traffic.
It's not just beginners though, tooling vendors like Amazon AWS Gateway are confused. They say they support OpenAPI v3.0 but they do not allow the nullable keyword, instead they want type: ['string', 'null'], showing... they do not actually support OpenAPI v3.0, they support JSON Schema with the wrong name.
Not to toot our horns too much, but Stoplight have some really smart developers, and we are all getting stuck with this. We have an AST which normalizes whatever input formats to JSON Schema Draft 7 internally, and then various other bits of the tooling need to try and remember which schema flavour to convert back to, and the development of our next generation of tools has hit a few ruts trying to do that perfectly.
This is a common problem all over the place, and it's getting embarrassing for me when I recommend OpenAPI and people say "yeah but what about..." I'd like to get it solved, which is why I proposed some solutions and hassled Darrel about getting this done, which he graciously worked really hard on!!
I built a workaround for people with Speccy, which munges actual JSON Schema into OpenAPI-flavoured JSON Schema using a JS module. That logic can go at various places in the life-cycle, but it would be a lot better for the community to have an official, baked-in OpenAPI solution.
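To make the "munging" concrete, here is a minimal sketch of the nullable case only. It is not Speccy's actual code: the function name `toOpenApiFlavour` is made up for illustration, and it assumes the only difference being handled is a type array containing "null".

```javascript
// Hypothetical helper, not Speccy's actual code: rewrites a JSON Schema
// "type array" that includes "null" into OpenAPI v3.0 style
// (single `type` plus `nullable: true`). Only `type`, `properties`, and
// `items` are handled here; a real converter has far more to cover.
function toOpenApiFlavour(schema) {
  // Leave booleans, arrays (tuple-style `items`), and primitives untouched.
  if (schema === null || typeof schema !== 'object' || Array.isArray(schema)) {
    return schema;
  }

  const out = { ...schema };

  if (Array.isArray(out.type) && out.type.includes('null')) {
    const nonNull = out.type.filter((t) => t !== 'null');
    // Only the simple case converts cleanly: exactly one non-null type.
    // Anything else is left as-is for a human to look at.
    if (nonNull.length === 1) {
      out.type = nonNull[0];
      out.nullable = true;
    }
  }

  // Recurse into object properties and array items.
  if (out.properties && typeof out.properties === 'object') {
    out.properties = Object.fromEntries(
      Object.entries(out.properties).map(([name, sub]) => [name, toOpenApiFlavour(sub)])
    );
  }
  if (out.items !== undefined) {
    out.items = toOpenApiFlavour(out.items);
  }

  return out;
}

const converted = toOpenApiFlavour({
  type: 'object',
  properties: { name: { type: ['string', 'null'] } },
});
console.log(JSON.stringify(converted));
// {"type":"object","properties":{"name":{"type":"string","nullable":true}}}
```

Every tool that wants to accept real JSON Schema today ends up carrying some version of this, which is the complexity an official solution would remove.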
Whatever solution we come up with should either a) let people write pure JSON Schema that will work in a standard JSON Schema validator, and also be readable by OpenAPI tooling, or b) provide some sort of switch for folks to say "this is JSON Schema" or "This is OpenAPI Schema", so that sure there are two ways but now people have the option, and tooling complexity is just "add an if and normalize this for them".
Either way would be ok, but OpenAPI v3.1 is not going to get that. I think people think it does, but it does not.
1.) Bump JSON Schema to Draft 7 in OpenAPI v3.1
This was discussed, sounded great, but never got anywhere due to some reasoning against which never made any sense to me.
Pro: All schema keywords continue to be OpenAPI Schema Objects, but based off of Draft 7 instead of Draft 4, supporting type arrays. Parsers, validators, etc have no new keywords to look out for other than that type might be an array now, which is a pretty simple fix for them to implement if they need to, but many tools will just need to load in the newest OpenAPI v3.1 metaschema, and not even need to change their own code.
Pro: Keep discriminator, keep all the special OpenAPI keywords, you can even add support for exclusiveMinimum: true as a modifier to minimum: <num> whilst also supporting exclusiveMinimum: <num>, which is the biggest breaking change between 4 and 7. This would make OpenAPI v3.1 fully able to read a JSON Schema Draft 7 document, without breaking any functionality whatsoever for users who do not know what JSON Schema even is. Nobody has to change anything in any of their files, unless they want to use newer JSON Schema features.
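Supporting both forms of exclusiveMinimum is not much work for tooling. A rough sketch, assuming the tool normalizes everything to the numeric (Draft 6+) form internally; `normalizeExclusiveMinimum` is a hypothetical name, and the same idea would apply to exclusiveMaximum:

```javascript
// Hypothetical normalizer: accepts the Draft 4 boolean modifier
// (`minimum: 5, exclusiveMinimum: true`) and the Draft 6+ numeric form
// (`exclusiveMinimum: 5`), and always returns the numeric form.
function normalizeExclusiveMinimum(schema) {
  const out = { ...schema };

  if (out.exclusiveMinimum === true) {
    if (typeof out.minimum === 'number') {
      // Draft 4 style: the boolean turns `minimum` into an exclusive bound.
      out.exclusiveMinimum = out.minimum;
      delete out.minimum;
    } else {
      // Draft 4 requires `minimum` alongside the boolean; dropping the
      // orphaned modifier here is an assumption, a tool could also error.
      delete out.exclusiveMinimum;
    }
  } else if (out.exclusiveMinimum === false) {
    // `false` is the Draft 4 default, so `minimum` stays inclusive.
    delete out.exclusiveMinimum;
  }

  // A numeric exclusiveMinimum is already in the newer form.
  return out;
}

console.log(normalizeExclusiveMinimum({ type: 'number', minimum: 5, exclusiveMinimum: true }));
// { type: 'number', exclusiveMinimum: 5 }
```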
Con?: Folks would need to update their JSON Schema validators...? Only maybe. Most maintained JSON Schema validators added support for draft 7 a while ago, so most people will already have a validator that supports it. They might have had to add an extra package to bring back old draft 4 support to their new validator, which they can just disable now. Even if we do need to suggest folks update their JSON Schema validator... meh? Don't use dependencies that are several years old? Dependabot is great.
Con?: An argument against I heard on the TSC is the one that apparently ended the conversation, which is this. If somebody is validating their OpenAPI v3.0 in a pure JSON Schema validator then today it would break, but then OpenAPI v3.1 based files might suddenly start working, and that would somehow be a breaking change? Did I get that right? Because it sounds silly to me. Hopefully I just misunderstood.
Con?: The final argument against actual JSON Schema being brought into OpenAPI v3.1 is the inability for strictly typed languages to handle "type arrays" (type: ["string", "null"]) and more dynamic things like if/then/else. The reason is that the type array is basically a Union Type, and that causes problems for C or whatever, but hobbling a description format for a subset of its users sounds like a terrible way to go. If a team or a company has opinions about how OpenAPI should be written, they should write a style guide for their company using Spectral rulesets, and I'll give them a free demo on how to do that. Screwing up compatibility of these two formats just because code generation might be a bit harder in some languages feels shortsighted to me.
This would be great, but the nays took it, so an alternative was born.
2.) Alternative Schemas
The first discussions about alternative schema had it as a sibling to schema, which could be used instead of schema, and had a very clear switch.
schema:
  type: object
  ...
x-oas-draft-alternate-schema:
  schemaType: json-schema
  schemaRef: ./real-json-schema.json
This could have been tricky as there were potentially two sources of truth, but it would have solved the problem in that people would be able to write actual JSON Schema and have actual JSON Schema tools interacting with their specs. Tooling could choose to use one or the other, and we could have worded into the spec "if tooling understands an alternative schema, it may use that over the schema", and folks would just have to not write two things differently.
Things progressed a lot in the merged draft proposal, and now this is all possible:
Minimalist usage of alternative schema:
schema:
  x-oas-draft-alternativeSchema:
    type: jsonSchema
    location: ./real-jsonschema.json
Combination of OAS schema and alternative:
schema:
  type: object
  nullable: true
  x-oas-draft-alternativeSchema:
    type: jsonSchema
    location: ./real-jsonschema.json
Multiple different versions of alternative schema:
schema:
  anyOf:
    - x-oas-draft-alternativeSchema:
        type: jsonSchema
        location: ./real-jsonschema-08.json
    - x-oas-draft-alternativeSchema:
        type: jsonSchema
        location: ./real-jsonschema-07.json
Combined alternative schemas:
schema:
  allOf:
    - x-oas-draft-alternativeSchema:
        type: xmlSchema
        location: ./xmlSchema.xsd
    - x-oas-draft-alternativeSchema:
        type: schematron
        location: ./schema.sch
Mixed OAS schema and alternative schema:
schema:
  type: array
  items:
    x-oas-draft-alternativeSchema:
      type: jsonSchema
      location: ./real-jsonschema.json
I was lucky enough to have Darrel Miller jump on a 1 hour call with me to help me plan out exactly how this could all be implemented, and whilst I felt convinced at the time, going back to actually implement it I found that not only is it going to be incredibly hard, it also does not solve the initial requirements.
Minimalist usage looks fine, it looks like the original example. The anyOf and allOf also look complicated but they could still make sense for allowing pure JSON Schema, XML Schema, etc. files separately. I am not sure who exactly would be referencing multiple draft examples, because the goal for most people I speak to is: a single source of truth, not: multiple sources of truth which are hopefully the same but using ever-so-slightly different keywords.
The two examples labeled "Mixed OAS schema and alternative schema" and "Combination of OAS schema and alternative" look terrifying, because now we have lost the concept of a pure JSON Schema file.
Darrel tried to placate my fears saying "They are just more validation rules" but the implementation of this gets really tough. Instead of being able to use JSON Schema validators directly on the schema, we have to use alternativeSchema aware validators which are capable of reading JSON Schema rules, in order to essentially convert it to OpenAPI in order to run validation.
Beyond that the real schema might only be partial, or have competing rules that disagree with the above. What happens when inside real-jsonschema.json the type is not object, or null is not allowed?
schema:
  type: object
  nullable: true
  x-oas-draft-alternativeSchema:
    type: jsonSchema
    location: ./real-jsonschema.json
We now have a weird scenario where a JSON Schema file was used in response contract testing, has been confirmed to be correct against the JSON instance, but is being used for documentation in which it is saying something totally different due to user error, a wonky merge, you name it. Splitting the source of truth is no fun.
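To show the kind of extra checking this pushes onto tooling, here is a rough sketch that only compares type and nullability between the inline OpenAPI keywords and the referenced file. `findTypeConflicts` is a hypothetical helper, and it assumes both schemas have already been loaded into plain objects; it ignores every other keyword that could also disagree.

```javascript
// Hypothetical check: reports disagreements between the inline OpenAPI
// keywords and the referenced JSON Schema. Only `type` and `nullable`
// are compared, which is a tiny fraction of what could conflict.
function findTypeConflicts(oasSchema, jsonSchema) {
  const conflicts = [];

  // JSON Schema `type` may be a string or an array; normalize to an array.
  const jsonTypes = [].concat(jsonSchema.type || []);
  const nonNullTypes = jsonTypes.filter((t) => t !== 'null');

  if (oasSchema.type && nonNullTypes.length > 0 && !nonNullTypes.includes(oasSchema.type)) {
    conflicts.push(
      `type: inline schema says "${oasSchema.type}", referenced schema says "${nonNullTypes.join(' | ')}"`
    );
  }

  if (oasSchema.nullable === true && jsonTypes.length > 0 && !jsonTypes.includes('null')) {
    conflicts.push('nullable: inline schema allows null, referenced schema does not');
  }

  return conflicts;
}

// The example above, where real-jsonschema.json turns out to describe an array:
console.log(findTypeConflicts({ type: 'object', nullable: true }, { type: 'array' }).length);
// 2
```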
Apparently this is done because it allows use cases like a JSON instance which carries a string, and that string might be XML following some XML Schema...
Who is doing this?
There is such a thing as supporting so many requirements you solve none of them well, and I am afraid to say I personally think we found it with this proposal.
3.) Alternative Alternative Schemas
Can we take a step back and try something else, even if just a thought exercise?
The alternativeSchema keyword is a distinct keyword, an alternative to the schema keyword, and can be used anywhere the schema keyword can be. Sure you probably don't want to XML Schema your parameters, but maybe you do, whatever, consistency.
It would look like this:
- type: openApi
location: ./converted/models/foo.yaml
- type: jsonSchema
location: ./schemas/foo.json
This would let me write everything as JSON Schema proper, and have a build step which converts JSON Schema to OpenAPI schema. It would let me use one or the other separately, or use both together to work on the widest array of tooling which might not have proper JSON Schema support yet.
Implementors can easily catch up on OpenAPI v3.1 with const schema = object.schema || object.schemas.find(s => s.type === 'openApi') and they fully support the openApi type.
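That one-liner stretches naturally into a small helper once a tool understands more than one type. This is only a sketch: the `schemas` property name is taken from the snippet above, while `pickSchema` and the `supportedTypes` priority list are made up for illustration.

```javascript
// Hypothetical helper: returns the plain `schema` if present, otherwise the
// first alternative whose `type` this tool understands, in priority order.
function pickSchema(object, supportedTypes = ['openApi']) {
  if (object.schema) {
    return object.schema;
  }

  const candidates = object.schemas || [];
  for (const type of supportedTypes) {
    const match = candidates.find((s) => s.type === type);
    if (match) {
      return match;
    }
  }

  // Nothing this tool can read.
  return undefined;
}

const mediaTypeObject = {
  schemas: [
    { type: 'openApi', location: './converted/models/foo.yaml' },
    { type: 'jsonSchema', location: './schemas/foo.json' },
  ],
};

// A tool that only knows OpenAPI schema objects:
console.log(pickSchema(mediaTypeObject).location);
// ./converted/models/foo.yaml

// A tool that prefers real JSON Schema but can fall back:
console.log(pickSchema(mediaTypeObject, ['jsonSchema', 'openApi']).location);
// ./schemas/foo.json
```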
When my entire tool-chain supports type: jsonSchema I can just remove the openApi type ones, or remove that handler if the tooling is configurable.
The order is important, it will go through them all, if you say 3 schemas are important then it's gonna validate as many as it can.
- type: xmlSchema
location: ./xmlSchema.xsd
- type: schematron
location: ./schema.sch
Validation here again will go through both and do both, giving back any errors it finds. The validator can decide how it handles de-duping problems.
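A sketch of what "go through both and do both" might look like. Everything here is an assumption for illustration: `validateAgainstAll` is a made-up name, the `validators` map is supplied by the tool, and each validator is assumed to return an array of error message strings. De-duping is done on exact message text, which is the simplest policy a validator could pick.

```javascript
// Hypothetical runner: validates an instance against every listed schema
// the tool has a validator for, and collects the errors from all of them.
function validateAgainstAll(instance, schemas, validators) {
  const errors = [];
  const skipped = [];

  for (const entry of schemas) {
    const validate = validators[entry.type];
    if (typeof validate !== 'function') {
      // No validator for this schema type; note it and carry on.
      skipped.push(entry.type);
      continue;
    }
    errors.push(...validate(instance, entry));
  }

  // Simplest possible de-dupe: drop messages that are exactly the same.
  return { errors: [...new Set(errors)], skipped };
}

// Stub validators standing in for real XML Schema / Schematron libraries.
const stubValidators = {
  xmlSchema: () => ['missing <id> element'],
  schematron: () => ['missing <id> element', 'total must equal sum of items'],
};

const result = validateAgainstAll('<order/>', [
  { type: 'xmlSchema', location: './xmlSchema.xsd' },
  { type: 'schematron', location: './schema.sch' },
  { type: 'jsonSchema', location: './schemas/foo.json' },
], stubValidators);

console.log(result);
// { errors: [ 'missing <id> element', 'total must equal sum of items' ],
//   skipped: [ 'jsonSchema' ] }
```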
Mock and Doc tools have a different approach, they want to construct an example out of these multiple schemas. They can do this by looking at the types available.
Got an XML Schema? Great, let's try and create an example. Only got a Schematron? That's just validation rules (apparently, I don't know anything about Schematron), so stuff that, never mind.
Prism (a mock server) would probably look for JSON Schema / OpenAPI if working with JSON, or look for XML Schema / OpenAPI if working with XML.
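Roughly, that lookup could be driven by the media type. This is not how Prism is actually implemented, just a sketch of the idea; `preferredSchemaTypes` is a hypothetical name and the type strings are the ones used in the examples above.

```javascript
// Hypothetical lookup: which schema types a mock server would try, in order,
// for a given media type. Parameters such as "; charset=utf-8" are ignored.
function preferredSchemaTypes(mediaType) {
  const essence = mediaType.split(';')[0].trim().toLowerCase();

  if (essence === 'application/json' || essence.endsWith('+json')) {
    return ['jsonSchema', 'openApi'];
  }
  if (essence === 'application/xml' || essence === 'text/xml' || essence.endsWith('+xml')) {
    return ['xmlSchema', 'openApi'];
  }

  // Anything else: fall back to the plain OpenAPI schema object.
  return ['openApi'];
}

console.log(preferredSchemaTypes('application/json; charset=utf-8'));
// [ 'jsonSchema', 'openApi' ]
console.log(preferredSchemaTypes('application/atom+xml'));
// [ 'xmlSchema', 'openApi' ]
```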
Summary
It would be amazing if we could just go with solution 1 for OpenAPI v3.1, and revisit alternative schemas properly for OpenAPI v4.0. This could let us have minimal change for the v3.x branch, keep tooling really similar, avoid more tool lag, and save bigger features for bigger versions. Heck we could even attack gRPC (with Protobuf) as part of it.
I just don't know how many potential users are sitting around struggling with OpenAPI+XML who would jump at this chance to fully XML Schema + Schematron their stuff.
I also don't know many API Design tooling vendors that would jump at the chance to support XML Schema this way. While that is some anecdotal educated guesswork, the bread and butter of these companies is JSON (and working on adding GraphQL, gRPC and every other trendy flavour) not running back for XML Schema. The demand isn't there.
Looking at the issues for the last two years, there are a lot more people showing confusion with the discrepancy, and very clearly a bunch of people working with JSON. There are a bunch of issues relating to the discrepancy, found after only a quick browse through the most recent few pages.
- type array with null could be equivalent to nullable OAI/OpenAPI-Specification#1389
- Allow $id and $schema OAI/OpenAPI-Specification#1719 & OAI/OpenAPI-Specification#1523
- Add $const OAI/OpenAPI-Specification#1666
- Support the JSON Schema examples keyword OAI/OpenAPI-Specification#1494
There is this one person who wants to describe XML (#1303) and @handrews showed them they already can. This one person is not going to make the pain of this alternative schema implementation worthwhile for the big companies, let alone the indie tooling vendors.
If we absolutely have to support alternative schemas right now, can we please reconsider the approach, and try to go with something a little closer to option 3?
We need to be able to let users confidently reference a real JSON Schema file, and have a plain old JSON Schema validator run on the data instance. Anything which cannot do that, to me, is conceptually broken.
I only have my own anecdotal evidence to support moving to an upgraded json schema. Similarly we have needed to build software around the differences.
Also, I can imagine that for the openapi project, delegating decisions on the schema to the json-schema project will free up thinking time to focus on other areas. For instance, you may at some point upgrade to draft-8, 9 or whatever comes, and all the thinking on those new versions would be done by another group.
Implementors of openapi also will be able to rely on json-schema libraries properly and that will relieve work there also.
In general it sounds like a win-win to me. The small cons are unfortunate but I'd say fix it now and go for full json-schema compliance sooner rather than later.
More troubles.
ReDoc openapi-sampler is adding a lot of JSON Schema draft 6-only keywords because people are always trying to use them and get confused when they cannot. This is great so long as all your tooling is using openapi-sampler or something else which understands JSON Schema keywords, but the second you use a tool which does not you have invalid OpenAPI. You also might have invalid JSON Schema because there are OpenAPI keywords in it. Redocly/openapi-sampler#10
Speccy users are confused about propertyNames not working in their OpenAPI stuff. That's another JSON Schema keyword which OpenAPI does not support, but people think it should because OpenAPI is JSON Schema right? wework/speccy#293
Some other chap thanked me for my work to try and resolve the divergence just 40 minutes ago, because he is using JSON Schema proper and trying to use OpenAPI as glue too.
I spend a lot of my day discussing the subject of using proper JSON Schema in OpenAPI. My coworkers are confused, apisyouwonthate.com people are confused on Slack, Twitter people are confused, everyone is confused.
Update here, is that I think v3.1 should focus on catching up with modern JSON Schema to become a superset, not an extended subset superset: OAI/OpenAPI-Specification#1977
We can make alternative schema awesome for v4.0 IMO.
What work is needed to help push this proposal into 3.2? Just the basic idea of referencing an XML Schema instead of a component model.
@BlueCoder77 the original post proposes "Alternative Alternative Schemas" and we need to move forward with that idea. If you could update the proposal to look like that, and try slapping together an example repo, that'd be a good first step.