python-openapi/openapi-schema-validator

How to resolve local references with jsonschema/referencing

wosc opened this issue · 12 comments

wosc commented

Hi, so I'm sorry for the probably dumb question, but I've been banging my head against this and am getting nowhere.

I have an OpenAPI spec like this:

components:
  schemas:
    something:
      type: array
      items:
        "$ref": "#/components/schemas/uuid"
    uuid:
      pattern: "^((\\{urn:uuid:)?([a-f0-9]{8})\\}?)$"
      type: string

Previously I could do validation like so:

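import jsonschema
import openapi_schema_validator
import yaml

schema = yaml.safe_load(open('openapi.yaml').read())
# RefResolver (deprecated since jsonschema 4.18) resolves "#/..." against the whole document.
resolver = jsonschema.validators.RefResolver.from_schema(schema)
openapi_schema_validator.validate(
    ['{urn:uuid:deadbeef}'],
    schema=schema['components']['schemas']['something'],
    resolver=resolver)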

But after the API switch to referencing, I'm stumped on how to create a Registry that "just" resolves these local pointers. I've tried following the guide in https://openapi-schema-validator.readthedocs.io/en/latest/references.html, like so:

import openapi_schema_validator
import referencing
import yaml
from referencing.jsonschema import DRAFT202012

schema = yaml.safe_load(open('openapi.yaml').read())
definitions = schema['components']['schemas']
registry = referencing.Registry().with_resources([
    (f'urn:{k}-schema', referencing.Resource.from_contents(
        v, default_specification=DRAFT202012))
    for k, v in definitions.items()
])
openapi_schema_validator.validate(
    ['{urn:uuid:deadbeef}'],
    schema=schema['components']['schemas']['something'],
    registry=registry)

but that raises PointerToNowhere: '/components/schemas/uuid' does not exist within {'type': 'array', 'items': {'$ref': '#/components/schemas/uuid'}}

Thanks for your help!

Indeed, not even the example on https://openapi-schema-validator.readthedocs.io/en/latest/references.html works.

jsonschema.exceptions._WrappedReferencingError: PointerToNowhere: '/components/schemas/Name' does not exist within {'type': 'object', 'required': ['name'], 'properties': {'name': {'$ref': '#/components/schemas/Name'}, 'age': {'$ref': '#/components/schemas/Age'}, 'birth-date': {'$ref': '#/components/schemas/BirthDate'}}, 'additionalProperties': False}

How should the connection from "#/components/schemas/Name" to "urn:name-schema" be established?

Guys, if you need a workaround (WA) for this issue, here is what I did:

  • replaced #/components/schemas/Name with urn:Name in the schema
import collections.abc

def update(out, input_):
    # Recursively copy input_ into out, rewriting "#/components/schemas/X" refs to "urn:X".
    for key, val in input_.items():
        if isinstance(val, collections.abc.Mapping):
            out[key] = update(out.get(key, {}), val)
        elif key == "$ref":
            out[key] = val.replace("#/components/schemas/", "urn:")
        else:
            out[key] = val
    return out

schema = {}
update(schema, schema_with_PointerToNowhere)  # the loaded OpenAPI document
  • populated the Registry with these URNs
from referencing import Registry, Resource
from referencing.jsonschema import DRAFT202012

resources = [(f"urn:{res_name}", Resource.from_contents(res, default_specification=DRAFT202012))
             for res_name, res in schema['components']['schemas'].items()]
registry = Registry().with_resources(resources)
from referencing import Registry, Resource
from referencing.jsonschema import DRAFT202012

resources = [(f"#/components/schemas/{res_name}", Resource.from_contents(res, default_specification=DRAFT202012))
             for res_name, res in schema['components']['schemas'].items()]
registry = Registry().with_resources(resources)

Seems to work for me, without replacing "#/components/schemas/" with "urn:".

If this is supported by Registry (I haven't really read through their docs well enough to know :)), maybe the example just needs updating?

edit: incorrect! see below

wosc commented

I'm sorry to say, @AlexeyKatanaev, that the idea you propose is exactly what the docs instruct, and what I've already tried (see issue description), but unfortunately it simply does not work (and it's not just me; @lawilog confirmed it too).

@Cynocracy thanks for the suggestion, but I've now tried f'#/components/schemas/{name}' and f'/components/schemas/{name}' and also both of those with an additional urn: prefix, but none of those work for me, unfortunately.

I've just now had a brief debugging session. It seems to me that #/components... URIs are looked up in two stages. With my code example in the issue description, there's first a lookup on base_uri='', and afterwards the fragment is resolved as a pointer /components/schemas/uuid inside the Resource found there, which fails. It further seems that in the call validate(['{urn:uuid:deadbeef}'], schema=schema['components']['schemas']['something'], registry=registry), base_uri='' is already occupied by the value of the schema= parameter. So there is no chance at all of resolving these references, because the something definition does not itself contain the referenced uuid definition; that's the whole point of the reference in the first place.

So I'm sorry to say, I'm still completely stuck on how to migrate local references from RefResolver to referencing.Registry, and would appreciate any help. Thanks!

@wosc

Maybe a little more context on what I wrote helps?

I was taking the full API doc for an OpenAPI service (saved to schema.json) and doing:

import json

from jsonschema import Draft202012Validator
from referencing import Registry, Resource
from referencing.jsonschema import DRAFT202012

with open('schema.json') as f:
    schema = json.load(f)

resources = [(f"#/components/schemas/{res_name}", Resource.from_contents(res, default_specification=DRAFT202012))
             for res_name, res in schema['components']['schemas'].items()]
registry = Registry().with_resources(resources)

create_schema = schema["components"]["requestBodies"]["MYBODY"]["content"]["application/json"]["schema"]

validator = Draft202012Validator(
    create_schema,
    registry=registry,
)

validator.validate({"foo": "bar"})

I agree that the inclusion of the urn reference in the thing being validated in the docs confused me, and I don't fully understand that bit :)

I've just now had a brief debugging session. It seems to me that #/components... uris are looked up in two stages. With my code example in the issue description, there's firstly a lookup on base_uri='', and then afterwards the fragment is tried to resolve as a pointer /components/schemas/uuid in the Resource that was found there, which fails

Huh, now I'm curious how my example worked... looking at it a bit now 🕵️‍♂️

wosc commented

@Cynocracy Thanks for the full code example! I think I understand your approach better now, namely validating against the whole schema. My uneducated guess from my short debugging session is that your example should still work even if you omit the with_resources() call.

Unfortunately, my use case is validating that the given data matches one specific part precisely, and not just that it "is valid under any of the possibilities given in the schema". For example, in my simplified example I want to validate that the data is a list of UUIDs, not a single UUID. So using the whole schema does not work there, I think.

@wosc Ah yeah, you're correct! It turns out the schema I was using already had its $refs resolved inline, so the registry was doing a whole lot of nothing.

edit: I can also confirm that using the YAML with the $refs in it quickly becomes a headache, because my $refs also contain $refs, and that doesn't seem to be handled correctly, AFAICT, even with the find-and-replace to urn:

wosc commented

Alright. I finally figured out how to do this (using this off-hand comment, somewhat cryptic to me, as a starting point). The key point is that in the new API, you have to reference the part of the schema you want to validate against, instead of passing it as "the whole applicable schema" like one did previously. (And you have to assign a URI to the schema as a whole to be able to express this reference.) Thus, the updated/solved example code for the use case in the issue description looks like this:

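import openapi_schema_validator
import referencing
import yaml
from referencing.jsonschema import DRAFT202012

schema = yaml.safe_load(open('openapi.yaml').read())
resource = referencing.Resource.from_contents(schema, default_specification=DRAFT202012)
registry = referencing.Registry().with_resource(uri='urn:myschema', resource=resource)
# The schema argument is only a $ref into the registered whole document.
openapi_schema_validator.validate(
    ['{urn:uuid:deadbeef}'],
    {'$ref': 'urn:myschema#/components/schemas/something'},
    registry=registry)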

Dear maintainers, I would dearly love either a pointer to where I should have read about this, or for someone knowledgeable to update the library documentation. This has been an exceedingly frustrating exercise.

Hi @wosc

Thanks for your research. That's the exact document the references page points to. I fixed the references in the example. I'd be happy to hear if you have any other suggestions on how to improve the documentation.

wosc commented

@p1c2u Thanks for looking in. I think my main question is, what should I have been reading to make me think that it could be even remotely possible/sensible to pass a dict that only contains a $ref as the schema parameter to validate()?

  • All the examples on openapi-schema-validator.readthedocs.io pass a dict that contains, you know, schema data, i.e. something that talks about type and properties and the like.
  • In the previous API, this approach also worked just fine when the parameter only contained a "subschema" to validate against (see my initial code example in the description).
  • My only exposure to $ref so far has been "that's how you reuse definitions in openapi yaml files" (again see example in description).

@wosc thank you for figuring this out! I ran into the same problem last week and this thread was a lifesaver.

I think my main question is, what should I have been reading to make me think that it could be even remotely possible/sensible to pass a dict that only contains a $ref as the schema parameter to validate()?

Echoing this question, none of the documentation I found suggested that this was a possibility.

Indeed, this should be included in the "Migrating from RefResolver" section of the referencing docs.