Materials-Data-Science-and-Informatics/metador-core

Evaluate, improve and document compatibility with JSON Schema

Functionality

Documentation

  • Both directions (exporting a JSON Schema from a Metador schema, and generating a schema class from a JSON Schema) should be explained in one of the tutorials, e.g. in the one explaining JSON-LD

Testing

  • The tests should check that every valid instance also validates against the exported JSON Schema (#44)

@broeder-j

Example input:

from __future__ import annotations

from typing import Literal, List, Union, Optional
from metador_core.schema.core import MetadataSchema
from metador_core.schema.ld import ld

@ld(context="https://www.schema.org")
class ModelA(MetadataSchema):
    """Model A"""
    foo: str
    bar: Optional[int]

class ModelB(ModelA):
    """Model B (subclass of Model A)"""

    bar: Literal[5]
    qux: float

class ModelC(MetadataSchema):
    """Model C (contains object of Model B)"""

    quux: bool
    blub: List[ModelB]
    bla: ModelA


class ModelD(ModelC):
    "subclass of Model D"

    class Config:
        extra = "forbid"

Exported JSON Schema (test.schema.json):

{
  "title": "ModelD",
  "description": "subclass of Model D",
  "type": "object",
  "additionalProperties": false,
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "allOf": [
    {
      "$ref": "#/$defs/71ffe60db66eb197"
    },
    {}
  ],
  "$metador_hash": "ea6634d2659b8ab2",
  "$defs": {
    "5f1c8d6cd5671a9c": {
      "title": "ModelB",
      "description": "Model B (subclass of Model A)",
      "type": "object",
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "$metador_plugin": {
        "name": "blub",
        "version": [
          0,
          1,
          1
        ]
      },
      "$metador_constants": {
        "@context": "https://www.schema.org"
      },
      "allOf": [
        {
          "$ref": "#/$defs/fc1b2f2509ac7477"
        },
        {
          "properties": {
            "bar": {
              "title": "Bar",
              "enum": [
                5
              ],
              "type": "integer"
            },
            "qux": {
              "title": "Qux",
              "type": "number"
            }
          },
          "required": [
            "bar",
            "qux"
          ]
        }
      ],
      "$metador_hash": "5f1c8d6cd5671a9c"
    },
    "fc1b2f2509ac7477": {
      "title": "ModelA",
      "description": "Model A",
      "type": "object",
      "properties": {
        "foo": {
          "title": "Foo",
          "type": "string"
        },
        "bar": {
          "title": "Bar",
          "type": "integer"
        },
        "@context": {}
      },
      "required": [
        "foo"
      ],
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "$metador_constants": {
        "@context": "https://www.schema.org"
      },
      "$metador_hash": "fc1b2f2509ac7477"
    },
    "71ffe60db66eb197": {
      "title": "ModelC",
      "description": "Model C (contains object of Model B)",
      "type": "object",
      "properties": {
        "quux": {
          "title": "Quux",
          "type": "boolean"
        },
        "blub": {
          "title": "Blub",
          "type": "array",
          "items": {
            "$ref": "#/$defs/5f1c8d6cd5671a9c"
          }
        },
        "bla": {
          "$ref": "#/$defs/fc1b2f2509ac7477",
          "description": "Model A"
        }
      },
      "required": [
        "quux",
        "blub",
        "bla"
      ],
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "$metador_hash": "71ffe60db66eb197"
    }
  }
}

Run the generator with:

datamodel-codegen --use-annotated --use-schema-description --use-title-as-name --strict-types {str,bytes,int,float,bool} --strip-default-none --base-class metador_core.schema.core.MetadataSchema --input-file-type=jsonschema --input test.schema.json

Result:

# generated by datamodel-codegen:
#   filename:  test.schema.json
#   timestamp: 2022-10-07T12:06:16+00:00

from __future__ import annotations

from enum import Enum
from typing import Annotated, Any, List, Optional

from pydantic import Extra, Field, StrictBool, StrictFloat, StrictInt, StrictStr

from metador_core.schema.core import MetadataSchema


class Bar(Enum):
    integer_5 = 5


class ModelA(MetadataSchema):
    """
    Model A
    """

    foo: Annotated[StrictStr, Field(title='Foo')]
    bar: Annotated[Optional[StrictInt], Field(title='Bar')]
    _context: Annotated[Optional[Any], Field(alias='@context')]


class ModelB(ModelA):
    """
    Model B (subclass of Model A)
    """

    bar: Annotated[Bar, Field(title='Bar')]
    qux: Annotated[StrictFloat, Field(title='Qux')]


class ModelC(MetadataSchema):
    """
    Model C (contains object of Model B)
    """

    quux: Annotated[StrictBool, Field(title='Quux')]
    blub: Annotated[List[ModelB], Field(title='Blub')]
    bla: Annotated[ModelA, Field(description='Model A')]


class ModelD(ModelC):
    """
    subclass of Model D
    """

    pass

    class Config:
        extra = Extra.forbid

Observations:

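  • In general, it seems to work.
  • Surprisingly, it even picks up inheritance (ModelB derives from ModelA, ModelD from ModelC).
  • As expected, it does not reconstruct the original type annotations (e.g. Literal[5] comes back as a separate Enum class Bar).
  • As expected, it cannot reconstruct decorators such as @ld; the context only survives as a plain _context field of type Optional[Any].
  • As expected, it does not know the type hints we use in Metador and falls back to default hints.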
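
The inheritance that the generator recovers is visible in the exported schema itself: each subclass is an "allOf" whose first entry is a "$ref" to the parent in "$defs". The following is a minimal sketch of reading that structure back with the standard library only. It uses an abbreviated copy of the schema above, and the helper names (resolve_local_ref, parent_titles) are made up for illustration; how datamodel-codegen does this internally is not shown here.

```python
import json

# Abbreviated copy of the exported schema above: only the keys needed here.
SCHEMA = json.loads("""
{
  "title": "ModelD",
  "allOf": [{"$ref": "#/$defs/71ffe60db66eb197"}, {}],
  "$defs": {
    "5f1c8d6cd5671a9c": {
      "title": "ModelB",
      "allOf": [
        {"$ref": "#/$defs/fc1b2f2509ac7477"},
        {"required": ["bar", "qux"]}
      ]
    },
    "fc1b2f2509ac7477": {"title": "ModelA", "required": ["foo"]},
    "71ffe60db66eb197": {"title": "ModelC", "required": ["quux", "blub", "bla"]}
  }
}
""")


def resolve_local_ref(root: dict, ref: str) -> dict:
    """Follow a local reference such as '#/$defs/<hash>'.

    Simplification: JSON Pointer escapes (~0, ~1) are not handled.
    """
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are supported: {ref}")
    node = root
    for part in ref[2:].split("/"):
        node = node[part]
    return node


def parent_titles(root: dict) -> dict:
    """Map each schema title to the titles it pulls in via allOf + $ref."""
    result = {}
    for schema in [root, *root.get("$defs", {}).values()]:
        refs = [part["$ref"] for part in schema.get("allOf", []) if "$ref" in part]
        result[schema["title"]] = [resolve_local_ref(root, r)["title"] for r in refs]
    return result


parents = parent_titles(SCHEMA)
```

With the schema from this issue, parents maps ModelD to ModelC and ModelB to ModelA, while ModelA and ModelC have no parent entry.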

Summary:

  • A full 1-to-1 reconstruction is not possible (even though it would be cool), but the generator does recover substructures, which is good.
  • A better mapping to Metador and phantom type hints looks like something one could try hacking into it (similar to the existing strict-types option).
  • As is, it still produces a useful draft, which must be reworked manually to make the schemas "nice".
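
One concrete case of the manual rework: the original bar: Literal[5] comes back as a one-member Enum class Bar. The two are not interchangeable in plain Python, as this small standard-library sketch shows. It only illustrates the difference in the type hints and says nothing about how pydantic or Metador treat them during validation.

```python
from enum import Enum
from typing import Literal, get_args


class Bar(Enum):
    """Copied from the generated output above."""

    integer_5 = 5


# The annotation used in the original ModelB.
OriginalBar = Literal[5]

# A Literal annotation carries the plain value itself.
literal_values = get_args(OriginalBar)

# The Enum wraps the value: the member is not equal to the plain integer,
# so code written against the original model has to unwrap it via .value.
member = Bar(5)
equal_to_plain_int = member == 5
unwrapped = member.value
```

Here literal_values is (5,), equal_to_plain_int is False and unwrapped is 5, so restoring Literal[5] by hand brings back the original behaviour for code that compares bar with an integer.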