
Generate JSON schema from Python dataclasses

Primary language: Python. License: MIT.

dc_schema


Tiny library to generate JSON schema (2020-12) from Python dataclasses. No third-party dependencies, standard library only.

pip install dc-schema 

Assumptions

  • Python 3.9+

Motivation

  • Create a lightweight, focused solution for generating JSON schema from plain dataclasses. pydantic is a much more mature option, but it also does many other things I didn't want to include here.
  • Deepen my understanding of Python dataclasses, typing and JSON schema.

Usage

Basics

Create a regular Python dataclass and pass it to get_schema.

import dataclasses
import datetime
import json

from dc_schema import get_schema

@dataclasses.dataclass
class Book:
    title: str
    published: bool = False

@dataclasses.dataclass
class Author:
    name: str
    age: int
    dob: datetime.date
    books: list[Book]

print(json.dumps(get_schema(Author), indent=2))
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "title": "Author",
  "properties": {
    "name": {
      "type": "string"
    },
    "age": {
      "type": "integer"
    },
    "dob": {
      "type": "string",
      "format": "date"
    },
    "books": {
      "type": "array",
      "items": {
        "allOf": [
          {
            "$ref": "#/$defs/Book"
          }
        ]
      }
    }
  },
  "required": [
    "name",
    "age",
    "dob",
    "books"
  ],
  "$defs": {
    "Book": {
      "type": "object",
      "title": "Book",
      "properties": {
        "title": {
          "type": "string"
        },
        "published": {
          "type": "boolean",
          "default": false
        }
      },
      "required": [
        "title"
      ]
    }
  }
}
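
In the output above, only fields without a default value appear under "required" (published has a default, so Book requires only title). The following is a minimal standard-library sketch of how such a list can be derived from a dataclass. The helper name required_field_names is hypothetical and is not part of dc_schema; the library's actual implementation may differ.

```python
import dataclasses


@dataclasses.dataclass
class Book:
    title: str
    published: bool = False


def required_field_names(cls):
    """Return the names of fields that have neither a default nor a default_factory.

    Hypothetical helper for illustration only; not part of dc_schema.
    """
    return [
        f.name
        for f in dataclasses.fields(cls)
        if f.default is dataclasses.MISSING
        and f.default_factory is dataclasses.MISSING
    ]


print(required_field_names(Book))  # ['title']
```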

Annotations

You can use typing.Annotated + SchemaAnnotation to attach metadata to the schema, such as field descriptions, examples, validation (min/max length, regex pattern, ...), etc. Consult the code for full details.

import dataclasses
import datetime
import json
import typing as t

from dc_schema import get_schema, SchemaAnnotation

@dataclasses.dataclass
class Author:
    name: t.Annotated[str, SchemaAnnotation(title="Full name", description="The author's full name")]
    age: t.Annotated[int, SchemaAnnotation(minimum=0)]
    dob: t.Annotated[t.Optional[datetime.date], SchemaAnnotation(examples=["1990-01-17"])] = None

print(json.dumps(get_schema(Author), indent=2))
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "title": "Author",
  "properties": {
    "name": {
      "type": "string",
      "title": "Full name",
      "description": "The author's full name"
    },
    "age": {
      "type": "integer",
      "minimum": 0
    },
    "dob": {
      "anyOf": [
        {
          "type": "string",
          "format": "date"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "examples": [
        "1990-01-17"
      ]
    }
  },
  "required": [
    "name",
    "age"
  ]
}
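
For readers unfamiliar with typing.Annotated, the sketch below shows how metadata attached to a field type can be read back using only the standard library. Meta is a hypothetical stand-in for SchemaAnnotation and field_metadata is a hypothetical helper; neither is part of dc_schema, and this is not necessarily how the library reads annotations internally.

```python
import dataclasses
import typing as t


@dataclasses.dataclass(frozen=True)
class Meta:
    """Hypothetical stand-in for SchemaAnnotation; not part of dc_schema."""

    description: str


@dataclasses.dataclass
class Person:
    name: t.Annotated[str, Meta(description="Display name")]
    age: int


def field_metadata(cls):
    """Map each field name to (base type, tuple of Annotated metadata)."""
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    hints = t.get_type_hints(cls, include_extras=True)
    result = {}
    for name, hint in hints.items():
        if t.get_origin(hint) is t.Annotated:
            base, *extras = t.get_args(hint)
            result[name] = (base, tuple(extras))
        else:
            result[name] = (hint, ())
    return result


metadata = field_metadata(Person)
print(metadata["name"][1][0].description)  # Display name
```

Plain fields such as age come back with an empty metadata tuple, so a schema generator can treat annotated and unannotated fields uniformly.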

To customize the metadata of a dataclass itself, use a SchemaConfig.

import dataclasses
import json

from dc_schema import get_schema, SchemaAnnotation

@dataclasses.dataclass
class User:
    name: str

    class SchemaConfig:
        annotation = SchemaAnnotation(title="System user", description="A user of the system")

print(json.dumps(get_schema(User), indent=2))
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "title": "System user",
  "description": "A user of the system",
  "properties": {
    "name": {
      "type": "string"
    }
  },
  "required": [
    "name"
  ]
}
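
A nested class such as SchemaConfig has no type annotation in the enclosing class body, so the dataclass machinery does not treat it as a field, which is why it does not show up under "properties" above. A minimal standard-library sketch of that behaviour follows; the dictionary payload is a placeholder assumption, since dc_schema expects a SchemaAnnotation there.

```python
import dataclasses


@dataclasses.dataclass
class User:
    name: str

    class SchemaConfig:
        # Placeholder payload for illustration; dc_schema expects a SchemaAnnotation.
        annotation = {"title": "System user"}


# Only annotated attributes become dataclass fields.
field_names = [f.name for f in dataclasses.fields(User)]
# The nested class stays reachable as an ordinary class attribute.
config = getattr(User, "SchemaConfig", None)

print(field_names)  # ['name']
print(config.annotation)  # {'title': 'System user'}
```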

Further examples

See the tests for full example usage.

CLI

dc_schema <file_path> <dataclass>

e.g.

dc_schema ./schema.py Author
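
The CLI takes a file path and a dataclass name. As a rough sketch of what resolving those two arguments could involve, the snippet below loads a class by name from a Python file using importlib. The helper load_dataclass and the module name user_schema_module are hypothetical; this is an assumption about the general approach, not the CLI's actual implementation.

```python
import dataclasses
import importlib.util
import pathlib
import sys
import tempfile


def load_dataclass(file_path, class_name):
    """Import the module at file_path and return the named dataclass.

    Hypothetical helper for illustration only; not part of dc_schema.
    """
    spec = importlib.util.spec_from_file_location("user_schema_module", str(file_path))
    module = importlib.util.module_from_spec(spec)
    # Register the module before executing it, as the importlib docs recommend.
    sys.modules[spec.name] = module
    spec.loader.exec_module(module)
    cls = getattr(module, class_name)
    if not dataclasses.is_dataclass(cls):
        raise TypeError(f"{class_name} is not a dataclass")
    return cls


# Demonstration with a throwaway file standing in for ./schema.py.
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "schema.py"
    path.write_text(
        "import dataclasses\n\n"
        "@dataclasses.dataclass\n"
        "class Author:\n"
        "    name: str\n"
    )
    Author = load_dataclass(path, "Author")

print([f.name for f in dataclasses.fields(Author)])  # ['name']
```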

Other tools

For working with dataclasses or JSON schema: