Shapiro

Modelling data with JSON-LD, Turtle, SHACL

License: Apache License 2.0 (Apache-2.0)

Unit tests for Python 3.8, 3.9, 3.10, 3.11 and 3.12.


What is Shapiro

Shapiro is a simple ontology/vocabulary server serving Turtle, JSON-LD, HTML or JSON-Schema (as indicated by the requesting client in the Accept header). It therefore provides a simple approach to serving up an organization's ontologies/vocabularies.

Motivation - Model as Code

Why would one need something like Shapiro? The basic idea is to model data as knowledge graphs using Turtle or JSON-LD and use these models in API definitions/implementations and all other code consuming data based on these models.

Make the use of these machine-readable model definitions pervasive throughout all phases of the software lifecycle (design, implement, test, release) and the lifecycle of the data originating from software built using these models.

Express non-functional requirements like security, traceability/lineage, data quality in the models and bind them to the instances of data wherever the data is distributed to and used.

Drive all documentation (model diagrams, documents, graph visualizations, etc.) from the same RDF-based model definition (a.k.a. ontology/knowledge graph).

Start out by providing a toolset, from developers for developers, for formulating such models and using them in source code, then gradually extend towards tools, editors, UIs and transformations that make this modelling approach accessible to non-technical actors like business analysts, domain data owners, etc.

In order to do so, you need a way to serve the models - this is where Shapiro comes in.

Serving Schemas

Shapiro serves schemas from a directory hierarchy in the file system (specified by the content_dir parameter at startup). Shapiro regularly checks new or modified schemas for syntax errors and excludes such "bad schemas" from being served. Schemas can be moved into Shapiro's content_dir while it is running. This decouples the lifecycle of schemas from the lifecycle of Shapiro: the basic idea is that schemas are managed in a code repository from which changes get pushed into Shapiro's content directory without Shapiro having to be restarted.
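The "check new or modified schemas, exclude bad ones" loop can be pictured roughly as below. This is a minimal sketch, not Shapiro's code: `scan`, `check_jsonld` and `VALIDATORS` are hypothetical names, change detection by modification time is an assumption, and the only validator shown merely checks that a `.jsonld` file is well-formed JSON (a real server would parse Turtle and JSON-LD with an RDF parser).

```python
import json
import os
from typing import Callable, Dict, List, Tuple

# Hypothetical validator signature: raises an exception on a syntax error.
Validator = Callable[[str], None]


def check_jsonld(path: str) -> None:
    # Stand-in check: only verifies that the file is well-formed JSON.
    with open(path, encoding="utf-8") as fh:
        json.load(fh)


# Assumed mapping from file extension to validator.
VALIDATORS: Dict[str, Validator] = {".jsonld": check_jsonld}


def scan(content_dir: str, seen: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Re-check new or modified schema files; return (good, bad) paths."""
    good: List[str] = []
    bad: List[str] = []
    for root, _dirs, files in os.walk(content_dir):
        for name in files:
            validator = VALIDATORS.get(os.path.splitext(name)[1])
            if validator is None:
                continue  # not a schema file we know how to check
            path = os.path.join(root, name)
            mtime = os.path.getmtime(path)
            if seen.get(path) == mtime:
                continue  # unchanged since the last scan
            seen[path] = mtime
            try:
                validator(path)
                good.append(path)
            except Exception:
                bad.append(path)  # excluded from serving until fixed
    return good, bad
```

Calling `scan` periodically with the same `seen` dictionary means only files touched since the previous pass are re-validated, which is what allows schemas to be dropped into the directory while the server keeps running.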

Content Negotiation

Shapiro uses the Accept header of the GET request for a schema to determine the MIME type of its response, independent of the format in which Shapiro holds the schema on its file system:

Request Accept header & response MIME type | Implementation status
application/ld+json | implemented
text/turtle | implemented
text/html | implemented
application/schema+json | implemented
application/json | implemented (returns JSON-SCHEMA)

If no accept header is specified, Shapiro will assume application/schema+json as default, because many JSON-SCHEMA processors/validators do not properly set the accept header when resolving $ref URLs.
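The negotiation rules can be sketched as follows, assuming q-value ordering as in standard HTTP content negotiation. `negotiate` and `REPRESENTATIONS` are hypothetical names, not Shapiro's API. A missing header yields application/schema+json as described above; an unsupported type falls back to a `default_mime` value mirroring the --default_mime parameter. Treating wildcards such as */* as "unsupported" is an assumption of this sketch.

```python
from typing import Optional

# Hypothetical mapping from a requested media type to the representation served.
REPRESENTATIONS = {
    "application/ld+json": "json-ld",
    "text/turtle": "turtle",
    "text/html": "html",
    "application/schema+json": "json-schema",
    "application/json": "json-schema",  # plain JSON is answered with JSON-Schema
}

NO_HEADER_DEFAULT = "application/schema+json"


def negotiate(accept: Optional[str], default_mime: str = "text/turtle") -> str:
    """Pick the media type to respond with for a given Accept header."""
    if not accept or not accept.strip():
        # JSON-Schema tooling often omits the header when resolving $ref URLs.
        return NO_HEADER_DEFAULT
    candidates = []
    for position, part in enumerate(accept.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        # Sort key: highest q first, then order of appearance in the header.
        candidates.append((-q, position, media_type))
    for neg_q, _, media_type in sorted(candidates):
        if neg_q < 0 and media_type in REPRESENTATIONS:  # q > 0 and supported
            return media_type
    return default_mime
```

For example, `negotiate("text/html;q=0.5, application/ld+json")` picks application/ld+json because it carries the implicit q=1.0, and `REPRESENTATIONS[...]` then tells the server which serializer to use.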

Integration with OpenAPI & JSON-Schema

Shapiro converts SHACL NodeShapes into JSON-Schema and thereby integrates with JSON-Schema validation. Based on this, you can use the semantic data models served by Shapiro in your OpenAPI definitions (by way of $ref). An end-to-end example based on an OpenAPI tutorial can be found in test/openapi/tutorial.yaml; the corresponding semantic model is at test/openapi/tutorial/artist.ttl.
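To illustrate the idea behind the NodeShape-to-JSON-Schema conversion, here is a minimal sketch that works on a simplified in-memory shape rather than an RDF graph. The dataclasses, the XSD-to-JSON type table and the cardinality rules (sh:minCount >= 1 means required, sh:maxCount 1 means scalar, otherwise array) are assumptions for illustration; Shapiro's actual conversion rules may differ. All names and IRIs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed mapping from XSD datatypes to JSON-Schema types.
XSD_TO_JSON = {
    XSD + "string": "string",
    XSD + "integer": "integer",
    XSD + "decimal": "number",
    XSD + "double": "number",
    XSD + "boolean": "boolean",
    XSD + "date": "string",
}


@dataclass
class PropertyShape:
    path: str                        # IRI of sh:path
    datatype: Optional[str] = None   # IRI of sh:datatype
    min_count: int = 0               # sh:minCount
    max_count: Optional[int] = None  # sh:maxCount (None = unbounded)


@dataclass
class NodeShape:
    iri: str
    properties: List[PropertyShape] = field(default_factory=list)


def local_name(iri: str) -> str:
    # Terms use slash-based IRIs, so the last path segment is the property name.
    return iri.rstrip("/").rsplit("/", 1)[-1]


def to_json_schema(shape: NodeShape) -> dict:
    props, required = {}, []
    for p in shape.properties:
        item = {"type": XSD_TO_JSON.get(p.datatype, "string")}
        if p.datatype == XSD + "date":
            item["format"] = "date"
        name = local_name(p.path)
        # Single-valued properties become scalars, everything else an array.
        props[name] = item if p.max_count == 1 else {"type": "array", "items": item}
        if p.min_count >= 1:
            required.append(name)
    schema = {"$id": shape.iri, "type": "object", "properties": props}
    if required:
        schema["required"] = required
    return schema
```

A shape with a mandatory single-valued string property and an unbounded integer property thus becomes an object schema with one required string field and one integer array, which an OpenAPI document can then pull in via $ref.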

Markdown in RDFS Comments/SKOS Definitions/DCT Descriptions

When rendering for MIME type text/html, Shapiro interprets Markdown in RDFS comments, SKOS definitions and DCT descriptions to improve the readability of the documentation.

No URL fragments

Shapiro is opinionated about URL fragments for referring to terms in a schema: it plainly does not support them (here's why). So when writing your schema a.k.a. model a.k.a. vocabulary a.k.a. ontology, please ensure you refer to the individual terms it defines using the regular forward slash, e.g. http://myserver.com/myontology/term instead of http://myserver.com/myontology#term.
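Before publishing a model, a small lint step can catch fragment-based term IRIs. The sketch below uses only `urllib.parse` from the standard library; `find_fragment_terms` and `to_slash_iri` are hypothetical helper names, and how you collect the term IRIs from your model is left open.

```python
from typing import Iterable, List
from urllib.parse import urlsplit, urlunsplit


def find_fragment_terms(iris: Iterable[str]) -> List[str]:
    """Return the IRIs that use a '#' fragment to name a term."""
    return [iri for iri in iris if urlsplit(iri).fragment]


def to_slash_iri(iri: str) -> str:
    """Rewrite http://host/onto#term as http://host/onto/term."""
    parts = urlsplit(iri)
    if not parts.fragment:
        return iri  # already slash-based, nothing to do
    path = parts.path.rstrip("/") + "/" + parts.fragment
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))
```

For example, `to_slash_iri("http://myserver.com/myontology#term")` returns `http://myserver.com/myontology/term`.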

Hierarchical Namespaces

Shapiro allows you to keep schemas/ontologies in arbitrary namespace hierarchies, simply by reflecting namespaces as a directory hierarchy. This allows organizations to separate their schemas/ontologies across a hierarchical namespace and avoid any clashes. This also means you can have more relaxed governance around the various ontologies/schemas across a collaborating community. The assumption is that you manage your schemas/ontologies in a code repository (GitHub, etc.) and release them from there onto a Shapiro instance serving these schemas in a specific environment (dev/test/prod).
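The namespace-to-directory idea can be sketched as a mapping from a schema IRI's path onto files under the content directory. This is an illustration under stated assumptions, not Shapiro's lookup logic: `candidate_files` is a hypothetical helper, the IRI path is assumed to be rooted directly at the server's base URL, and the `.ttl`/`.jsonld` extensions are assumed.

```python
from pathlib import PurePosixPath
from typing import List
from urllib.parse import urlsplit

# Assumed file extensions for schemas stored in the content directory.
EXTENSIONS = (".ttl", ".jsonld")


def candidate_files(iri: str, content_dir: str = "./") -> List[str]:
    """Map a schema IRI's path onto candidate files under content_dir."""
    # e.g. http://schemas.myorg.com/org/sales/customer -> org/sales/customer
    path = urlsplit(iri).path.strip("/")
    base = PurePosixPath(content_dir) / path
    return [str(base) + ext for ext in EXTENSIONS]
```

Because each team owns a subtree such as org/sales/, two teams can both define a "customer" schema without clashing.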

Querying the combined Graph of all Schemas served by Shapiro

Shapiro keeps the complete graph of all schemas combined in memory. The graph can be queried by sending a POST request to the /query API endpoint with a SPARQL query (no updates) in the request body. That way you can query and mine the combined graph of all models.
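A minimal client for the /query endpoint could look like this, using only `urllib.request`. The function names are hypothetical, and the Content-Type `application/sparql-query` (the media type the SPARQL 1.1 Protocol uses for a query sent directly in a POST body) is an assumption; check the API docs at /docs for what your Shapiro version expects and returns.

```python
import urllib.request


def build_query_request(base_url: str, sparql: str) -> urllib.request.Request:
    """Prepare a POST to <base_url>/query carrying the SPARQL text as the body."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query",
        data=sparql.encode("utf-8"),
        method="POST",
        # Assumed content type; see the lead-in above.
        headers={"Content-Type": "application/sparql-query"},
    )


def run_query(base_url: str, sparql: str, timeout: float = 10.0) -> str:
    """Send the query and return the raw response body as text."""
    with urllib.request.urlopen(build_query_request(base_url, sparql), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With a local server running, `run_query("http://localhost:8000", "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")` would return the first few subjects found across all served models.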

Searching Shapiro

Shapiro uses the Whoosh full-text search library to index all schemas it serves. Shapiro regularly checks its content directory for new or modified schemas and indexes them.
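To show what such an index does conceptually, here is a toy inverted index in plain Python. It is a deliberately simplified stand-in, not Whoosh and not Shapiro's indexing code: `ToyIndex` is a hypothetical class, tokenization is naive, and queries use AND semantics. Real engines such as Whoosh add analyzers, scoring and persistent on-disk storage (compare the --index_dir parameter).

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Lower-case alphanumeric tokens; real engines add stemming, stop words, etc.
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndex:
    """Maps each token to the set of schema names containing it."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, schema_name: str, text: str) -> None:
        # Drop stale entries first so re-indexing a modified schema is safe.
        for names in self.postings.values():
            names.discard(schema_name)
        for token in tokenize(text):
            self.postings[token].add(schema_name)

    def search(self, query: str) -> Set[str]:
        # AND semantics: a schema must contain every query token.
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result
```

Re-indexing on change (calling `add` again for a modified schema) mirrors the periodic re-check described above.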

Shapiro UI

Shapiro provides a minimal UI available at /welcome/. Any GET request to / without a schema name also redirects to the UI. The UI lists all schemas served by Shapiro at a given point in time and allows full-text search of schema content. The Shapiro UI also renders models/schemas/ontologies as HTML.

Writing Semantic Models to be served by Shapiro

Given the number of possibilities to use ontologies & vocabularies for your models, Shapiro can't anticipate them all. While I'm trying to keep Shapiro as open as possible and while Shapiro can serve any kind of ontology or vocabulary, HTML rendering of models and JSON-SCHEMA rendering of models work best if you keep the following in mind:

  • Use RDFS for modelling your classes and properties. HTML rendering will work best with this vocabulary.
  • Use RDFS labels that are acceptable as object and property names in programming languages (specifically when you use JSON-SCHEMA & OpenAPI in conjunction with schemas hosted by Shapiro).
  • JSON-SCHEMA conversion requires your model to define NodeShapes with the appropriate SHACL properties and constraints. Shapiro will render empty schemas if you ask for the JSON-SCHEMA of an RDFS class.
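The label guideline can be checked mechanically. The sketch below assumes the labels have already been extracted from the model into a dictionary of IRI to rdfs:label; `problematic_labels` is a hypothetical helper. The identifier pattern is a conservative choice that works as a property name in JSON and most programming languages, and the keyword check is Python-specific, so adapt both to your target languages.

```python
import keyword
import re
from typing import Dict, List

# Conservative identifier pattern: letter or underscore, then letters, digits, underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def problematic_labels(labels: Dict[str, str]) -> List[str]:
    """Return the IRIs whose rdfs:label is not a portable identifier."""
    return [
        iri
        for iri, label in labels.items()
        if not IDENTIFIER.fullmatch(label) or keyword.iskeyword(label)
    ]
```

For instance, a label such as "artist name" (contains a space) or "class" (a Python keyword) would be flagged, while "artistName" passes.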

Installing Shapiro

  1. Clone the Shapiro repository.
  2. Install dependencies: pip install -r requirements.txt

Running Shapiro

  1. Run the Shapiro server: python shapiro_server.py with command-line parameters as per the parameter reference below
  2. Access the UI at http://localhost:8000/welcome/
  3. Access the API docs at http://localhost:8000/docs
  4. Try curl -X 'GET' 'http://localhost:8000/<SCHEMANAME HERE>' -H 'Accept: application/ld+json' to get JSON-LD from a schema in the content dir.
  5. Try curl -X 'GET' 'http://localhost:8000/<SCHEMANAME HERE>' -H 'Accept: text/turtle' to get Turtle from a schema in the content dir.
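The curl calls above translate directly into a small Python client, shown here as a sketch using only `urllib.request`. `build_schema_request` and `fetch_schema` are hypothetical names; the schema name is whatever path your schema has in the content dir.

```python
import urllib.request


def build_schema_request(base_url: str, schema_name: str, mime_type: str) -> urllib.request.Request:
    """Prepare a GET for a schema, asking for a representation via the Accept header."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{schema_name.lstrip('/')}",
        headers={"Accept": mime_type},  # the standard HTTP header name is "Accept"
    )


def fetch_schema(base_url: str, schema_name: str, mime_type: str, timeout: float = 10.0) -> str:
    """Fetch a schema from a running server and return the body as text."""
    request = build_schema_request(base_url, schema_name, mime_type)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_schema("http://localhost:8000", "<SCHEMANAME HERE>", "text/turtle")` against a running server is the equivalent of step 5; swapping the MIME type for any entry in the content negotiation table selects a different representation.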

Commandline Parameter Reference

Parameter | Description
--host | The host for uvicorn to use. Defaults to 127.0.0.1.
--port | The port on which the server receives requests. Defaults to 8000.
--domain | The domain that Shapiro uses to build its BASE_URL. Defaults to '127.0.0.1:8000' and is typically set to the domain name under which you deploy Shapiro. Shapiro uses it to ensure schemas are rooted on its server, to build links in the HTML docs, and to resolve static resources in HTML renderings. Include the port if needed. Examples: --domain schemas.myorg.com, --domain schemas.myorg.com:1234
--content_dir | The content directory to use. Defaults to "./". If you serve schemas from GitHub (see --github_user and --github_repo), this path is relative to the repository's root directory.
--log_level | The log level to run with. Defaults to "info".
--default_mime | The MIME type to use if the MIME type requested in the Accept header is not available or usable. Defaults to "text/turtle".
--index_dir | The directory where Shapiro stores the full-text search indices. Defaults to ./fts_index.
--ssl_keyfile, --ssl_certfile, --ssl_ca_certs | If these are set, Shapiro uses SSL. No defaults.
--github_user, --github_repo | If these are set, Shapiro serves schemas from the content dir in this GitHub repository.
--github_branch | The branch of the GitHub repository to use (if --github_user and --github_repo are specified). Defaults to the repository's default branch.
--github_token | The access token for the GitHub repository (if --github_user and --github_repo are specified). If no value is specified, no authentication is used with GitHub, which limits the number of requests that can be made through the API.

Run python shapiro_server.py --help for a full reference of command-line parameters (host, port, domain, content dir, log level, default MIME type, index directory and, if needed, SSL parameters).
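If you wrap Shapiro in deployment scripts, it can help to mirror the parameter reference in code. The following `argparse` sketch is a hypothetical mirror of the table above with the documented defaults, not Shapiro's own parser; `--help` on the real server remains the authoritative list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line parameters."""
    p = argparse.ArgumentParser(description="Shapiro parameter reference (sketch)")
    p.add_argument("--host", default="127.0.0.1")
    p.add_argument("--port", type=int, default=8000)
    p.add_argument("--domain", default="127.0.0.1:8000")  # include the port if needed
    p.add_argument("--content_dir", default="./")
    p.add_argument("--log_level", default="info")
    p.add_argument("--default_mime", default="text/turtle")
    p.add_argument("--index_dir", default="./fts_index")
    # No defaults: SSL is enabled and GitHub is used only when these are set.
    # A missing --github_branch means "use the repository's default branch".
    for name in ("--ssl_keyfile", "--ssl_certfile", "--ssl_ca_certs",
                 "--github_user", "--github_repo", "--github_branch", "--github_token"):
        p.add_argument(name, default=None)
    return p
```

For example, `build_parser().parse_args(["--domain", "schemas.myorg.com", "--port", "9000"])` yields a namespace with `port == 9000` and every other value at its documented default.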