- Shapiro
Shapiro is a simple ontology/vocabulary server that serves Turtle, JSON-LD, HTML or JSON-Schema, as indicated by the requesting client in the accept header. It provides a simple way to serve an organization's ontologies/vocabularies.
Why would one need something like Shapiro? The basic idea is to:
- Model data as knowledge graphs using Turtle or JSON-LD and use these models in API definitions/implementations and all other code consuming data based on these models.
- Make the use of these machine-readable model definitions pervasive throughout all phases of the software lifecycle (design, implement, test, release) and the lifecycle of the data originating from software built using these models.
- Express non-functional requirements like security, traceability/lineage, data quality in the models and bind them to the instances of data wherever the data is distributed to and used.
- Drive all documentation (model diagrams, documents, graph visualizations, etc.) from the same RDF-based model definition (a.k.a. ontology/knowledge graph).
- Start out with providing a toolset from developers for developers for formulating such models and using them in source code, gradually extending towards tools, editors, UIs, transformations making this modelling approach accessible to non-technical actors like business analysts, domain data owners, etc.
In order to do so, you need a way to serve the models - this is where Shapiro comes in.
Shapiro serves schemas from a directory hierarchy in the file system (specified by the content_dir parameter at startup). Shapiro will regularly check new or modified schemas for syntax errors and exclude such "bad schemas" from getting served. Schemas can be moved into Shapiro's content_dir while it is running. This decouples the lifecycle for schemas from the lifecycle of Shapiro - the basic idea being that the lifecycle of schemas is managed in some code repository where changes get pushed into Shapiro's content directory without Shapiro having to be restarted.
Shapiro will use the accept header of the GET request for a schema to determine the mime type of its response, independent of the format that Shapiro holds the schema in on its file system:
Request Accept Header & Response Mime Type | Implementation Status |
---|---|
application/ld+json | implemented |
text/turtle | implemented |
text/html | implemented |
application/schema+json | implemented |
application/json | implemented (will return JSON-SCHEMA) |
If no accept header is specified, Shapiro will assume application/schema+json as default, because many JSON-SCHEMA processors/validators do not properly set the accept header when resolving $ref URLs.
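The content negotiation described above can be exercised from Python with nothing but the standard library. This is a minimal sketch, not part of Shapiro itself: the base URL assumes a local instance on the default host and port, `myorg/person` is a hypothetical schema path, and the helper names are made up for illustration. It sends the standard `Accept` header by default; since the curl examples in this README send a header literally named `accept-header`, the header name is kept as a parameter so you can switch if your instance expects that instead.

```python
from urllib.request import Request, urlopen

# Assumption: a local Shapiro instance on the default host and port.
BASE_URL = "http://127.0.0.1:8000"


def build_schema_request(
    schema_path: str,
    mime_type: str = "text/turtle",
    header_name: str = "Accept",
) -> Request:
    """Build a GET request asking for one schema in the given serialization."""
    url = f"{BASE_URL}/{schema_path.lstrip('/')}"
    return Request(url, headers={header_name: mime_type}, method="GET")


def fetch_schema(schema_path: str, mime_type: str = "text/turtle") -> str:
    """Send the request and return the body as text (needs a running server)."""
    with urlopen(build_schema_request(schema_path, mime_type)) as response:
        return response.read().decode("utf-8")
```

With a server running, `fetch_schema("myorg/person", "application/ld+json")` would return the JSON-LD rendering of that (hypothetical) schema, regardless of the format the file is stored in.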
Shapiro converts SHACL NodeShapes into JSON-Schema and thereby integrates with JSON-Schema validation. Based on this, you can use the semantic data models served by Shapiro in your OpenAPI definitions (by way of $ref). An end-to-end example based on this OpenAPI tutorial can be found in test/openapi/tutorial.yaml, where the corresponding semantic model is at test/openapi/tutorial/artist.ttl.
When rendering for mime type text/html, Shapiro will consider markdown in RDFS comments, SKOS definitions, DCT descriptions for improved readability of documentation.
Shapiro is opinionated about URL fragments for referring to terms in a schema - it plainly does not support them (here's why). So when writing your schema a.k.a. model a.k.a. vocabulary a.k.a. ontology, please ensure you refer to the individual terms it defines using the regular forward slash: e.g. http://myserver.com/myontology/term instead of http://myserver.com/myontology#term
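If you are migrating an existing vocabulary, a small check like the following can flag term IRIs that still use fragments. It is only an illustrative sketch: both helper names are hypothetical and nothing here is part of Shapiro.

```python
from urllib.parse import urlsplit


def uses_fragment(term_iri: str) -> bool:
    """Return True if the IRI addresses its term through a '#fragment'."""
    return urlsplit(term_iri).fragment != ""


def to_slash_iri(term_iri: str) -> str:
    """Rewrite 'namespace#term' as 'namespace/term'.

    IRIs without a fragment are returned without any trailing '#'.
    """
    base, sep, fragment = term_iri.partition("#")
    if not sep or not fragment:
        return base
    return f"{base.rstrip('/')}/{fragment}"
```

For example, `to_slash_iri("http://myserver.com/myontology#term")` yields the slash form recommended above.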
Shapiro allows you to keep schemas/ontologies in arbitrary namespace hierarchies - simply by reflecting namespaces as a directory hierarchy. This allows organizations to separate their schemas/ontologies across a hierarchical namespace and avoid any clashes. This also means you can have a more relaxed governance around the various ontologies/schemas across a collaborating community. The assumption is that you manage your schemas/ontologies in a code repository (GitHub, etc.) and manage releases from there onto a Shapiro instance serving these schemas in a specific environment (dev/test/prod).
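To make the idea of reflecting namespaces as directories concrete, here is a rough sketch of how a schema IRI could map onto a file below the content directory. This is an assumption for illustration, not Shapiro's actual lookup code: the function name, the `.ttl` suffix and the example IRI are all hypothetical.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlsplit


def schema_file_for(schema_iri: str, content_dir: str, suffix: str = ".ttl") -> Path:
    """Map a schema IRI to a candidate file below content_dir (illustrative only)."""
    parts = PurePosixPath(urlsplit(schema_iri).path.strip("/")).parts
    if not parts or ".." in parts:
        # Reject empty schema paths and attempts to escape content_dir.
        raise ValueError(f"not a usable schema path: {schema_iri!r}")
    *namespace, name = parts
    # Each namespace segment becomes one directory level; the last one is the file.
    return Path(content_dir).joinpath(*namespace, name + suffix)
```

Under these assumptions, `http://schemas.myorg.com/finance/invoicing/invoice` would live at `finance/invoicing/invoice.ttl` inside the content directory, so two teams can own `finance/...` and `hr/...` without clashing.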
Shapiro keeps the complete graph of all schemas combined in memory. The graph can be queried using the POST request API /query. This takes a SPARQL query (no updates) in the request body. That way you can query and mine the combined graph of all models.
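A query against the combined graph could be sent from Python roughly as follows. Treat this as a sketch under assumptions: the base URL is the local default, the `Content-Type` value is the standard SPARQL protocol media type rather than something this README specifies, and the response format is not described here, so the raw body is returned as text. The API docs served at /docs are the place to confirm the details.

```python
from urllib.request import Request, urlopen

# Assumption: a local Shapiro instance on the default host and port.
BASE_URL = "http://127.0.0.1:8000"

# Example query: list up to 20 RDFS classes with their labels, if any.
LIST_CLASSES = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?cls ?label WHERE {
  ?cls a rdfs:Class .
  OPTIONAL { ?cls rdfs:label ?label }
} LIMIT 20
"""


def build_query_request(sparql: str) -> Request:
    """Build a POST request to /query with the SPARQL text as the body."""
    return Request(
        f"{BASE_URL}/query",
        data=sparql.encode("utf-8"),
        # Assumption: the content type is not stated in this README.
        headers={"Content-Type": "application/sparql-query"},
        method="POST",
    )


def run_query(sparql: str) -> str:
    """Send the query and return the raw response body (needs a running server)."""
    with urlopen(build_query_request(sparql)) as response:
        return response.read().decode("utf-8")
```

Calling `run_query(LIST_CLASSES)` against a running instance would return whatever result serialization the server produces for that SELECT query.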
Shapiro uses Whoosh Full-text-search to index all schemas it serves. Shapiro regularly checks for modified or new schemas in its content directory and indexes them.
Shapiro provides a minimal UI available at /welcome/. Any GET request to / without a schema name to retrieve will also redirect to the UI. The UI lists all schemas served by Shapiro at a given point in time and allows full-text search of schema content.
The Shapiro UI also renders models/schemas/ontologies as HTML.
Given the number of possibilities to use ontologies & vocabularies for your models, Shapiro can't anticipate them all. While I'm trying to keep Shapiro as open as possible and while Shapiro can serve any kind of ontology or vocabulary, HTML rendering of models and JSON-SCHEMA rendering of models work best if you keep the following in mind:
- Use RDFS for modelling your classes and properties. HTML rendering will work best with this vocabulary.
- Use RDFS labels that are acceptable object names or property names in programming languages (specifically when you use JSON-SCHEMA & OpenAPI in conjunction with schemas hosted by Shapiro).
- JSON-SCHEMA conversion requires your model to define NodeShapes with the appropriate SHACL properties and constraints. Shapiro will render empty schemas if you ask for JSON-SCHEMA of an RDFS class.
- Clone the Shapiro repository.
- Install dependencies: pip install -r requirements.txt
- Run Shapiro Server: python shapiro_server.py with commandline parameters as per parameter reference
- Access the UI at http://localhost:8000/welcome/
- Access the API docs at http://localhost:8000/docs
- Try curl -X 'GET' 'http://localhost:8000/<SCHEMANAME HERE>' -H 'accept-header: application/ld+json' to get JSON-LD from a schema in the content dir
- Try curl -X 'GET' 'http://localhost:8000/<SCHEMANAME HERE>' -H 'accept-header: text/turtle' to get Turtle from a schema in the content dir.
Commandline Parameter Reference
Parameter | Description |
---|---|
--host | The host for uvicorn to use. Defaults to 127.0.0.1 |
--port | The port for the server to receive requests on. Defaults to 8000. |
--domain | The domain that Shapiro uses to build its BASE_URL. Defaults to '127.0.0.1:8000' and is typically set to the domain name under which you deploy Shapiro. This is what Shapiro uses to ensure schemas are rooted on its server and to build links in the HTML docs; it is also the URL Shapiro uses to resolve static resources in HTML renderings. Include the port if needed. Examples: --domain schemas.myorg.com, --domain schemas.myorg.com:1234 |
--content_dir | The content directory to be used. Defaults to "./". If you serve schemas from GitHub (i.e. you specify a GitHub user and repo), this is the path of the content directory relative to the repository's root directory. |
--log_level | The log level to run with. Defaults to "info" |
--default_mime | The mime type to use if the requested mime type in the accept header is not available or usable. Defaults to "text/turtle". |
--index_dir | The directory where Shapiro stores the full-text-search indices. Default is ./fts_index |
--ssl_keyfile --ssl_certfile --ssl_ca_certs | If these are set, Shapiro uses SSL. No defaults. |
--github_user --github_repo | If these are set, Shapiro serves schemas from the content dir in this repo. |
--github_branch | Set this to use a specific branch in your GitHub repo (if GitHub repo and GitHub user parameters are specified). Defaults to the GitHub repo's default branch. |
--github_token | The access token for the GitHub repo (if GitHub repo and GitHub user parameters are specified). If no value is specified, no authentication is used with GitHub (which will limit the number of requests that can be made through the API). |
Make sure you run python shapiro_server.py --help for a full reference of command line parameters (host, port, domain, content dir, log level, default mime type, index directory, and if needed ssl-parameters).
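If you launch Shapiro from a deployment script, the parameters from the reference above can be assembled programmatically. The sketch below only builds the argument list; the function name and the example values are hypothetical, while the flags and defaults are the ones listed in the table.

```python
import sys


def shapiro_command(
    content_dir: str,
    domain: str,
    host: str = "127.0.0.1",
    port: int = 8000,
    default_mime: str = "text/turtle",
) -> list[str]:
    """Assemble the command line for starting shapiro_server.py."""
    return [
        sys.executable, "shapiro_server.py",
        "--host", host,
        "--port", str(port),
        "--domain", domain,
        "--content_dir", content_dir,
        "--default_mime", default_mime,
    ]
```

From the root of the cloned repository, the resulting list can be passed to `subprocess.run(...)`, for example `subprocess.run(shapiro_command("./schemas", "schemas.myorg.com"), check=True)`.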