Designed by Agile Lab, witboost is a versatile platform that addresses a wide range of sophisticated data engineering challenges. It enables businesses to discover, enhance, and productize their data, fostering the creation of automated data platforms that adhere to the highest standards of data governance. Want to know more about witboost? Check it out here or contact us!

This repository is part of our Starter Kit meant to showcase witboost's integration capabilities and provide a "batteries-included" product.

DCAT OWL Adapter

Overview

This project provides a prototype for integrating into Witboost whatever data catalog supports DCAT and OWL in an RDF triple store.

This integration supports two different operations:

At build time: Business Terms retrieving: we can retrieve the business terms from the reference ontology by querying the SPARQL endpoint and pushing down filters like the domain or other specific context
At deploy time: Push in the knowledge graph all the data contracts defined in Witboost, linking them with the ontology and adding all the needed metadata

For this experiment, we used the FIBO Ontology, downloading it in the RDF triple store and using it to facilitate the tagging of dataset with business domain-specific terms.

Here is an example of the result

This prototype creates an embedded RDF triple store and is populated with FIBO ontology for testing purposes.

What's a Tech Adapter?

A Tech Provisioner is a microservice that is in charge of integrating a target technology in a bi-directional way. When the deployment of a Data Product is triggered, the platform generates its descriptor and orchestrates the deployment of every component contained in the Data Product. For every such component the platform knows which Tech Adapter is responsible for its deployment, and can thus send a provisioning request with the descriptor to it so that the Tech Adapter can perform whatever operation is required to fulfill this request and report back the outcome to the platform.

You can learn more about how the Specific Provisioners fit in the broader picture here.

Software stack

This microservice is written in Python 3.11, using FastAPI for the HTTP layer. Project is built with Poetry and supports packaging as Wheel and Docker image, ideal for Kubernetes deployments (which is the preferred option).

Building

Requirements:

Python 3.11.x (this is a strict requirement as of now, due to uvloop 0.17.0)
Poetry

Install Python 3.11:

sudo apt update
sudo apt install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.11
python3.11 --version
which python3.11

Installing:

To set up a Python environment we use Poetry:

curl -sSL https://install.python-poetry.org | python3 -

Once Poetry is installed and in your $PATH, you can execute the following:

poetry --version

If you see something like Poetry (version x.x.x), your install is ready to use!

Install the dependencies defined in specific-provisioner/pyproject.toml:



cd specific-provisioner

poetry env use /full/path/to/python3.11

poetry install

Note: All the following commands are to be run in the Poetry project directory with the virtualenv enabled. If you use pyenv to manage multiple Python runtimes, make sure Poetry is using the right version. You can tell pyenv to use the Python version available in the current shell. Check this Poetry docs page here.

Type check: is handled by mypy:

poetry run mypy src/

Tests: are handled by pytest:

poetry run pytest --cov=src/ tests/. --cov-report=xml

Artifacts & Docker image: the project leverages Poetry for packaging. Build package with:

poetry build

The Docker image can be built with:

docker build .

More details can be found here.

Note: the version for the project is automatically computed using information gathered from Git, using branch name and tags. Unless you are on a release branch 1.2.x or a tag v1.2.3 it will end up being 0.0.0. You can follow this branch/tag convention or update the version computation to match your preferred strategy.

CI/CD: the pipeline is based on GitLab CI as that's what we use internally. It's configured by the .gitlab-ci.yaml file in the root of the repository. You can use that as a starting point for your customizations.

Running

To run the server locally, use:

cd specific-provisioner
source $(poetry env info --path)/bin/activate # only needed if venv is not already enabled
uvicorn src.main:app --host 127.0.0.1 --port 8091

By default, the server binds to port 8091 on localhost. After it's up and running you can make provisioning requests to this address. You can also check the API documentation served here.

Deploying

This microservice is meant to be deployed to a Kubernetes cluster with the included Helm chart and the scripts that can be found in the helm subdirectory. You can find more details here.

License

This project is available under the Apache License, Version 2.0; see LICENSE for full details.

About us

Agile Lab creates value for its Clients in data-intensive environments through customizable solutions to establish performance driven processes, sustainable architectures, and automated platforms driven by data governance best practices.

Since 2014 we have implemented 100+ successful Elite Data Engineering initiatives and used that experience to create Witboost: a technology agnostic, modular platform, that empowers modern enterprises to discover, elevate and productize their data both in traditional environments and on fully compliant Data mesh architectures.

agile-lab-dev/DCAT-OWL-Adapter