(subjective) overview of projects which are related both to python and semantic technologies (RDF, OWL, Reasoning, ...)

Creative Commons Zero v1.0 Universal (CC0-1.0)

Semantic Python Overview

This repository aims to collect and curate a list of projects which are related both to Python and semantic technologies (RDF, OWL, SPARQL, Reasoning, ...). It is inspired by collections like awesome lists. The list might be incomplete and biased due to the limited knowledge of its authors. Improvements are very welcome. Feel free to file an issue or a pull request. Every section is alphabetically sorted.

Furthermore, this repository might serve as a crystallization point for a community interested in such projects – and how they might productively interact. See this discussion for more information.

Established Projects

  • Bioregistry - The Bioregistry

    • docs: https://bioregistry.readthedocs.io
    • website: https://bioregistry.io/
    • features:
      • Open source (and CC 0) repository of prefixes, their associated metadata, and mappings to external registries' prefixes
      • Standardization of prefixes and CURIEs
      • Interconversion between CURIEs and IRIs
      • Generation of context-specific prefix maps for usage in RDF, LinkML, SSSOM, OWL, etc.
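
The interconversion between CURIEs and IRIs listed above can be sketched with the standard library alone. This is a minimal illustration, not the Bioregistry API: the prefix map is hand-written for the example, and the function names `expand` and `compress` are hypothetical.

```python
# Hand-written prefix map for illustration only; a real one would be
# generated from a registry such as the Bioregistry.
PREFIX_MAP = {
    "chebi": "http://purl.obolibrary.org/obo/CHEBI_",
    "go": "http://purl.obolibrary.org/obo/GO_",
}


def expand(curie, prefix_map=PREFIX_MAP):
    """Turn a CURIE such as 'GO:0008150' into an IRI."""
    prefix, _, identifier = curie.partition(":")
    # Standardize the prefix by lower-casing it before the lookup.
    return prefix_map[prefix.lower()] + identifier


def compress(iri, prefix_map=PREFIX_MAP):
    """Turn an IRI back into a CURIE with a standardized (lower-case) prefix."""
    # Try longer IRI prefixes first so the most specific one wins.
    for prefix, base in sorted(prefix_map.items(), key=lambda kv: -len(kv[1])):
        if iri.startswith(base):
            return prefix + ":" + iri[len(base):]
    raise ValueError("no prefix registered for " + iri)


iri = expand("GO:0008150")
curie = compress(iri)
```

Lower-casing the prefix is only one possible standardization rule, chosen here to keep the sketch short.
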
  • brickschema – Brick Ontology Python package

    • Brick is an open-source effort to standardize semantic descriptions of the physical, logical and virtual assets in buildings and the relationships between them.
    • docs: https://brickschema.readthedocs.io/en/latest/
    • website: https://brickschema.org/
    • features:
      • basic inference with different reasoners
      • web based interaction (by means of Yasgui)
      • Translations from different formats (Haystack, VBIS)
  • Cooking with Python and KBpedia

  • CubicWeb a framework to build semantic web applications

    • website: https://www.cubicweb.org
    • docs: https://cubicweb.readthedocs.io/en/latest/
    • features:
      • An engine driven by the explicit data model of the application
      • RQL, an intuitive query language close to the business vocabulary
      • An architecture that separates data selection and visualisation
      • Data security by design
      • An efficient data storage
  • Eddy - graphical ontology editor

  • fastobo-py: Python bindings for fastobo (rust library to parse OBO 1.4)

    • features:
      • load, edit and serialize ontologies in the OBO 1.4 format
  • FunOwl – functional OWL syntax for Python

    • features:
      • provide a pythonic API that follows the OWL functional model for constructing OWL
  • Gastrodon - puts RDF data at your fingertips in Pandas; gateway to matplotlib, scikit-learn and other visualization tools.

    • features:
      • interpolate variables into SPARQL queries
      • access local RDFlib graphs and remote SPARQL protocol endpoints
      • convert SPARQL result set to pandas dataframes
      • understandable error messages
      • input/output graphs in Turtle form
      • conversion between RDF collections and Python collections
      • Sphinx domain to incorporate RDF data into documentation
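
To illustrate what converting a SPARQL result set into tabular data involves, the following sketch flattens a document in the W3C SPARQL 1.1 Query Results JSON format into one dictionary per row. It does not use Gastrodon; the sample data and the helper name `bindings_to_rows` are made up for this example.

```python
import json

# A tiny result document in the SPARQL 1.1 Query Results JSON format;
# the data itself is invented for illustration.
RESULT_JSON = """
{
  "head": {"vars": ["s", "label"]},
  "results": {"bindings": [
    {"s": {"type": "uri", "value": "http://example.org/a"},
     "label": {"type": "literal", "value": "A"}},
    {"s": {"type": "uri", "value": "http://example.org/b"}}
  ]}
}
"""


def bindings_to_rows(document):
    """Flatten SPARQL JSON bindings into one dict per row; unbound -> None."""
    variables = document["head"]["vars"]
    return [
        {var: binding.get(var, {}).get("value") for var in variables}
        for binding in document["results"]["bindings"]
    ]


rows = bindings_to_rows(json.loads(RESULT_JSON))
```

If pandas is installed, a list of dictionaries like `rows` can be passed to `pandas.DataFrame(rows)` to obtain a dataframe with one column per query variable.
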
  • gizmos – Utilities for ontology development

    • features:
      • modules for "export", "extract", "tree"-rendering
  • Jabberwocky – a toolkit for ontologies

    • features:
      • associated text mining using an ontology's terms & synonyms
      • tf-idf for synonym curation then adding those synonyms into an ontology
  • kglab - Graph Data Science

    • docs: https://derwen.ai/docs/kgl/
    • tutorial: https://derwen.ai/docs/kgl/tutorial/
    • features:
      • an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries
      • perspective: there are several "camps" of graph technologies, with little discussion between them
      • focus on supporting "Hybrid AI" approaches that combine two or more graph technologies with other ML work
      • PyData stack – e.g., Pandas, scikit-learn, etc. – allows for graph work within data science workflows
      • scale-out tools – e.g., RAPIDS, Arrow/Parquet, Dask – provide for scaling graph computation (not necessarily databases)
      • graph algorithm libraries include NetworkX, iGraph, cuGraph – plus related visualization libraries in PyVis, Cairo, etc.
      • W3C libraries in Py also lacked full integration: RDFlib, pySHACL, OWL-RL, etc.
      • pslpython provides for probabilistic soft logic, working with uncertainty in probabilistic graphs
      • additional integration paths and examples show how to work with deep learning (PyG)
      • import paths from graph databases, such as Neo4j
      • import paths from note-taking tools, such as Roam Research
      • usage in MkRefs to add semantic features into MkDocs so that open source projects can federate bibliographies, shared glossaries, etc.
      • kglab team provides hands-on workshops at technology conferences for people to gain experience with these different graph approaches
  • KGX - Library for building and exchanging knowledge graphs

    • docs: https://kgx.readthedocs.io/
    • features:
      • Load graphs into an in-memory model to facilitate data integration, validation, and graph operations
      • Provides an easy way to bring data into Biolink Model, a high-level data model for biomedical knowledge graphs
      • The core data structure is a Property Graph (PG), represented internally using a networkx.MultiDiGraph
      • Supports various input and output formats, including:
        • RDF serializations
        • SPARQL endpoints
        • Neo4j endpoints
        • CSV/TSV and JSON
        • OWL
        • OBOGraph JSON format
        • SSSOM
  • LangChain's GraphSparqlQAChain – A LangChain module for making RDF and OWL accessible via natural language

  • LinkML – Linked Open Data Modeling Language

    • features:
      • A high level simple way of specifying data models, optionally enhanced with semantic annotations
      • A python framework for compiling these data models to json-ld, json-schema, shex, shacl, owl, sql-ddl
      • A python framework for data conversion and validation, as well as generated Python dataclasses
  • Macleod – Ontology development environment for Common Logic (CL)

    • features:
      • Translating a CLIF file to formats supported by FOL reasoners
      • Extracting an OWL approximation of a CLIF ontology
      • Verifying (non-trivial) logical consistency of a CLIF ontology
      • Proving theorems/lemmas, such as properties of concepts and relations or competency questions
      • GUI (alpha state)
  • Morph-KGC – System to create RDF and RDF-star knowledge graphs from heterogeneous sources with R2RML, RML and RML-star

    • docs: https://morph-kgc.readthedocs.io
    • features:
      • support for relational databases, tabular files (e.g. CSV, Excel, Parquet) and hierarchical files (XML and JSON)
      • generates RDF and RDF-star knowledge graphs by running through the command line or as a library
      • integrates with RDFlib and Oxigraph to load the generated RDF directly to those libraries
  • nxontology – NetworkX-based library for representing ontologies

    • features:
      • load ontologies into a networkx.DiGraph or MultiDiGraph from .obo, .json, or .owl formats (powered by pronto / fastobo)
      • compute information content scores for nodes and semantic similarity scores for node pairs
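
As a rough illustration of information content and semantic similarity on an ontology graph, here is a standard-library sketch. It is not the nxontology API and does not claim to reproduce its scoring formulas: the toy ontology is hypothetical, information content is taken as the negative log of the share of nodes at or below a node, and similarity is the Resnik-style score of the most informative common ancestor.

```python
import math

# Hypothetical toy ontology: each node maps to its direct parents (is-a edges).
PARENTS = {
    "thing": [],
    "animal": ["thing"],
    "plant": ["thing"],
    "dog": ["animal"],
    "cat": ["animal"],
}


def ancestors(node):
    """Return the node itself plus everything reachable via parent edges."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def information_content(node):
    """Intrinsic IC: -log of the share of nodes that are at or below `node`."""
    n_descendants = sum(1 for other in PARENTS if node in ancestors(other))
    return -math.log(n_descendants / len(PARENTS))


def resnik_similarity(a, b):
    """IC of the most informative ancestor shared by both nodes."""
    common = ancestors(a) & ancestors(b)
    return max(information_content(node) for node in common)


dog_cat = resnik_similarity("dog", "cat")      # shared ancestor: "animal"
dog_plant = resnik_similarity("dog", "plant")  # only the root is shared
```

The root node covers every node and therefore carries no information, so pairs that share only the root score zero.
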
  • obonet – read OBO-formatted ontologies into NetworkX

    • features:
      • Load an .obo file into a networkx.MultiDiGraph
      • Users should try nxontology first, as a more general purpose successor to this project
  • OnToology – System for collaborative ontology development process

  • OntoPilot – software for ontology development and deployment

    • docs: https://github.com/stuckyb/ontopilot/wiki
    • features:
      • support end users in ontology development, documentation and maintenance
      • convert spreadsheet data (one entity per row) to owl files
      • call a reasoner before triple-store insertion
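
The idea of turning spreadsheet rows (one entity per row) into OWL can be sketched as follows. This is not OntoPilot's implementation or input format: the column names, the namespace, and the function `rows_to_turtle` are assumptions made for the example, and labels are not escaped.

```python
import csv
import io

# Hypothetical spreadsheet content: one entity per row, exported as CSV.
SHEET = """id,label,parent
Dog,dog,Animal
Animal,animal,
"""

BASE = "http://example.org/onto#"  # made-up namespace


def rows_to_turtle(text):
    """Emit one owl:Class declaration (Turtle syntax) per spreadsheet row."""
    lines = [
        "@prefix : <" + BASE + "> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for row in csv.DictReader(io.StringIO(text)):
        statements = ["a owl:Class", 'rdfs:label "%s"' % row["label"]]
        if row["parent"]:  # root classes have an empty parent cell
            statements.append("rdfs:subClassOf :" + row["parent"])
        lines.append(":" + row["id"] + " " + " ;\n    ".join(statements) + " .")
    return "\n".join(lines)


turtle = rows_to_turtle(SHEET)
```

The resulting Turtle text could then be loaded by any RDF or OWL tool from this list for reasoning or further processing.
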
  • ontospy – Python library and command-line interface for inspecting and visualizing RDF models

    • docs: http://lambdamusic.github.io/Ontospy/
    • features:
      • extract and print out any ontology-related information
      • convert different OWL syntax variants
      • generate html documentation for an ontology
  • ontor – Python library for manipulating and visualizing OWL ontologies

    • features:
      • tool set based on owlready2 and networkx
  • owlready2 – ontology oriented programming in Python

  • Oxrdflib – Oxrdflib provides rdflib stores using pyoxigraph (rust-based)

    • could be used as drop-in replacements of the rdflib default ones
  • pronto: library to parse, browse, create, and export ontologies

  • pyfactxx – Python bindings for FaCT++ OWL 2 C++ reasoner

    • features:
      • well-optimized reasoner for SROIQ(D) description logic, with additional improvements
      • rdflib integration
      • easy cross-platform installation
  • PyFuseki – Library that interacts with Jena Fuseki (SPARQL server)

  • PyKEEN (Python KnowlEdge EmbeddiNgs) – Python package to train and evaluate knowledge graph embedding models

    • features:
      • 44 Models
      • 37 Datasets
      • 5 Inductive Datasets
      • support for multi-modal information
  • PyLD - A JSON-LD processor written in Python

    • conforms:
      • JSON-LD 1.1, W3C Candidate Recommendation, 2019-12-12 or newer
      • JSON-LD 1.1 Processing Algorithms and API, W3C Candidate Recommendation, 2019-12-12 or newer
      • JSON-LD 1.1 Framing, W3C Candidate Recommendation, 2019-12-12 or newer
  • pyLoDStorage – python library to interchange data between SPARQL-, JSON and SQL-endpoints

  • PyOBO

    • docs: https://pyobo.readthedocs.io
    • features:
      • Provides unified, high-level access to names, descriptions, synonyms, xrefs, hierarchies, properties, relationships, etc. in ontologies from many sources listed in the Bioregistry
      • Converts databases into OWL and OBO ontologies
      • Wrapper around ROBOT for using Java tooling to convert between OBO and OWL
      • Internal DSL for generating OBO ontology
  • Pyoxigraph – Python graph database library implementing the SPARQL standard.

  • PyRes

  • pystardog

  • Quit Store – workspace for distributed collaborative Linked Data knowledge engineering ("Quads in Git")

  • RaiseWikibase – A tool for speeding up multilingual knowledge graph construction with Wikibase

  • Reasonable – An OWL 2 RL reasoner with reasonable performance

    • written in Rust with Python-Bindings (via pyo3)
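
To give an impression of what an OWL 2 RL reasoner does, the sketch below applies two rules from the OWL 2 RL rule set (scm-sco and cax-sco) by naive forward chaining until no new triples appear. It is unrelated to Reasonable's actual implementation or API; triples are plain tuples of strings and the function name `materialize` is hypothetical.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def materialize(triples):
    """Apply scm-sco and cax-sco repeatedly until a fixed point is reached."""
    closure = set(triples)
    while True:
        inferred = set()
        for (s, p, o) in closure:
            if p != SUBCLASS:
                continue
            for (s2, p2, o2) in closure:
                # scm-sco: rdfs:subClassOf is transitive.
                if p2 == SUBCLASS and s2 == o:
                    inferred.add((s, SUBCLASS, o2))
                # cax-sco: an instance of a subclass is an instance
                # of the superclass.
                if p2 == TYPE and o2 == s:
                    inferred.add((s2, TYPE, o))
        if inferred <= closure:
            return closure
        closure |= inferred


closure = materialize({
    ("Dog", SUBCLASS, "Mammal"),
    ("Mammal", SUBCLASS, "Animal"),
    ("rex", TYPE, "Dog"),
})
```

The nested loop is quadratic in the number of triples per round; production reasoners use indexing and incremental evaluation instead.
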
  • ROBOT – Java tool for automating ontology workflows with several reasoners (ELK, HermiT, ...) and Python interface

  • rdflib – Python package for working with RDF

    • docs: https://rdflib.readthedocs.io/
    • graphical package overview: https://rdflib.dev/
    • features:
      • parsers and serializers for RDF/XML, NTriples, Turtle, JSON-LD and more
      • a graph interface which can be backed by any one of a number of store implementations
      • store implementations for in-memory storage and persistent storage
      • a SPARQL 1.1 implementation – supporting SPARQL 1.1 Queries and Update statements
  • rdflib-endpoint – Python package for easily deploying SPARQL endpoints for RDFLib Graphs

    • features:
      • exposing machine learning models or any other logic implemented in Python through a SPARQL endpoint, using custom functions
      • serving local RDF files using the command line interface
  • serd – Python serd module, providing bindings for Serd, a lightweight C library for working with RDF data

  • sparqlfun

    • LinkML based SPARQL template library and execution engine
      • modularized core library of SPARQL templates
      • Fully FAIR description of templates
      • Rich expressive language for modeling templates
        • uses LinkML as base language
      • optional python bindings / object model using LinkML
      • supports both SELECT and CONSTRUCT
      • optional export to TSV, JSON, YAML, RDF
      • extensive endpoint metadata
  • SPARQL kernel for Jupyter

    • features:
      • sending queries to a SPARQL endpoint
      • fetching and presenting the results in a notebook
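
Sending a query to a SPARQL endpoint boils down to an HTTP request as defined by the SPARQL 1.1 Protocol. The sketch below prepares such a request ("query via GET") with the standard library only; it is independent of the Jupyter kernel, the endpoint URL is a placeholder, and `build_sparql_request` is a hypothetical helper name.

```python
import urllib.parse
import urllib.request


def build_sparql_request(endpoint, query):
    """Prepare a SPARQL 1.1 Protocol 'query via GET' request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the JSON serialization of the result set.
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


request = build_sparql_request(
    "http://example.org/sparql",  # placeholder endpoint, not a real service
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5",
)
# Sending is left out so the sketch runs offline. With a real endpoint:
#   import json
#   with urllib.request.urlopen(request) as response:
#       results = json.load(response)
```

Libraries such as SPARQLWrapper wrap this kind of request handling and result parsing behind a higher-level interface.
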
  • SPARQLing Unicorn QGIS Plugin – QGIS plugin which adds a GeoJSON layer from SPARQL endpoint queries

  • SPARQLWrapper – A wrapper for a remote SPARQL endpoint

  • WikidataIntegrator – Library for reading and writing to Wikidata/Wikibase

    • features:
      • high integration with the Wikidata SPARQL endpoint

Probably Stalled or Outdated Projects

  • Athene DL reasoner in pure python
    • "[C]urrent version is a beta and only supports ALC. But it can easily be extended by adding tableau rules."
    • Last update: 2017
  • cwm
    • Self description: "[cwm is a] forward chaining semantic reasoner that can be used for querying, checking, transforming and filtering information".
    • Created in 2000 by Tim Berners-Lee and Dan Connolly, see w3.org
  • air-reasoner
    • Self description: "Reasoner for the AIR policy language, based on cwm"
    • based on cwm
    • Last update: 2013
  • FuXi
  • pysumo

Further Projects / Links