
TypeDB-ML is the Machine Learning integrations library for TypeDB

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

This repository is outdated and not supported. We will be closing this repository by end of 2023.



TypeDB-ML

Previously known as KGLIB.

TypeDB-ML provides tools to enable graph algorithms and machine learning with TypeDB.

There are integrations for NetworkX and for PyTorch Geometric (PyG).

NetworkX integration allows you to use a large library of algorithms over graph data exported from TypeDB.

PyTorch Geometric (PyG) integration gives you a toolbox to build Graph Neural Networks (GNNs) for your TypeDB data, with an example included for link prediction (binary relation prediction, in TypeDB terms). The structure of the GNNs is fully customisable, with built-in network components for popular techniques such as graph attention and graph transformers.

Features

NetworkX

  • Declare the graph structure of your queries, with optional sampling functions.
  • Query a TypeDB instance and combine many results across many queries into a single graph (build_graph_from_queries).
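As a rough illustration of the idea, the sketch below merges two small per-query graphs into one and then runs a standard NetworkX algorithm over the result. It does not call TypeDB or build_graph_from_queries: the rows are hand-written stand-ins for query answers, and the helper graph_from_rows, the node id format and the edge direction (relation to role player) are all assumptions made for this example.

```python
import networkx as nx


def graph_from_rows(rows):
    """Build a directed graph from (relation, role, player) rows.

    Hypothetical helper: each row stands in for one role-player edge of a
    query answer. Node ids are strings of the form "<type>:<id>".
    """
    graph = nx.DiGraph()
    for relation, role, player in rows:
        graph.add_node(relation, type=relation.split(":")[0])
        graph.add_node(player, type=player.split(":")[0])
        graph.add_edge(relation, player, type=role)
    return graph


# Hand-written stand-ins for the answers of two separate queries.
query_1 = [
    ("employment:e1", "employee", "person:p1"),
    ("employment:e1", "employer", "company:c1"),
]
query_2 = [
    ("employment:e2", "employee", "person:p2"),
    ("employment:e2", "employer", "company:c1"),
]

# Nodes that share an id are merged, so company:c1 appears only once.
combined = nx.compose_all([graph_from_rows(query_1), graph_from_rows(query_2)])

# Any NetworkX algorithm can now run over the combined graph. Here the
# direction is dropped to find how the two people are connected.
path = nx.shortest_path(combined.to_undirected(), "person:p1", "person:p2")
print(path)
```

Merging on shared node ids is what lets answers from many queries describe one connected graph rather than several disjoint fragments.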

PyTorch Geometric

  • A DataSet object to lazily load graphs from a TypeDB instance. Each graph is converted to a PyG Data object.
  • It's most natural to work with PyG HeteroData objects since all data in TypeDB has a type. Conversion from Data to HeteroData is available in PyG, but it loses node ordering information. To remedy this, TypeDB-ML provides store_concepts_by_type to store concepts in a way that is consistent with a HeteroData object. This enables concepts to be properly re-associated with predictions after learning is finished.
  • A FeatureEncoder to orchestrate encoders to generate features for graphs.
  • Encoders for Continuous and Categorical values to apply encodings/embedding spaces to the types and attribute values present in TypeDB data.
  • A full example of link prediction.
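The node ordering problem mentioned above can be illustrated without PyG. The sketch below is not the library's store_concepts_by_type; it is a plain-Python stand-in showing the underlying bookkeeping: once nodes are split into per-type stores, a prediction is addressed by (type, local index), so the concepts need to be kept in matching per-type lists. The Concept class and group_by_type function are hypothetical, and the sketch assumes nodes of each type keep their relative order after the split.

```python
from collections import defaultdict
from typing import NamedTuple


class Concept(NamedTuple):
    """Hypothetical stand-in for a TypeDB concept: a type label and an id."""
    type_label: str
    iid: str


def group_by_type(concepts):
    """Group concepts by type, preserving relative order within each type.

    Returns the per-type lists and a map from each global index to its
    (type label, local index) position.
    """
    by_type = defaultdict(list)
    position = {}
    for global_index, concept in enumerate(concepts):
        bucket = by_type[concept.type_label]
        position[global_index] = (concept.type_label, len(bucket))
        bucket.append(concept)
    return dict(by_type), position


# Concepts in the order they appear in a homogeneous graph.
concepts = [
    Concept("person", "0x1"),
    Concept("company", "0x2"),
    Concept("person", "0x3"),
]
by_type, position = group_by_type(concepts)

# A prediction for the node at local index 1 of type "person" can be
# mapped back to the concept it refers to.
predicted = by_type["person"][1]
print(predicted)
```

Keeping the lookup keyed by type mirrors how a heterogeneous graph indexes its nodes, which is why a single global list of concepts is not enough after conversion.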

Other

  • Example usage of Tensorboard for PyG HeteroData

Resources

You may find the following resources useful, particularly to understand why TypeDB-ML started:

Quickstart

Install

  • Python >= 3.7.x

  • Grab the requirements.txt file from here and install the requirements with pip install -r requirements.txt. This step is needed because of some intricacies in installing PyG's dependencies; see here for details.

  • Installed TypeDB-ML: pip install typedb-ml.

  • TypeDB 2.11.1 running in the background.

  • typedb-client-python 2.11.x (PyPI, GitHub release). This should be installed automatically when you pip install typedb-ml.
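After installing, a quick way to confirm the prerequisites above is to inspect the environment from Python. This is a minimal sketch using only the standard library; the distribution names typedb-ml and typedb-client are assumptions about how the packages are registered with pip, and the functions installed_version and check_environment are hypothetical helpers, not part of TypeDB-ML.

```python
import sys

try:
    from importlib import metadata  # available from Python 3.8
except ImportError:  # Python 3.7 needs the importlib_metadata backport
    import importlib_metadata as metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    """Report the Python version check and the package versions found."""
    report = {"python_ok": sys.version_info >= (3, 7)}
    # Assumed distribution names; adjust if pip lists them differently.
    for name in ("typedb-ml", "typedb-client"):
        report[name] = installed_version(name)
    return report


report = check_environment()
for key, value in report.items():
    print(f"{key}: {value}")
```

A value of None for either package means pip has no record of it in the current environment. The check says nothing about whether a TypeDB server is running.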

Run the Example

Take a look at the PyTorch Geometric heterogeneous link prediction example to see how to use TypeDB-ML to build a GNN on TypeDB data.

Development

To follow the development conversation, please join the Vaticle Discord and its #typedb-ml channel. Alternatively, start a new topic on the Vaticle Discussion Forum.

TypeDB-ML requires that you have migrated your data into a TypeDB or TypeDB Cluster instance. An official examples repo shows how to go about this, and the docs cover migration. Alternatively, community-led projects in the TypeDB OSI facilitate fast and easy data loading, for example TypeDB Loader.

Building from Source

Installing with pip is the expected route, but if you need to make your own changes to the library and import it into your project, you can build from source as follows:

Clone TypeDB-ML:

git clone git@github.com:vaticle/typedb-ml.git

Go into the project directory:

cd typedb-ml

Build all targets:

bazel build //...

Run all tests. Requires Python 3.7+ on your PATH. Test dependencies are for Linux since that is the CI environment:

bazel test //typedb_ml/... --test_output=streamed --spawn_strategy=standalone --action_env=PATH

Build the pip distribution. Outputs to bazel-bin:

bazel build //:assemble-pip