Distributed KGE (C++)

A sharded knowledge graph embedding (KGE) model for IPU, implemented in Poplar for execution using DRAM on an IPU-POD16.

Note that this is a low-level implementation for advanced IPU usage.

See also: PyTorch KGE demo notebook.

Usage

First-time setup

Ensure clang++ and ninja are installed.
Clone this repository with --recurse-submodules.
Install Poplar SDK and activate with source $POPLAR_SDK_DIR/enable.
Create and activate a Python virtual environment.
Install Python requirements with pip install -r requirements-dev.txt.
Check everything is working by running ./dev (see also ./dev --help).

For example:

sudo apt-get install clang ninja-build
git clone --recurse-submodules REPO
source $POPLAR_SDK_DIR/enable
virtualenv -p python3 .venv
source .venv/bin/activate
pip install -r requirements-dev.txt
./dev --help
./dev

Training

Our standard training script is in scripts/run_training.py. To build the core C++ code, add it to the path and run training, use:

./dev train

This trains a TransE model with embedding size 256.
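
For reference, TransE scores a triple (head, relation, tail) by how close head + relation lies to tail in embedding space. The following is a minimal pure-Python sketch of that scoring rule, not the Poplar implementation in this repository. The function name `transe_score`, the toy 4-dimensional embeddings and the convention of returning a negated distance are illustrative assumptions.

```python
import math
from typing import Sequence


def transe_score(
    head: Sequence[float],
    relation: Sequence[float],
    tail: Sequence[float],
    p: int = 1,
) -> float:
    """Return -||head + relation - tail||_p, so a higher score means a more plausible triple."""
    if p not in (1, 2):
        raise ValueError("p must be 1 (L1 distance) or 2 (L2 distance)")
    if not (len(head) == len(relation) == len(tail)):
        raise ValueError("all embeddings must have the same size")
    diffs = [h + r - t for h, r, t in zip(head, relation, tail)]
    if p == 1:
        return -sum(abs(d) for d in diffs)
    return -math.sqrt(sum(d * d for d in diffs))


# Toy 4-dimensional embeddings (hypothetical values; the run above uses size 256).
head = [1.0, 0.0, 2.0, -1.0]
relation = [0.5, 1.0, -1.0, 0.0]
good_tail = [1.5, 1.0, 1.0, -1.0]  # exactly head + relation
bad_tail = [0.0, 0.0, 0.0, 0.0]

print("good tail, L1:", transe_score(head, relation, good_tail, p=1))
print("bad tail,  L1:", transe_score(head, relation, bad_tail, p=1))
print("bad tail,  L2:", transe_score(head, relation, bad_tail, p=2))
```

A tail that matches head + relation exactly has distance zero, so it scores higher than any other candidate tail.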

Note:

Build and development automation is provided by the ./dev script, which generates a ninja build file (build/build.ninja).
You may wish to change the C++ compiler, e.g. env CXX=g++ ./dev ...
The training script expects the OGB WikiKG90Mv2 dataset to be downloaded to $OGBWIKIKG_PATH; see the OGB WikiKG90Mv2 page for instructions.
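
Before launching a long job, it can help to confirm that the dataset variable is set and points at a real directory. The sketch below is one way to do that check. The helper name `find_dataset_root` is hypothetical and is not part of this repository; the only name taken from the text above is the OGBWIKIKG_PATH environment variable.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_dataset_root(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the directory named by OGBWIKIKG_PATH, or None if it is unset or not a directory."""
    value = env.get("OGBWIKIKG_PATH")
    if not value:
        return None
    root = Path(value).expanduser()
    return root if root.is_dir() else None


root = find_dataset_root()
if root is None:
    print("OGBWIKIKG_PATH is unset or does not point to a directory")
else:
    print(f"Using dataset at {root}")
```

This only checks that the directory exists; it does not verify the dataset contents.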

About

The application is a self-contained research platform for KGE models, using Poplar/PopLibs directly for execution on IPU, PyTorch for data loading and NumPy for batching and interchange. Since model checkpoints would be very large, all training, evaluation and prediction tasks are run in a single job via run_training.py.

The main components are:

scripts/{run_training.py, run_profile.py} - top-level entry points (note that we use Python configuration in place of a command-line interface)
Core model & training
- src/poplar_kge.cpp - core model and training step definition
- src/python/poplar_kge.py - Python glue code & experiment settings
- src/python/poplar_kge_dataset.py - data sampling & batching
Library-like components
- src/pag/ - Poplar AutoGrad (PAG), a self-contained mini-library for adding automatic differentiation to PopLibs programs
- src/fructose/ - Fructose, a self-contained mini-library for a friendly, noise-free interface to PAG
- src/poplar_extensions/ - custom device codelets, with a PopLibs-like interface, for efficient L1/L2 distance

See also doc/design.md for a more detailed description of the design of the application.
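
The idea of using Python configuration in place of a command-line interface, as the entry-point scripts do, can be sketched with plain dataclasses. This is an illustration of the pattern only: the class names `ModelSettings` and `ExperimentSettings` and all of their fields are hypothetical, and the repository's actual settings objects may look quite different.

```python
from dataclasses import asdict, dataclass, field, replace


@dataclass(frozen=True)
class ModelSettings:
    """Hypothetical model settings; not the repository's actual fields."""

    score_fn: str = "TransE"
    embedding_size: int = 256


@dataclass(frozen=True)
class ExperimentSettings:
    """Hypothetical top-level settings, edited directly in a script instead of parsed from argv."""

    seed: int = 0
    n_step: int = 1000
    model: ModelSettings = field(default_factory=ModelSettings)


# Start from the defaults, then derive a variant by overriding fields in code.
base = ExperimentSettings()
variant = replace(base, seed=1, model=replace(base.model, embedding_size=128))

print(asdict(base))
print(asdict(variant))
```

Keeping settings as typed, immutable objects means experiment variants are ordinary Python expressions, which is convenient for sweeps and for logging the full configuration of a run.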

Poplar remote buffers

This code relies on Poplar's access to streaming memory (see IPU memory architecture), which enables sparse access to a memory store much larger than on-chip memory. Streaming memory is accessed via the remote memory buffers API.

One implementation detail of interest is that we stack all remote embedding state (consisting of entity features, embeddings and optimiser state) into a single remote buffer, which helps to minimise memory overhead due to padding.
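
To see why stacking can reduce padding overhead, consider a simple model in which every row of a buffer is rounded up to a fixed alignment. The sketch below compares one buffer per field against a single stacked row. The alignment and the per-field byte sizes are made-up numbers chosen for illustration, and the real padding rules are determined by Poplar rather than by this model.

```python
def padded_size(n_bytes: int, alignment: int) -> int:
    """Round n_bytes up to the next multiple of alignment."""
    return -(-n_bytes // alignment) * alignment


# Hypothetical row alignment and per-entity state sizes, in bytes.
ALIGNMENT = 64
fields = {"features": 100, "embedding": 200, "optimiser_state": 200}

# Option A: one buffer per field, so each field's row is padded separately.
separate = sum(padded_size(size, ALIGNMENT) for size in fields.values())

# Option B: all fields stacked into one row of a single buffer, padded once.
stacked = padded_size(sum(fields.values()), ALIGNMENT)

# Byte offset of each field within the stacked row.
offsets = {}
cursor = 0
for name, size in fields.items():
    offsets[name] = cursor
    cursor += size

print(f"separate buffers: {separate} bytes per entity")
print(f"stacked buffer:   {stacked} bytes per entity")
print(f"field offsets:    {offsets}")
```

Under this model the stacked layout never needs more bytes than the separate layout, because rounding a sum up once cannot exceed the sum of the individually rounded parts.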

References & license

The included code is released under an MIT license (see LICENSE).

Our dependencies are:

Component	Type	About	License
pybind11	submodule	C++/Python interop library (github)	BSD 3-Clause
Catch2	submodule	C++ unit testing framework (github)	Boost
OGB	`requirements.txt`	Open Graph Benchmark dataset and task definition (paper, website)	MIT
PyTorch	`requirements.txt`	Machine learning framework (website)	BSD 3-Clause
WandB	`requirements.txt`	Weights and Biases client library (website), for optional logging to wandb servers	MIT

We also use ninja (website) with clang++ from LLVM (website) to build C++ code, and additional Python dependencies for development/testing (see requirements-dev.txt).

The OGB WikiKG90Mv2 dataset is licensed under CC-0.