Deep Learning on Flink aims to integrate Flink with deep learning frameworks (e.g. TensorFlow and PyTorch) to enable distributed deep learning training and inference on a Flink cluster.
It runs the deep learning tasks inside a Flink operator, so that Flink can establish the distributed environment, manage resources, read and write data through its rich set of connectors, and handle failures.
Currently, Deep Learning on Flink supports TensorFlow and PyTorch.
Deep Learning on Flink is tested and supported on the following 64-bit systems:
- Ubuntu 18.04
- macOS 10.15
It supports the following framework versions:
- TensorFlow: 1.15.x & 2.4.x
- PyTorch: 1.11.x
- Flink: 1.14.x
Deep Learning on Flink currently works with TensorFlow and PyTorch. See the framework-specific pages for usage and examples.
Requirements
- Python 3.7
- CMake >= 3.6
- Java 1.8
- Maven >= 3.3.0
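Before building, it can help to confirm that the tools listed above are installed. The following is a minimal sketch, not part of the project: the helper names `version_ge`, `first_version`, and `check_tool` are hypothetical, and it assumes the tools are invoked as `python3`, `cmake`, `java`, and `mvn`. It treats the CMake and Maven entries as minimum versions and the Python and Java entries as a required release series.

```shell
#!/usr/bin/env bash
# Sketch only: reports whether the build prerequisites are on PATH.
# Helper names are illustrative and not part of Deep Learning on Flink.

# version_ge A B: succeeds when dotted version A >= dotted version B.
version_ge() {
  local IFS=.
  local -a a=($1) b=($2)
  local i x y
  for ((i = 0; i < ${#b[@]}; i++)); do
    x=${a[i]:-0}; y=${b[i]:-0}
    if ((10#$x > 10#$y)); then return 0; fi
    if ((10#$x < 10#$y)); then return 1; fi
  done
  return 0
}

# first_version: print the first dotted number read from stdin.
first_version() {
  grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# check_tool NAME OP WANT COMMAND...
#   OP=min    : found version must be >= WANT
#   OP=series : found version must be WANT or WANT.x
check_tool() {
  local name=$1 op=$2 want=$3 found status="does not match"
  shift 3
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "$name: not found on PATH"
    return 0
  fi
  found=$("$@" 2>&1 | first_version)   # java prints its version to stderr
  case "$op" in
    min)    if version_ge "$found" "$want"; then status="ok"; fi ;;
    series) case "$found" in "$want"|"$want".*) status="ok" ;; esac ;;
  esac
  echo "$name: found '$found', want $op $want ($status)"
}

check_tool python series 3.7 python3 --version
check_tool cmake  min    3.6 cmake --version
check_tool java   series 1.8 java -version
check_tool maven  min    3.3.0 mvn -version
```

The script only prints a report and never aborts, so it is safe to run on a machine where some of the tools are missing.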
Deep Learning on Flink requires Java and Python to work together, so both the Java and the Python parts need to be built.
Please use the following command to initialize submodules before building from source.
git submodule update --init --recursive
Then build the project with Maven.
mvn -DskipTests clean install
When the build finishes, you can find the target distribution in the dl-on-flink-dist/target folder.
You can run the following commands to install the Python packages from source.
# Install dl-on-flink-framework first
pip install dl-on-flink-framework/python
# Note that you should install only one of the following, as they require
# different versions of TensorFlow
# For TensorFlow 1.15.x
pip install dl-on-flink-tensorflow/python
# For TensorFlow 2.4.x
pip install dl-on-flink-tensorflow-2.x/python
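Because only one of the two TensorFlow packages should be installed, the choice can be made explicit in a script. The sketch below is illustrative only: `tf_package_for` is a hypothetical helper, not something the project ships, and it simply maps a TensorFlow version string to the source directory named in the commands above.

```shell
#!/usr/bin/env bash
# Sketch only: tf_package_for is a hypothetical helper, not a project script.

# tf_package_for VERSION: print the source directory to pip-install
# for the given TensorFlow version, or fail for unsupported versions.
tf_package_for() {
  case "$1" in
    1.15|1.15.*) echo "dl-on-flink-tensorflow/python" ;;
    2.4|2.4.*)   echo "dl-on-flink-tensorflow-2.x/python" ;;
    *)           echo "unsupported TensorFlow version: $1" >&2; return 1 ;;
  esac
}

# Print (rather than run) the install command for TensorFlow 2.4.1.
echo "pip install $(tf_package_for 2.4.1)"
# prints: pip install dl-on-flink-tensorflow-2.x/python
```

Run from the repository root, the printed command can be executed after `pip install dl-on-flink-framework/python`.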
We provide a script to build wheels for the Python packages. You can run it with the following command.
bash tools/build_wheel.sh
When the script finishes, you can find the wheels in tools/dist. You can then install the Python packages from the wheels.
pip install tools/dist/<wheel>
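To see which file names can stand in for the `<wheel>` placeholder, you can list what the build script produced. This is a minimal sketch under the assumption that the wheels end in `.whl` and sit directly in tools/dist; `install_commands` is a hypothetical helper that only prints the commands and does not run them.

```shell
#!/usr/bin/env bash
# Sketch only: install_commands is a hypothetical helper, not a project script.

# install_commands [DIR]: print one "pip install" line per wheel in DIR
# (default: tools/dist). Prints nothing if DIR has no wheels.
install_commands() {
  local dir=${1:-tools/dist} wheel
  for wheel in "$dir"/*.whl; do
    [ -e "$wheel" ] || continue   # the glob matched nothing
    echo "pip install $wheel"
  done
}

install_commands tools/dist
```

Printing the commands first leaves the choice of which wheels to install, and in which order, to you.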