NVIDIA DALI

Deep learning applications require complex, multi-stage pre-processing data pipelines. Such data pipelines involve compute-intensive operations that are carried out on the CPU. For example, tasks such as loading data from disk, decoding, cropping, random resizing, color and spatial augmentations, and format conversions are mainly carried out on CPUs, limiting the performance and scalability of training and inference.

In addition, deep learning frameworks have multiple data pre-processing implementations, resulting in challenges such as portability of training and inference workflows and code maintainability.

NVIDIA Data Loading Library (DALI) is a collection of highly optimized building blocks, and an execution engine, to accelerate the pre-processing of the input data for deep learning applications. DALI provides both the performance and the flexibility for accelerating different data pipelines as a single library. This single library can then be easily integrated into different deep learning training and inference applications.

Highlights

Highlights of DALI are:

  • Full data pipeline--accelerated from reading the disk to getting ready for training and inference.
  • Flexibility through configurable graphs and custom operators.
  • Support for image classification and segmentation workloads.
  • Ease of integration through direct framework plugins and open source bindings.
  • Portable training workflows with multiple input formats--JPEG, PNG (fallback to CPU), TIFF (fallback to CPU), BMP (fallback to CPU), raw formats, LMDB, RecordIO, TFRecord.
  • Extensible for user-specific needs through open source license.

DALI and NGC

DALI is preinstalled in the NVIDIA GPU Cloud TensorFlow, PyTorch, and MXNet containers in versions 18.07 and later.


Installing prebuilt DALI packages

Prerequisites

  1. Linux x64.
  2. NVIDIA Driver supporting CUDA 9.0 or later (i.e., 384.xx or later driver releases).
  3. One or more of the following deep learning frameworks:

Installation

For the CUDA 9.0 based build, execute the command below:

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali

Starting with DALI 0.8.0, for the CUDA 10.0 based build use:

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali
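
The two commands above differ only in the CUDA version segment of the index URL. A minimal sketch of that mapping follows; the function name dali_index_url is hypothetical and not part of DALI, and the command is printed rather than executed so it can be reviewed first.

```shell
# Hypothetical helper, not part of DALI: maps a CUDA version to the
# extra index URL used by the install commands above.
dali_index_url() {
    case "$1" in
        9.0|10.0)
            echo "https://developer.download.nvidia.com/compute/redist/cuda/$1"
            ;;
        *)
            echo "unsupported CUDA version: $1" >&2
            return 1
            ;;
    esac
}

# Print the install command for review instead of running it directly.
echo "pip install --extra-index-url $(dali_index_url 10.0) nvidia-dali"
```

Only the two CUDA versions named in this section are accepted; any other value is rejected instead of producing a guessed URL.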

Note

Since 0.11.0, the nvidia-dali package no longer contains prebuilt versions of the DALI TensorFlow plugin. The plugin needs to be installed explicitly for the currently installed version of TensorFlow:

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali-tf-plugin

Starting with DALI 0.8.0, for the CUDA 10.0 based build execute:

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali-tf-plugin

Note

Due to a known issue with installing dependent packages, DALI needs to be installed before nvidia-dali-tf-plugin (in a separate pip install call). The package tensorflow-gpu must also be installed before attempting to install nvidia-dali-tf-plugin.

Note

The package nvidia-dali-tf-plugin strictly requires nvidia-dali at exactly the same version. Thus, installing the latest nvidia-dali-tf-plugin will replace any older nvidia-dali version already installed with the latest one. To work with older versions of DALI, provide the version explicitly to the pip install command.

OLDER_VERSION=0.6.1
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-tf-plugin==$OLDER_VERSION
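
The notes above combine into two constraints: DALI is installed first in its own pip call, and the plugin is pinned to the same version. A sketch under those assumptions follows; dali_tf_install_cmds is a hypothetical name, and the commands are only printed.

```shell
# Hypothetical helper, not part of DALI: prints the two separate pip calls,
# DALI first, then the TensorFlow plugin pinned to the same version.
dali_tf_install_cmds() {
    version="$1"
    index_url="$2"
    echo "pip install --extra-index-url ${index_url} nvidia-dali==${version}"
    echo "pip install --extra-index-url ${index_url} nvidia-dali-tf-plugin==${version}"
}

OLDER_VERSION=0.6.1
dali_tf_install_cmds "$OLDER_VERSION" https://developer.download.nvidia.com/compute/redist
```

Pinning both packages to one variable keeps the plugin from pulling in a newer nvidia-dali than intended.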

Nightly and weekly release channels

Note

Binaries from the nightly and weekly builds include the most recent changes available on GitHub, but some functionality may not work or may perform worse than in the official releases. These builds are meant for early adopters seeking the most recent version available and ready to boldly go where no man has gone before.

Note

It is recommended to uninstall the regular DALI and TensorFlow plugin packages before installing nvidia-dali-nightly or nvidia-dali-weekly, as they are installed in the same path.

Nightly builds

To access the most recent nightly builds, use the following release channel:

  • for CUDA9
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/nightly/cuda/9.0 nvidia-dali-nightly
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/nightly/cuda/9.0 nvidia-dali-tf-plugin-nightly
  • for CUDA10
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/nightly/cuda/10.0 nvidia-dali-nightly
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/nightly/cuda/10.0 nvidia-dali-tf-plugin-nightly

Weekly builds

Also, there is a weekly release channel with more thorough testing (only CUDA10 builds are provided there):

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/weekly/cuda/10.0 nvidia-dali-weekly
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/weekly/cuda/10.0 nvidia-dali-tf-plugin-weekly
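
The channel commands above follow one pattern: the channel name appears in both the index URL and the package suffix, and the regular packages should be removed first. A sketch of that pattern follows; dali_channel_cmds is a hypothetical name, and the commands are printed rather than run.

```shell
# Hypothetical helper, not part of DALI: prints the uninstall and install
# commands for a release channel ("nightly" or "weekly") and a CUDA version.
dali_channel_cmds() {
    channel="$1"
    cuda="$2"
    # Only the combinations listed in this section are accepted.
    case "${channel}:${cuda}" in
        nightly:9.0|nightly:10.0|weekly:10.0) ;;
        *)
            echo "no ${channel} build for CUDA ${cuda}" >&2
            return 1
            ;;
    esac
    url="https://developer.download.nvidia.com/compute/redist/${channel}/cuda/${cuda}"
    # Remove the regular packages first; they install into the same path.
    echo "pip uninstall -y nvidia-dali nvidia-dali-tf-plugin"
    echo "pip install --extra-index-url ${url} nvidia-dali-${channel}"
    echo "pip install --extra-index-url ${url} nvidia-dali-tf-plugin-${channel}"
}

dali_channel_cmds weekly 10.0
```

The case statement encodes the restriction that weekly builds are provided for CUDA 10 only.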

Compiling DALI from source (using Docker builder) - recommended

By following these steps, it is possible to recreate Python wheels similar to the ones we provide as official prebuilt binaries.

Prerequisites

  • Linux x64
  • Docker - version 17.05 or later is required. Follow the installation guide and manual at the link.

Building Python wheel and (optionally) Docker image

Change directory (cd) into the Docker directory and run ./build.sh. If needed, set the following environment variables:

  • PYVER - Python version. Default is 2.7.
  • CUDA_VERSION - CUDA toolkit version (9 for 9.0 or 10 for 10.0). Default is 10.
  • NVIDIA_BUILD_ID - Custom ID of the build. Default is 1234.
  • CREATE_WHL - Create a standalone wheel. Default is YES.
  • CREATE_RUNNER - Create Docker image with cuDNN, CUDA and DALI installed inside. It will create the Docker_run_cuda image, which needs to be run using nvidia-docker and DALI wheel in the wheelhouse directory under$
  • DALI_BUILD_FLAVOR - adds a suffix to the DALI package name and puts a note about it in the whl package description, e.g. nightly will result in nvidia-dali-nightly.
  • CMAKE_BUILD_TYPE - build type, available options: Debug, DevDebug, Release, RelWithDebInfo. Default is Release.
  • BUILD_INHOST - ask Docker to mount the source code instead of copying it. Thanks to that, consecutive builds reuse existing object files and are faster during development. Uses $DALI_BUILD_DIR as the directory for build objects. Default is YES.
  • REBUILD_BUILDERS - whether the builder Docker images need to be rebuilt or can be reused from the previous build. Default is NO.
  • REBUILD_MANYLINUX - whether the manylinux base image needs to be rebuilt. Default is NO.
  • DALI_BUILD_DIR - where the DALI build should happen. It matters only for the in-tree build, where the user may provide a different path for every Python/CUDA version. Default is build-docker-${CMAKE_BUILD_TYPE}-${PYV}-${CUDA_VERSION}.

Note that build.sh should accept the same set of environment variables as the project CMake.

The recommended command line is:

PYVER=X.Y CUDA_VERSION=Z ./build.sh

For example:

PYVER=3.6 CUDA_VERSION=10 ./build.sh

This will build CUDA 10 based DALI for Python 3.6 and place the relevant Python wheel inside DALI_root/wheelhouse.
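
The documented defaults can be made explicit before calling build.sh. The sketch below covers three of the variables from the list above and only prints the resulting invocation; dali_build_cmd is a hypothetical wrapper and not part of the DALI repository.

```shell
# Hypothetical wrapper, not part of DALI: fills in the documented defaults
# for a few of the variables above and prints the build.sh invocation.
dali_build_cmd() {
    echo "PYVER=${PYVER:-2.7} CUDA_VERSION=${CUDA_VERSION:-10} CMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE:-Release} ./build.sh"
}

# Uses the documented default for any variable that is not set.
dali_build_cmd

# Override the Python and CUDA versions inside a subshell only.
( PYVER=3.6; CUDA_VERSION=9; dali_build_cmd )
```

Running the overrides in a subshell keeps them from leaking into later builds started from the same terminal.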


Compiling DALI from source (bare metal)

Prerequisites

Required components, with notes where applicable:

  • Linux x64
  • GCC 4.9.2 or later
  • Boost 1.66 or later - Modules: preprocessor.
  • NVIDIA CUDA 9.0 - CUDA 8.0 compatibility is provided unofficially.
  • nvJPEG library - This can be unofficially disabled. See below.
  • protobuf - Version 2 or later (version 3 or later is required for TensorFlow TFRecord file format support).
  • CMake 3.11 or later
  • libjpeg-turbo 1.5.x or later - This can be unofficially disabled. See below.
  • FFmpeg 3.4.2 or later - We recommend using version 3.4.2 compiled following the instructions below.
  • OpenCV 3 or later - Supported version: 3.4
  • (Optional) liblmdb 0.9.x or later
  • One or more of the following Deep Learning frameworks:

Note

TensorFlow installation is required to build the TensorFlow plugin for DALI.

Note

Items marked "unofficial" are community contributions that are believed to work but not officially tested or maintained by NVIDIA.

Note

This software uses code of FFmpeg licensed under the LGPLv2.1. Its source can be downloaded from here.

FFmpeg was compiled using the following command line:

./configure \
 --prefix=/usr/local \
 --disable-static \
 --disable-all \
 --disable-autodetect \
 --disable-iconv \
 --enable-shared \
 --enable-avformat \
 --enable-avcodec \
 --enable-avfilter \
 --enable-protocol=file \
 --enable-demuxer=mov,matroska \
 --enable-bsf=h264_mp4toannexb,hevc_mp4toannexb && \
 make

Get the DALI source

git clone --recursive https://github.com/NVIDIA/dali
cd dali

Make the build directory

mkdir build
cd build

Compile DALI

Building DALI without LMDB support:

cmake ..
make -j"$(nproc)"

Building DALI with LMDB support:

cmake -DBUILD_LMDB=ON ..
make -j"$(nproc)"

Building DALI using Clang (experimental):

Note

This build is experimental. It is neither maintained nor tested. It is not guaranteed to work. We recommend using GCC for production builds.

cmake -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang  ..
make -j"$(nproc)"

Optional CMake build parameters:

  • BUILD_PYTHON - build Python bindings (default: ON)
  • BUILD_TEST - include building test suite (default: ON)
  • BUILD_BENCHMARK - include building benchmarks (default: ON)
  • BUILD_LMDB - build with support for LMDB (default: OFF)
  • BUILD_NVTX - build with NVTX profiling enabled (default: OFF)
  • BUILD_TENSORFLOW - build TensorFlow plugin (default: OFF)
  • BUILD_NVJPEG - build with nvJPEG support (default: ON)
  • BUILD_NVOF - build with NVIDIA OPTICAL FLOW SDK support (default: ON)
  • BUILD_NVDEC - build with NVIDIA NVDEC support (default: ON)
  • BUILD_NVML - build with NVIDIA Management Library (NVML) support (default: ON)
  • WERROR - treat all build warnings as errors (default: OFF)
  • DALI_BUILD_FLAVOR - allows specifying a custom name suffix (e.g. 'nightly') for the nvidia-dali whl package
  • (Unofficial) BUILD_JPEG_TURBO - build with libjpeg-turbo (default: ON)

Note

DALI release packages are built with the options listed above set to ON and NVTX turned OFF. Testing is done with the same configuration. We ensure that DALI compiles with all of those options turned OFF, but there may exist cross-dependencies between some of those features.

The following CMake parameters can be helpful in setting the right paths:

  • FFMPEG_ROOT_DIR - path to installed FFmpeg
  • NVJPEG_ROOT_DIR - where nvJPEG can be found (from CUDA 10.0 it is shipped with the CUDA toolkit so this option is not needed there)
  • libjpeg-turbo options can be obtained from libjpeg CMake docs page
  • protobuf options can be obtained from protobuf CMake docs page
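
Build options and path parameters are all passed to CMake as -DNAME=VALUE arguments. A minimal sketch of composing such a command line follows; dali_cmake_cmd is a hypothetical helper, the chosen option values are illustrative, and /usr/local is assumed only because it matches the FFmpeg --prefix shown earlier.

```shell
# Hypothetical helper, not part of DALI: turns NAME=VALUE pairs into a
# cmake command line for a build run from the build directory.
# Values containing spaces are not quoted by this sketch.
dali_cmake_cmd() {
    cmd="cmake"
    for opt in "$@"; do
        cmd="${cmd} -D${opt}"
    done
    echo "${cmd} .."
}

# LMDB enabled, tests and benchmarks skipped, FFmpeg taken from /usr/local.
dali_cmake_cmd BUILD_LMDB=ON BUILD_TEST=OFF BUILD_BENCHMARK=OFF FFMPEG_ROOT_DIR=/usr/local
```

Printing the command first makes it easy to check the option set before a long build, given that some features may depend on each other.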

Install Python bindings

pip install dali/python

Cross-compiling DALI C++ API for aarch64 Linux (Docker)

Note

Support for the aarch64 Linux platform is experimental. Some features are available only for the x86-64 target and are turned off in this build. There is no support for the DALI Python library on aarch64 yet. Some operators may not work as intended due to x86-64 specific implementations.

Build the aarch64 Linux Build Container

docker build -t dali_builder:aarch64-linux -f Dockerfile.build.aarch64-linux .

Compile

From the root of the DALI source tree

docker run -v $(pwd):/dali dali_builder:aarch64-linux

The relevant artifacts will be in build/install and build/dali/python/nvidia/dali
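
After the container finishes, a quick check can confirm that both artifact locations exist. The sketch below assumes both paths are directories relative to the DALI source root; dali_aarch64_artifacts_present is a hypothetical name.

```shell
# Hypothetical check, not part of DALI: succeeds only if both artifact
# directories exist under the given DALI source root.
dali_aarch64_artifacts_present() {
    root="$1"
    for dir in build/install build/dali/python/nvidia/dali; do
        if [ ! -d "${root}/${dir}" ]; then
            echo "missing: ${root}/${dir}" >&2
            return 1
        fi
    done
}

# Run from the root of the DALI source tree.
dali_aarch64_artifacts_present "$(pwd)" || echo "artifacts not found; has the build finished?"
```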

Getting started

The docs/examples directory contains a few examples (in the form of Jupyter notebooks) highlighting different features of DALI and how to use DALI to interface with deep learning frameworks.

Also note:

  • Documentation for the latest stable release is available here, and
  • Nightly version of the documentation that stays in sync with the master branch is available here.

Additional resources

  • GPU Technology Conference 2018; Fast data pipeline for deep learning training, T. Gale, S. Layton and P. Trędak: slides, recording.
  • GPU Technology Conference 2019; Fast AI data pre-processing with DALI; Janusz Lisiecki, Michał Zientkiewicz: slides, recording.
  • GPU Technology Conference 2019; Integration of DALI with TensorRT on Xavier; Josh Park and Anurag Dixit: slides, recording.
  • Developer page.
  • Blog post.

Contributing to DALI

We welcome contributions to DALI. To contribute to DALI and make pull requests, follow the guidelines outlined in the Contributing document. If you are looking for a good first task, check the issues tagged with the external contribution welcome label.

Reporting problems, asking questions

We appreciate feedback, questions, and bug reports. When you need help with the code, follow the process outlined in the Stack Overflow document on minimal, complete, and verifiable examples (https://stackoverflow.com/help/mcve). Ensure that the posted examples are:

  • minimal: Use as little code as possible that still produces the same problem.
  • complete: Provide all parts needed to reproduce the problem. Check if you can strip external dependency and still show the problem. The less time we spend on reproducing the problems, the more time we can dedicate to the fixes.
  • verifiable: Test the code you are about to provide, to make sure that it reproduces the problem. Remove all other problems that are not related to your request.

Contributors

DALI was built with major contributions from Trevor Gale, Przemek Tredak, Simon Layton, Andrei Ivanov, Serge Panev.