/vol-daos

HDF5 VOL connector for DAOS

Primary LanguageCBSD 3-Clause "New" or "Revised" LicenseBSD-3-Clause

HDF5 DAOS VOL connector

Latest version

Table of Contents

  1. Description
  2. Installation
  3. Testing and Usage
  4. More information

1. Description

The HDF5 DAOS VOL connector is a Virtual Object Layer (VOL) connector for HDF5 that allows for direct interfacing with the Distributed Asynchronous Object Storage (DAOS) system, bypassing both MPI I/O and POSIX for efficient and scalable I/O, removing the limitations of the native HDF5 file format and enabling new features such as independent creation of objects in parallel, key-value store objects, data recovery, asynchronous I/O, etc.

Applications already using HDF5 can, using this VOL connector and a DAOS-enabled system, benefit of some of these features with minimal code changes. The connector is built as a plugin library that is external to HDF5, meaning that it must be dynamically-loaded in the same fashion as HDF5 filter plugins.

2. Installation

Below is set a of instructions that is compiled to provide a minimal installation of the DAOS VOL connector on a DAOS-enabled system.

Prerequisites

To build the DAOS VOL connector, the following libraries are required:

  • libhdf5 - The HDF5 library. Minimum version required is 1.14.0, compiled with support for both parallel I/O and map objects (i.e., -DHDF5_ENABLE_MAP_API=ON CMake option).

  • libdaos - The DAOS library. Minimum version required is 1.3.106-tb.

  • libuuid - UUID support.

Compiled libraries must either exist in the system's library paths or must be pointed to during the DAOS VOL connector build process.

Build instructions

The HDF5 DAOS VOL connector is built using CMake. CMake version 2.8.12.2 or greater is required for building the connector itself, but version 3.1 or greater is required to build the connector's tests.

If you install the full sources, put the tarball in a directory where you have permissions (e.g., your home directory) and unpack it:

gzip -cd hdf5_vol_daos-X.tar.gz | tar xvf -

or

bzip2 -dc hdf5_vol_daos-X.tar.bz2 | tar xvf -

Replace 'X' with the version number of the package.

After obtaining the connector's source code, you can create a build directory within the source tree and run the ccmake or cmake command from it:

cd hdf5_vol_daos-X
mkdir build
cd build
ccmake ..

If using ccmake, type 'c' multiple times and choose suitable options or if using cmake, pass these options with -D. Some of these options may be needed if, for example, the required components mentioned previously are not located in default paths.

Setting include directory and library paths may require you to toggle to the advanced mode by typing 't'. Once you are done and do not see any errors, type 'g' to generate makefiles. Once you exit the CMake configuration screen and are ready to build the targets, do:

make

Verbose build output can be generated by appending VERBOSE=1 to the make command.

Assuming that the CMAKE_INSTALL_PREFIX has been set and that you have write permissions to the destination directory, you can install the connector by simply doing:

 make install

CMake options

  • CMAKE_INSTALL_PREFIX - This option controls the install directory that the resulting output files are written to. The default value is /usr/local.
  • CMAKE_BUILD_TYPE - This option controls the type of build used for the VOL connector. Valid values are Release, Debug, RelWithDebInfo, MinSizeRel, Ubsan, Asan; the default build type is RelWithDebInfo.

Connector options

  • BUILD_TESTING - This option is used to enable/disable building of the DAOS VOL connector's tests. The default value is ON.
  • BUILD_EXAMPLES - This option is used to enable/disable building of the DAOS VOL connector's HDF5 examples. The default value is OFF.
  • HDF5_C_COMPILER_EXECUTABLE - This option controls the HDF5 compiler wrapper script used by the VOL connector build process. It should be set to the full path to the HDF5 compiler wrapper (usually bin/h5cc), including the name of the wrapper script. The following two options may also need to be set.
  • HDF5_C_LIBRARY_hdf5 - This option controls the HDF5 library used by the VOL connector build process. It should be set to the full path to the HDF5 library, including the library's name (e.g., /path/libhdf5.so). Used in conjunction with the HDF5_C_INCLUDE_DIR option.
  • HDF5_C_INCLUDE_DIR - This option controls the HDF5 include directory used by the VOL connector build process. Used in conjunction with the HDF5_C_LIBRARY_hdf5 variable.
  • DAOS_LIBRARY - This option controls the DAOS library used by the VOL connector build process. It should be set to the full path to the DAOS library, including the library's name (e.g., /path/libdaos.so). Used in conjunction with the DAOS_UNS_LIBRARY and DAOS_INCLUDE_DIR options.
  • DAOS_UNS_LIBRARY - This option controls the DAOS unified namespace library used by the VOL connector build process. It should be set to the full path to the DAOS libduns library, including the library's name (e.g., /path/libduns.so). Used in conjunction with the DAOS_LIBRARY and DAOS_INCLUDE_DIR options.
  • DAOS_INCLUDE_DIR - This option controls the DAOS include directory used by the VOL connector build process. Used in conjunction with the DAOS_LIBRARY and DAOS_UNS_LIBRARY options.
  • MPI_C_COMPILER - This option controls the MPI C compiler used by the VOL connector build process. It should be set to the full path to the MPI C compiler, including the name of the executable.

3. Testing and Usage

In the connector, each chunk is stored in a different DAOS dkey, and data in a single dkey is stored in a single DAOS storage target. Therefore, splitting the data into different chunks stripes the data across different dkeys and different storage targets. This improves I/O performance by allowing DAOS to read from or write to multiple storage targets at once.

The bandwidth improvement from using different storage targets is so vital that, if h5pset_chunk() is not used, i.e., contiguous datasets, the connector will automatically set a chunk size. The connector, by default, tries to size these chunks to approximately 1 MiB. The environment variable HDF5_DAOS_CHUNK_TARGET_SIZE (in bytes) sets the chunk target size. Setting this variable to 0 disables automatic chunking, and contiguous datasets will stay contiguous (and will therefore only be stored on a single storage target). Better performance may be obtained by choosing a larger chunk target size, such as 4-8 MiB.

For further information on how to use the DAOS VOL connector with an HDF5 application, as well as how to test that the VOL connector is functioning properly, please refer to the DAOS VOL User's Guide under docs/users_guide.pdf.

4. More Information

DAOS VOL

Design documentation for the DAOS VOL can be found under docs/design_doc.pdf.

Journal paper:

  • J. Soumagne, J. Henderson, M. Chaarawi, N. Fortner, S. Breitenfeld, S. Lu, D. Robinson, E. Pourmal, J. Lombardi, "Accelerating HDF5 I/O for Exascale Using DAOS," in IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 4, pp. 903-914, April 2022. | paper |

DAOS

DAOS installation and usage instructions can be found on the DAOS website: https://docs.daos.io/