nimhdf5

This repository contains thin high-level bindings for the HDF5 data format for the Nim programming language. It also provides a wrapper of the full HDF5 C library, importable using:

import nimhdf5/hdf5_wrapper

The raw wrapper dynamically links the libhdf5.so (the main library) and libhdf5_hl.so (a library containing high-level convenience functions) libraries at runtime. All public functions of the two libraries are callable by their corresponding C names and C arguments. That means the Nim datatypes need to be manually cast to the corresponding compatible types, if only the wrapper is used.

The high-level bindings, while covering most general HDF5 features, are still in a rough state, due to limited testing by actual users. Most features should work fine (at least using linux), but some known bugs are still there. See examples/h5_high_level_example.nim as an overview (soon to be cleaned up, split and put into a tutorial form) on the available features and their usage. Also take a look at the tests for simple examples of specific features. For a more indepth example of usage of this library, take a look at:

https://github.com/Vindaar/TimepixAnalysis/blob/master/InGridDatabase/src/ingridDatabase as an example using HDF5 as a simple database
https://github.com/Vindaar/TimepixAnalysis/blob/master/Analysis/ingrid/raw_data_manipulation.nim for usage of more advanced features like variable length data, writing hyperslabs, creating hard links, etc. in a bigger project.

The wrapper was built making heavy use of c2nim and the main goal was to have a usable interface for the HDF5 data format. More advanced features (e.g. single writer / multiple reader) were of lower priority. Via the wrapper all features should in principle work, but many have not been tested.

Compatibility

The wrapper is currently tested using Nim version 0.18.0 and the current devel branch (0.18.1).

The wrapper is built from HDF5 version 1.10.1.

Linking against the HDF5 1.8 library is reasonably supported as well, but requires to use an additional compiler flag for now:

-d:H5_LEGACY

With the recent release of version 1.10.3 / 1.10.4, support for this version currently is available under the

-d:H5_FUTURE

flag. Soon this will become the default version! The H5Oget/visit_...2 procedures are wrapped under a name without a 2 suffix. These however add a fields argument. Therefore an overload is available, which maps the fields to H5O_INFO_ALL.

Troubleshooting

If you compile a nim program using nimhdf5 without any of those flags and try to run it on a system with a HDF5 shared library of version 1.10.3 or newer, you will be greeted by:

could not import: H5Oget_info

In that case, add the -d:H5_FUTURE flag to the compilation command (or probably add it to your nim.cfg or config.nims of the project).

On the other hand if you try to run such a compiled binary on a system with a HDF5 library of version 1.8, you will probably see:

could not import: H5P_LST_FILE_CREATE_g

add the -d:H5_LEGACY flag.

In case neither of these work, please open an issue!

Version checks in the wrapper

Currently no checks are done, which compare the library this wrapper is built upon with the library linking against using some of the provided HDF5 macros (e.g. H5check, H5get_libversion etc.). The main reason is explained below. However, as far as the high-level functionality is concerned at the moment, the only differences arise in a few constant definitions, whose names slightly changed from 1.8 to 1.10. This is what the compiler flags sets accordingly.

The HDF5 headers contain macros for many variables, such as

#define H5F_ACC_RDONLY	(H5CHECK H5OPEN 0x0000u)

where

#define H5CHECK          H5check(),

and

#define H5OPEN        H5open(),

i.e. it makes use of C’s comma operator. However, c2nim currently has no support for it. Instead of porting them in some reasonable way, these macros were converted to simple replacements with the values, dropping the calls to H5check() and H5open().

The call to H5check() is currently not used at all. Compiling a Nim program with this wrapper (based on version 1.10.1) would normally fail to check against the linked library, if that version is different.

As H5open() is important, the calls are replaced by a single call to initialize the library at the beginning upon the first call of the library via src/nimhdf5/wrapper/H5niminitialize.nim.

As HDF5 is a very macro heavy library, other important macros may not have been correctly wrapped to Nim, e.g. determination of correct sizes of data types. This may cause some weird side-effects (to be fair, I haven’t noticed any!).

Additionally, Windows support is unknown at this time. The library name is correctly set for Windows, however an additional header file H5FDwindows.h might have to be wraped.

Installation

Installation can either be done via nimble:

nimble install nimhdf5

or manually by cloning this git repository:

git clone https://github.com/vindaar/nimhdf5

in a folder of your choice and call nimble install afterwards:

cd nimhdf5
nimble install

Or simply make use of nimble’s Github interfacing capabilities:

nimble install https://github.com/vindaar/nimhdf5

Files

The folder c_headers contains the modified HDF5 headers in the state they were in for a successful c2nim conversion. In some cases the C header file had to be modified, in others modification to the resulting .nim file was still necessary.

The folder examples contains the basic HDF5 C examples (see here: https://support.hdfgroup.org/HDF5/examples/intro.html#c) converted to Nim utilizing the wrapper.

h5_high_level_example.nim serves as a replacement for a tutorial for now (tutorial will be added soon!), showcasing (almost) all available features and their usage.

Known bugs and quirks

The high level bindings come with several quirks which are good to know.

when reading back a dataset with dimension > 1, the returned data is returned in a flat seq, instead of e.g. a nested seq[seq[<type>]] as one might expect. To get the data in the correct shape, use the reshape or (reshape2D, reshape3D) procs from util.nim. See the example file or the following tests: tutil.nim, treshape.nim for the usage. The exception is variable length data in case of a 1D dataset containing seqs of varying sizes. Here a nested seq of the correct elements is returned.
when grabbing a group or dataset from a H5FileObj via [](name: string), a conversion of the string to a distinct string type grp_str or dset_str is used to provide a uniform interface for both from a file object.
1D datasets do not have shape (N, ) as one would see in Python, but are represented by (N, 1) instead.
and many more

Implemented HDF5 features

groups
- creating (nested) groups
- iterating over groups (recursively)
datasets
- writing / reading static sized N-D arrays of any type
- writing / reading variable length data
- chunked storage
data types:
- any basic nim type, that is:
  - SomeNumber (all ints and floats)
  - string (not for datasets atm)
- compound datatypes of objects / tuples, where the fields have to be of the above mentioned basic types.
hyperslabs
- writing / reading hyperslabs using H5 notation
compression / filters
- zlib compression
- szip compression
- blosc compression (external) User needs to compile / install:
  - https://github.com/Blosc/c-blosc https://github.com/Vindaar/nblosc
  Note: Windows / OSX not yet supported, due to wrong name of libblosc.so in blosc.nim#L6. Change it appropriately.
- sort of soon: fletcher32, shuffle, nbits
attributes
- writing / reading on datasets, groups
- all types supported
  - basic types (int, float, …)
  - seqs of basic types
  - strings
  - reading variable length strings (different from static length strings in H5 attributes!)
hardlink datasets and groups within a file
iterators over:
- groups
- datasets
- attributes
Single Writer Multiple Reader (SWMR). See for more info below.

Single Writer Multiple Reader

This wrapper fully supports the Single Writer Multiple Reader feature of the HDF5 library, but it is still in an experimental state, as I’ve never really needed it.

It allows to access a single HDF5 file from multiple threads or processes, where one of these is a writer process and all others are readers. When using this feature the user does not have to worry about locks etc. between the different processes.

Usage

Open an HDF5 file in write mode and hand the swmr flag:

# writer.nim
import nimhdf5
var h5f = H5open("/tmp/test.h5", "rw", swmr = true)
# do writing stuff

and in all reader threads / processes, simply do the same, but do not hand a write:

# reader.nim
import nimhdf5
var h5f = H5open("/tmp/test.h5", "r", swmr = true)

This should be all that is required.

I’m not sure if the writer process should make sure to flush the file regularly or not. Feel free to tell me if you know. :)

Alternative writer

An alternative to the above for the writer process is to first open the file in write mode without swmr = true and then later put it into swmr mode via:

import nimhdf5
var h5f = H5open("/tmp/test.h5", "rw")
# do some regular stuff
# and then activate SWMR later
h5f.activateSWMR()

Threadsafe HDF5 library & file locks

This wrapper can also be used with a HDF5 library that was compiled with the --enable-threadsafe compilation flag.

Once the library has been compiled with it, in principle the user can try to open a single file in write mode from multiple processes or threads. For safe handling in these contexts, it may be up to the user to lock access to the file / writing to individual datasets via some locking mechanism. In principle the threadsafe option of the library adds its own mutex logic, so in theory it should work without them.

See these notes about the threadsafe library: https://support.hdfgroup.org/HDF5/doc/TechNotes/ThreadSafeLibrary.html

One issue a user might encounter is that the second opening of a file yields an error saying that the resource is temporarily unavailable. In HDF5 version starting from 1.10, file locking was added as a feature.

This behavior is controlled via an environment variable:

export HDF5_USE_FILE_LOCKING=FALSE

If set to false, file locking is disabled. With it multiple processes may open the same file.

When doing this, keep in mind that each thread / process will receive their own FileID. Some HDF5 functions may either give the user information based on the specific file ID and others based on the actual file. In the cases where one can choose, it is supported via the okLocal (ObjectKind enum) or fkLocal (FlushKind enum).

Relevant part of the documentation: https://docs.hdfgroup.org/hdf5/develop/_h5public_8h.html#title31

Blosc support

To use blosc as a filter you need to import:

import nimhdf5/blosc

Before v0.3.12 this was done automatically if the nblosc library is installed.

Vindaar/nimhdf5