HDC is a tiny library for exchanging hierarchical data (arrays of structures) in shared memory between multiple programming languages, currently supporting C, C++, Python, Fortran and MATLAB.
The master repository lives in the IPP CAS GitLab instance.
The Bitbucket repository is just a read-only mirror.
Access to the master repository can be requested via email: fridrich at ipp.cas.cz
The API documentation is available here.
To build HDC, you will need:
- C++14-compliant compiler (tested with Intel Studio >= 2018 and GCC >= 5.0)
- gfortran >= 4.9
- Boost >= 1.48
- CMake >= 3.3
- Doxygen for documentation building
- Cython > 0.23 (version 0.23 has a parsing error)
Optionally, it can use:
- Python > 3.4 (Python HDC bindings "pyhdc" and hdc-binder support)
- MATLAB > 2018a (MATLAB mex interface)
- Yahoo MDBM (recommended storage plugin working within shared memory)
- Redis + libhiredis-dev (storage plugin working within distributed memory)
- HDF5 devel libraries ((de)serialization, tested with 1.8 and 1.10)
- libs3 ((de)serialization plugin)
- flatbuffers ((de)serialization plugin)
Currently all commits are automatically tested against:
- Ubuntu 16.04 (xenial)
- Ubuntu 18.04 (bionic)
- Ubuntu 20.04 (focal)
- Fedora 31
- Centos 7
However, HDC should work on any not-too-obsolete distribution. If you face any problems, please report them via email or the project issue tracker.
Machine-specific build instructions are available here.
There are several CMake options. The most important are:
- -DCMAKE_INSTALL_PREFIX=/where/to/install
  The make install destination.
- -DBUILD_DOC=ON
  Whether to build and install documentation.
- -DBUILD_EXAMPLES=ON
  Whether to build and install examples.
- -DENABLE_HDF5=OFF
  Switch off HDF5 serialization.
- Python, if not detected correctly:
  -DPYTHON_LIBRARY=/path/to/libpython.so
  -DPYTHON_INCLUDE_DIR=/path/to/python/include
- -DDEBUG=ON
  Whether to print debugging messages.
Some of them can be edited using ccmake . in the build directory.
An example build follows:
- clone the git repository
git clone git@bitbucket.org:compass-tokamak/hdc.git
# cd into hdc
cd hdc
- build in a separate build directory
export HDC_PREFIX=$PWD/install
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HDC_PREFIX
make -j install
Important: numpy must be installed before pyhdc. To build and install the Python bindings, run:
cd python
python setup.py build
python setup.py install
cd ..
Optionally, run Python tests:
cd python
python setup.py test
cd tests_binder
./run
cd ..
Especially for use with multiple compiler/library versions, HDC supports two ways of building its binding interfaces. Currently, this holds for Fortran, MATLAB and Java. Unless you need the same HDC with, e.g., multiple MATLAB versions, you should use the embedded build, which is selected by adding -DENABLE_<LANG>=TRUE
to the cmake arguments, e.g.:
cmake .. -DCMAKE_INSTALL_PREFIX=$HDC_PREFIX -DENABLE_MATLAB=TRUE -DENABLE_FORTRAN=TRUE -DENABLE_JAVA=TRUE
This way, cmake adds the specific subdirectories and tries to build everything at once.
On the contrary, setting -DENABLE_<LANG>=FALSE prevents the <LANG> binding from being built. It can then be built standalone after you install HDC and set up PKG_CONFIG_PATH:
export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$HDC_PREFIX/lib/pkgconfig
The complete how-to is given in the next sections.
Just append -DENABLE_FORTRAN=TRUE to your cmake command:
cmake .. -DCMAKE_INSTALL_PREFIX=$HDC_PREFIX -DENABLE_FORTRAN=TRUE
and you are done; make will do the rest.
Disable Fortran by adding -DENABLE_FORTRAN=FALSE to your cmake command, e.g.:
cmake .. -DCMAKE_INSTALL_PREFIX=$HDC_PREFIX -DENABLE_FORTRAN=FALSE
Now you can cd into the binding directory and run:
cd fortran
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HDC_PREFIX
make -j
make install
Done. You should have it installed.
Make the matlab and mex binaries findable (e.g. by modifying the PATH environment variable or by loading a module); then you just need to provide -DENABLE_MATLAB=TRUE on the cmake command line, i.e.:
cmake .. -DCMAKE_INSTALL_PREFIX=$HDC_PREFIX -DENABLE_MATLAB=TRUE
export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$HDC_PREFIX/lib/pkgconfig
cd matlab
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HDC_PREFIX # configure first (assumed, analogous to the standalone Fortran build)
make -j
LD_LIBRARY_PATH=../../install/lib matlab -nojvm -r "run('test_matlab')"
You should see an "All tests are OK..." message; in that case, the MEX interface should work fine.
Installation is up to you, e.g.:
cp *.m *.mexa64 /desired/matlab/stuff/dir
The jHDC build requires maven and openjdk-8-jdk-headless or another JDK (not tested). Please ensure you have these installed.
There are several CMake options for jHDC:
- -DENABLE_JAVA
  Enable Java support (OFF by default).
- -DJAR_INSTALL_PREFIX
  Where to install the resulting jar(s) (defaults to ${CMAKE_INSTALL_PREFIX}/share/java).
- -DINSTALL_JAVA_DEPENDENCIES
  Whether to also install the dependencies (OFF by default). If enabled, all jars are put into $JAR_INSTALL_PREFIX.
- -DJAR_WITH_DEPENDENCIES
  Build a jar with all dependencies bundled inside. The result can be quite large, but no external dependencies are needed (OFF by default).
- -DJAVACPP_PLATFORM
  Build the bundled jar for the specified architecture only (linux-x86_64 by default). This reduces the jar size from ~460 MB to ~80 MB. Anybody who loves huge jars can set an empty string here. A detailed description is available here.
Clearly, not all combinations of these options make sense, but making some of them depend on each other would not make sense either.
Usually you would want either:
-DENABLE_JAVA=ON -DINSTALL_JAVA_DEPENDENCIES=ON -DJAR_WITH_DEPENDENCIES=OFF
or:
-DENABLE_JAVA=ON -DJAR_WITH_DEPENDENCIES=ON -DJAVACPP_PLATFORM=linux-x86_64
depending on your preferences.
How to run the jHDC example is described here.
The CMake options remain the same as for the embedded build:
cd java
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HDC_PREFIX # same options as for the embedded build
make -j
make install
Instructions are similar to the above; just set up your environment and the CC, CXX and FORTRAN variables:
source /sw/intel/parallel_studio_xe_2018/psxevars.sh
CC=icc FORTRAN=ifort CXX=icpc cmake .. -DCMAKE_INSTALL_PREFIX=$PWD/../install
Everyone is welcome to contribute. Please use Git and merge requests.
bumpversion is used to maintain the version consistently across the repository and its files.
To create a new version, use:
bumpversion PART
where PART is either patch, minor or major.
The Git repository needs to be synchronized afterwards:
git push
git push --tags
- The hierarchical structure is organized as a tree.
- Subtrees (nodes) can be accessed by a path, similarly to file system paths like "aaa/bbb".
- Each node can be one of several types (see the sketch after this list):
  - Empty node - the initial state of a node. An empty node does not store any data and does not have any children. By adding a subnode, a slice or data, its type is automatically changed to another type.
  - Structure/list node - the node has at least one child indexed by a string path. It can only store subtrees indexed by a path/integer index.
  - Array node - the node has at least one child indexed by an integer. It can only store subtrees indexed by an integer.
  - Data node - it can only be a terminal node; it stores some data, currently a char* buffer.
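For illustration, here is a minimal C++ sketch of how a node's type evolves. Apart from get(), put() and set() mentioned below, the method names (set_data(), add_child(), append()) and the header name are assumptions and may differ from the actual API; consult the generated API documentation:
#include <vector>
#include "hdc.hpp"                     // assumed header name
int main() {
    HDC tree;                          // empty node: no data, no children
    std::vector<double> v = {1.0, 2.0, 3.0};
    HDC data;
    data.set_data(v);                  // assumed setter: "data" becomes a data node
    tree.add_child("aaa/bbb", data);   // assumed: "tree" becomes a structure node
    HDC list;
    list.append(HDC());                // assumed: "list" becomes an array node
    return 0;
}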
If working directly with the HDC tree (i.e. methods like get(), put(), set()), the path string is internally converted to the
hdc_path_t = std::list<hdc_index_t>
type, where the
hdc_index_t = boost::variant<size_t, std::string>
type represents a single level of the path. Therefore, every tree node can be referenced by:
- an empty string representing identity, e.g. "//" is omitted unless it is part of a protocol specification like "json://"
- a key (string) referencing a child of a hash map/dict
- an index (non-negative integer) referencing a value in a list/array
Individual keys are separated by slashes; indexes are surrounded by brackets.
For example, the following string
"aaa/bbb//ccc[5]/ddd"
represents the following node within the HDC tree:
"aaa" -> "bbb" -> "ccc" -> 5 -> "ddd"
For loading from or saving to locations outside the HDC tree (methods load() and save()), one also has to specify a protocol and a file path. In such a case, there are two equivalent options:
- the protocol + file path and the path within the file are concatenated using the pipe character |:
HDC n = HDC::load("protocol://path/to/file|path/within/the/file")
- two arguments are provided:
HDC n = HDC::load("protocol://path/to/file", "path/within/the/file")
Internally, the first option calls the second one, so using the second form spares some method calls.
The supported protocols are listed below; a short usage sketch follows the list:
- json
- json_string
- json_verbose
- uda
- uda_new
- hdf5
- hdc_file
- hdc_string
- flatbuffers
- s3
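For example, a round trip between two of these protocols might look as follows. This is a hypothetical sketch: the file names are made up, and it assumes that save() accepts the same protocol://file|path convention as load():
HDC n = HDC::load("json://input.json|some/sub/tree");  // load a subtree from a JSON file
n.save("hdf5://output.h5");                            // store it into an HDF5 file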
The basic examples can be found in the examples folder. The executables are built in the CMAKE_BUILD_DIR/bin directory.
The Python examples can be run from any folder; the only necessary thing is to set LD_LIBRARY_PATH properly:
LD_LIBRARY_PATH="../build/lib/" ipython hdc_fortran_module.py
See this file
- Read-only and/or copy-on-write protection of data.
- File systems and database access (key-value, object stores) via the HDC API.
- Support for metadata.
- A plugin system for, e.g., data systems validation or conversion, object-oriented features (methods for particular data types), ...
- Support for scientific data: dimensions, units, etc.
- More features: slicing, lazy evaluation, richer API, ...
- HDC holds data buffers in (shared) memory, hence passing HDC containers means
- no data copy,
- no serialization / deserialization,
- better performance.
- HDC is written in C++ with bindings to Fortran, C, Python, MATLAB and other languages in mind.
- HDC API can abstract out various back-end storage solutions: file systems, key-value stores, clouds, ...
Citing Conduit: it "provides an intuitive model for describing hierarchical scientific data in C++, C, Fortran, and Python and is used for data coupling between packages in-core, serialization, and I/O tasks."
- HDC supports (in private/shared memory) zero-copy data access.
- Its goals are very close to ours; the approach is slightly different.