pyDive

Distributed Interactive Visualization and Exploration of large datasets.

What is pyDive?

Use pyDive to work with homogeneous, n-dimensional arrays that are too big to fit into your local machine's memory. pyDive provides containers whose elements are distributed across a cluster or, if the cluster is still too small, stored in a large HDF5 or ADIOS file. All computation and data access is then done in parallel by the cluster nodes in the background. If working with pyDive feels like working with numpy arrays, pyDive has reached its goal!
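To make "elements are distributed across a cluster" more concrete, here is a minimal sketch of one common way to cut a single array axis into contiguous blocks, one per engine. The helper `split_axis` is hypothetical and not part of pyDive; pyDive's actual decomposition may differ.

```python
def split_axis(length, n_engines):
    """Split range(length) into n_engines contiguous, near-equal blocks.

    Hypothetical helper for illustration only; not a pyDive function.
    Returns a list of (start, stop) index pairs, one per engine.
    """
    base, extra = divmod(length, n_engines)
    blocks = []
    start = 0
    for rank in range(n_engines):
        # The first `extra` engines take one additional element each.
        size = base + (1 if rank < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


# Ten elements along one axis, spread over four engines.
blocks = split_axis(10, 4)
print(blocks)
```

Each engine would then hold only the elements inside its own `(start, stop)` range, so no single machine needs memory for the whole array.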

pyDive is developed and maintained by the Junior Group Computational Radiation Physics at the Institute for Radiation Physics at HZDR.

Features:

  • Since all cluster management is delegated to IPython.parallel, you can reuse your existing profiles with pyDive. No further cluster configuration is needed.
  • Save bandwidth by slicing an array in parallel on disk before loading it into main memory!
  • A GPU-cluster array is available thanks to pycuda, with additional support for non-contiguous memory.
  • As all of pyDive's distributed array types are auto-generated from local array types such as numpy, hdf5, and pycuda, you can easily make your own local array classes distributed too.
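The idea of generating a distributed type from a local one can be illustrated with a toy class factory. This is a sketch under loud assumptions: `distribute`, `scatter`, `map`, and `gather` are hypothetical names, everything runs in a single process, and it is not how pyDive builds its classes.

```python
def distribute(local_type, n_parts=2):
    """Build a toy 'distributed' wrapper class around local_type.

    Hypothetical illustration only; the parts live in one process,
    whereas a real cluster would keep each part on a different engine.
    """

    class Distributed:
        def __init__(self, parts):
            self.parts = list(parts)  # one local container per pretend engine

        @classmethod
        def scatter(cls, data):
            # Cut data into at most n_parts contiguous chunks.
            step = max(1, -(-len(data) // n_parts))  # ceiling division
            return cls(local_type(data[i:i + step])
                       for i in range(0, len(data), step))

        def map(self, func):
            # Apply func to every local part independently.
            return type(self)(func(part) for part in self.parts)

        def gather(self):
            # Concatenate the local parts back into one local container.
            out = []
            for part in self.parts:
                out.extend(part)
            return local_type(out)

    Distributed.__name__ = "Distributed" + local_type.__name__.capitalize()
    return Distributed


DistList = distribute(list, n_parts=3)
d = DistList.scatter([1, 2, 3, 4, 5, 6, 7])
doubled = d.map(lambda part: [2 * x for x in part])
print(DistList.__name__, d.parts, doubled.gather())
```

The wrapper knows nothing about the local type beyond slicing and iteration, which is why the same factory also accepts, for example, `tuple`.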

Dive in!

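import pyDive
# Connect to the cluster through the IPython.parallel profile named 'mpi'
pyDive.init(profile='mpi')

# Open a dataset of an hdf5 file, distributed along axes 0 and 1
h5field = pyDive.h5.open("myData.h5", "myDataset", distaxes=(0,1))
ones = pyDive.ones_like(h5field)

# Distribute file i/o and computation across the cluster:
# slice on disk, load only that slice, compute, and write the result back
h5field[::10,:] = h5field[::10,:].load() + 5.0 * ones[::10,:]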
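To see why slicing on disk before loading saves bandwidth, the following sketch counts how many elements a slice like `[::10,:]` selects. The shape `(1000, 500)` is an assumed example value, and `sliced_shape` is a hypothetical helper, not a pyDive function.

```python
import math


def sliced_shape(shape, index):
    """Shape selected by a tuple of slices, one slice per axis.

    Hypothetical helper for illustration only.
    """
    return tuple(len(range(*s.indices(n))) for s, n in zip(index, shape))


full = (1000, 500)                          # assumed dataset shape
index = (slice(None, None, 10), slice(None))  # same as [::10, :]
part = sliced_shape(full, index)

# Fraction of the dataset that actually has to be read from disk.
fraction = math.prod(part) / math.prod(full)
print(part, fraction)
```

Under these assumptions only every tenth row is read, so the data transferred from disk shrinks by the same factor.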

Documentation

In our online documentation (also available as a PDF) you can find detailed information on all interfaces, as well as tutorials and a quickstart.

Software License

pyDive is dual licensed under the GPLv3+ and LGPLv3+. The license texts can be found in GPL and LGPL, respectively.