openPMD/openPMD-api

Roadmap: Wishlist of new features

franzpoeschel opened this issue · 0 comments

A wishlist of features after the upcoming releases:

Near-Term

  • Implement openPMD 2.0
  • New ADIOS2 schema with support for modifiable attributes
    #1310
  • Non-MPI parallel reads in Dask with ADIOS2 ornladios/ADIOS2#3651
  • New ADIOS2 JoinedArray support for particles #1374 #1382
    ornladios/ADIOS2#3466
  • Deprecation and later removal of RecordComponent::SCALAR
    #1154
  • Full support for steps in ADIOS2, full support for variable-based iteration encoding. This implies:
    • Steps enabled by default
    • Support for random access to steps
    • Skip duplicate iterations at read time
    • Support reopening closed iterations #1606
  • Julia bindings #1025
  • Better object model for default attributes #1439
  • Distributed initialization of a read-access dataset in ADIOS2 within non-MPI contexts (e.g. Dask, ref. ornladios/ADIOS2#3651)
  • openpmd-pipe: modularize into visitor pattern
  • openpmd-pipe: reuse in new CLI tools such as openpmd-coarsen (fields, particles #1390)
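
The two openpmd-pipe items above could be pictured roughly as follows. This is a minimal sketch of the visitor idea only, not openpmd-pipe's actual code: the classes `Component`, `Iteration`, `Visitor`, `CopyVisitor`, `CoarsenVisitor` and the function `walk` are all hypothetical stand-ins, and data is modeled as plain Python lists rather than openPMD datasets.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    """Hypothetical stand-in for a record component: a name plus 1D data."""
    name: str
    data: List[float]


@dataclass
class Iteration:
    """Hypothetical stand-in for an iteration that holds mesh components."""
    index: int
    meshes: Dict[str, Component] = field(default_factory=dict)


class Visitor:
    """Hypothetical visitor interface; both hooks default to no-ops."""

    def visit_iteration(self, iteration: Iteration) -> None:
        pass

    def visit_component(self, iteration: Iteration, component: Component) -> None:
        pass


def walk(iterations: List[Iteration], visitor: Visitor) -> None:
    """Shared traversal: each tool reuses this loop and only swaps the visitor."""
    for iteration in iterations:
        visitor.visit_iteration(iteration)
        for component in iteration.meshes.values():
            visitor.visit_component(iteration, component)


class CopyVisitor(Visitor):
    """Pipe-like behavior: copy every component unchanged."""

    def __init__(self) -> None:
        self.output: Dict[int, Dict[str, List[float]]] = {}

    def visit_iteration(self, iteration: Iteration) -> None:
        self.output[iteration.index] = {}

    def visit_component(self, iteration: Iteration, component: Component) -> None:
        self.output[iteration.index][component.name] = list(component.data)


class CoarsenVisitor(CopyVisitor):
    """Coarsen-like behavior: keep only every `stride`-th value."""

    def __init__(self, stride: int) -> None:
        super().__init__()
        self.stride = stride

    def visit_component(self, iteration: Iteration, component: Component) -> None:
        self.output[iteration.index][component.name] = component.data[:: self.stride]


series = [Iteration(0, {"E_x": Component("E_x", [0.0, 1.0, 2.0, 3.0])})]

copy_visitor = CopyVisitor()
walk(series, copy_visitor)

coarsen_visitor = CoarsenVisitor(stride=2)
walk(series, coarsen_visitor)
```

In this sketch a new CLI tool would only contribute a visitor subclass, while the traversal in `walk` stays shared.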

Mid-Term

  • Support for joined arrays in backends other than ADIOS2
  • async for iteration in s.read_iterations() for Python
  • More flexible reads in C++17 #1372 and Python
  • MPI-wise logging of IO actions
  • Performance optimization: Long-running simulations (many iterations, reading and writing)
  • Specify default attributes upon closing rather than upon construction; clean up the logic for specifying defaults, as well as the constructors and destructors of our object model
    • Context: crashing simulations
    • If a standard attribute is missing when reading, warn and add a reasonable default (e.g., a 3-value axisLabels for a 3D mesh)
  • Support for PIConGPU-style dataset-specific JSON/TOML configuration
    (https://picongpu.readthedocs.io/en/0.6.0/usage/plugins/openPMD.html#cfg-file), as well as iteration-specific configuration, e.g. InitialBufferSize per file
  • Maybe: flag for writing attributes only from rank 0
  • Python docstrings #1328
  • Maybe: SoA <-> AoS flexibility (affects the standard)
    • Probably requires struct-type fields
  • Project structure: Separate MPI headers from serial headers
    With this change: Provide openPMD-api via Linux package managers
  • Compression and plugins in HDF5
  • HDF5 hardlinks, possibly with softlinks as a fallback openPMD/openPMD-standard#283
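
The "warn and add a reasonable default" idea for missing standard attributes might look roughly like this. It is a sketch under assumptions, not openPMD-api behavior: the function `fill_missing_axis_labels` is hypothetical, attributes are modeled as a plain dict, and the default labels `x`, `y`, `z` are an illustrative choice. The attribute name `axisLabels` is the mesh attribute from the openPMD standard.

```python
import warnings
from typing import Any, Dict


def fill_missing_axis_labels(attributes: Dict[str, Any], ndim: int) -> Dict[str, Any]:
    """Return a copy of `attributes` that is guaranteed to carry `axisLabels`.

    Assumption for this sketch: meshes have 1 to 3 dimensions and the
    fallback labels are the first `ndim` entries of ["x", "y", "z"].
    """
    if not 1 <= ndim <= 3:
        raise ValueError("this sketch only handles 1D to 3D meshes")

    result = dict(attributes)  # leave the caller's dict untouched
    if "axisLabels" not in result:
        default = ["x", "y", "z"][:ndim]
        warnings.warn(
            f"Standard attribute 'axisLabels' is missing; assuming {default}",
            stacklevel=2,
        )
        result["axisLabels"] = default
    return result


# A mesh whose writer crashed before the attribute was written:
incomplete = {"geometry": "cartesian"}
repaired = fill_missing_axis_labels(incomplete, ndim=3)

# A complete mesh is passed through unchanged and triggers no warning:
complete = {"geometry": "cartesian", "axisLabels": ["z", "y"]}
unchanged = fill_missing_axis_labels(complete, ndim=2)
```

Warning instead of raising keeps partially written output from crashed simulations readable, which is the context given above.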

Long-Term / Ideas

  • Synchronous mode: Avoid undefined behavior (UB) when storing and loading chunks

    • In both C++ and Python, we would like to prevent the user from interacting with allocated but not yet flushed data (UB).
    • For this, we could rename storeChunk / loadChunk to ...Async(), returning a std::future (C++) or asyncio.Future (Python).
      • We need to keep track of the in-flight objects we created and set them to valid on series.flush().
      • If a future is awaited before flush was called, we throw a runtime exception, which allows recovery in interactive use. Futures also let us check whether they are valid without having to catch exceptions.
    • The existing APIs would be synchronous.
  • Maybe: use the ADIOS2 group feature when reading (-> faster parsing)

  • Maybe: chunk distribution algorithms

  • Maybe: async I/O (especially Python)
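
The future-based store/load idea from the first item of this list could be sketched as follows. This is a toy model, not a proposal for the final API: `ToySeries`, `ChunkFuture`, `NotFlushedError`, and `load_chunk_async` are hypothetical names, data lives in an in-memory dict, and a small hand-written future class is used in place of `asyncio.Future` so that the example needs no event loop.

```python
from typing import Dict, List, Optional, Tuple


class NotFlushedError(RuntimeError):
    """Raised when a chunk is accessed before the series was flushed."""


class ChunkFuture:
    """Hypothetical handle for a chunk that becomes valid on flush()."""

    def __init__(self) -> None:
        self._valid = False
        self._value: Optional[List[float]] = None

    def valid(self) -> bool:
        """Check readiness without having to catch an exception."""
        return self._valid

    def get(self) -> List[float]:
        """Return the data, or raise a recoverable error if not flushed yet."""
        if not self._valid:
            raise NotFlushedError("call flush() on the series before reading")
        assert self._value is not None
        return self._value

    def _set(self, value: List[float]) -> None:
        self._value = value
        self._valid = True


class ToySeries:
    """Hypothetical series that tracks its in-flight futures."""

    def __init__(self, storage: Dict[str, List[float]]) -> None:
        self._storage = storage
        self._pending: List[Tuple[ChunkFuture, str, int, int]] = []

    def load_chunk_async(self, name: str, offset: int, extent: int) -> ChunkFuture:
        """Enqueue a load; no data is touched until flush()."""
        future = ChunkFuture()
        self._pending.append((future, name, offset, extent))
        return future

    def flush(self) -> None:
        """Perform all enqueued loads and mark their futures as valid."""
        for future, name, offset, extent in self._pending:
            future._set(self._storage[name][offset : offset + extent])
        self._pending.clear()


series = ToySeries({"E_x": [0.0, 1.0, 2.0, 3.0, 4.0]})
chunk = series.load_chunk_async("E_x", offset=1, extent=3)

try:
    chunk.get()  # too early: the series has not been flushed
except NotFlushedError as err:
    print("recoverable:", err)

series.flush()
print(chunk.get())
```

Because the user never holds a raw, unfilled buffer in this model, a premature access becomes an exception that an interactive session can recover from, rather than a read of undefined data.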