/pod5-rs

Rust library to support interacting with Oxford Nanopore POD5 files

Primary LanguageRustApache License 2.0Apache-2.0

pod5-rs

Experimental library for interacting with POD5 files in Rust.

Motivation: Dataframes for POD5 files

Major goals here are to explore combining POD5 files with polars DataFrame API. A POD5 file is essentially multiple Apache Arrow files stitched together using flatbuffers. Since polars has native support for Apache Arrow files, we should be able to treat the POD5 Arrow components as Dataframes and leverage all the API provided by polars to manipulate these.

This library performs the necessary casting in order to use polars on POD5 Apache Arrow contents. It provides a few convience functions for common operations, such as decompressing signal data, converting read ids into strings, etc. However, polars is re-exported to give full access to the DataFrame API.

Roadmap

  • Convienence/examples for conversion to SLOW5/BLOW5
  • VBZ De/Compression
    • Decompression
    • Compression (works but isn't exact?)
  • DataFrame for Signal Table
    • Convert read_id into binary
    • Decompress signal data
  • DataFrame for Run Info Table
  • DataFrame for Reads Table
  • Support mmap
  • Arrow-based API
  • Optimize decompression
    • Switch Zig-zag encoding dependency
    • Try varint-rs, varint-simd, etc.

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.