marbl/harvest-tools

Protocol buffer size limit

Closed · 1 comment

Messages cannot exceed 2 GB (before compression), a limit imposed by the implementation of Protocol Buffers. Large alignments hit this limit because Gingr files apply zlib compression only after serialization, so it is the uncompressed message size that counts. There are several potential solutions:

  • Reduce the serialized size by storing only variant rows rather than entire columns (also enables row-based filters)
  • Write multiple protobuf messages to each Gingr file
  • Use another serialization format, such as HDF5
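As a rough illustration of the second option, the sketch below writes several independently compressed chunks to one stream, each preceded by a length prefix, so that no single chunk has to approach the 2 GB ceiling. It is not the Gingr file format: the 4-byte big-endian prefix, the function names `write_chunks` and `read_chunks`, and the use of opaque byte strings in place of serialized protobuf messages are all assumptions made for the example.

```python
import io
import struct
import zlib

# Per-message ceiling, assuming the limit comes from signed 32-bit sizes.
MAX_MESSAGE_BYTES = 2**31 - 1


def write_chunks(stream, payloads):
    """Write each payload as a 4-byte length prefix plus zlib-compressed bytes.

    Each payload stands in for one already-serialized message (hypothetical).
    """
    for payload in payloads:
        # The limit applies to the size before compression.
        if len(payload) > MAX_MESSAGE_BYTES:
            raise ValueError("chunk exceeds the per-message limit")
        compressed = zlib.compress(payload)
        stream.write(struct.pack(">I", len(compressed)))
        stream.write(compressed)


def read_chunks(stream):
    """Yield the decompressed payloads in the order they were written."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) != 4:
            raise EOFError("truncated length prefix")
        (size,) = struct.unpack(">I", header)
        body = stream.read(size)
        if len(body) != size:
            raise EOFError("truncated chunk")
        yield zlib.decompress(body)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_chunks(buf, [b"ACGT" * 1000, b"variant rows"])
    buf.seek(0)
    for chunk in read_chunks(buf):
        print(len(chunk))
```

A reader built this way can also process one chunk at a time instead of holding the whole alignment in memory, though how an alignment would be split into chunks is left open here.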

Fixed (using Cap'n Proto) in v1.2.