reverbrain/eblob

New headers and chunked checksums


Introduction

Each record has a header that is stored both in the blob and in the index. The header is a binary dump of an eblob_disk_control object; it has a fixed size and, in a blob, is placed at the beginning of the record. The header contains the record's meta information: sizes, position, etc. Unless it is disabled, each record also has a footer, which is a binary dump of eblob_disk_footer; it also has a fixed size and, in a blob, is placed at the end of the record. The footer contains the checksum of the record.
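To make the layout concrete, here is a minimal Python sketch of a record with a fixed-size header at the start and a fixed-size checksum footer at the end. The field names, field order, sizes, and the choice of SHA-512 are assumptions for illustration only; they are not the actual eblob_disk_control or eblob_disk_footer definitions.

```python
import hashlib
import struct

# Hypothetical fixed-size header: flags, data_size, disk_size, position.
# This is NOT the real eblob_disk_control layout, only an illustration.
HEADER = struct.Struct("<QQQQ")
# Hypothetical fixed-size footer: a SHA-512 digest of the record data.
FOOTER_SIZE = hashlib.sha512().digest_size


def pack_record(data: bytes, position: int, flags: int = 0) -> bytes:
    """Lay out a record as header + data + footer."""
    disk_size = HEADER.size + len(data) + FOOTER_SIZE
    header = HEADER.pack(flags, len(data), disk_size, position)
    footer = hashlib.sha512(data).digest()
    return header + data + footer


def unpack_record(record: bytes) -> bytes:
    """Read the header, locate the footer from disk_size, verify the checksum."""
    flags, data_size, disk_size, position = HEADER.unpack_from(record, 0)
    data = record[HEADER.size:HEADER.size + data_size]
    footer = record[disk_size - FOOTER_SIZE:disk_size]
    if hashlib.sha512(data).digest() != footer:
        raise ValueError("checksum mismatch")
    return data
```

Note that verifying the footer requires hashing the whole record data, so the cost grows with the record size.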

Problems

  1. If we decide to extend the header, we will have to convert all blobs to the new header format.
  2. The cost of record checksumming depends on the record size, so it takes a long time for huge records.

Solutions

1. extendable headers

We can use msgpack with fixed fields for header serialization. If the header is extended, blobs with the old header remain readable, but all new writes go to new blobs with the new header. Defragmentation can also convert blobs with old headers to the new format.
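The compatibility property can be sketched as follows. To stay within the standard library, this sketch uses a count-prefixed list of uint64 fields as a stand-in for a msgpack array; it is not msgpack and not the proposed on-disk format. All field names (including `chunk_size`) are hypothetical.

```python
import struct

# Hypothetical field sets: v1 has three fields, v2 appends a fourth.
FIELDS_V1 = ("flags", "data_size", "position")
FIELDS_V2 = FIELDS_V1 + ("chunk_size",)


def pack_header(values, fields=FIELDS_V2):
    """Serialize as a one-byte field count followed by little-endian uint64 fields."""
    return struct.pack(f"<B{len(fields)}Q", len(fields), *(values[f] for f in fields))


def unpack_header(buf, known=FIELDS_V2):
    """Return (header dict, bytes consumed) for a reader that knows `known` fields."""
    count = buf[0]
    raw = struct.unpack_from(f"<{count}Q", buf, 1)
    # Fields missing from an old header default to 0.
    header = {name: 0 for name in known}
    # zip stops at the shorter side, so trailing fields the reader
    # does not know about are skipped rather than misread.
    header.update(zip(known, raw))
    return header, 1 + 8 * count
```

Because fields are only ever appended, a new reader can still open an old header (missing fields take a default), and the returned length lets any reader find the data that follows regardless of how many fields were written.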

2. checksumming of huge files

We can split the file into chunks and checksum each chunk. We can also add a new record flag for records that are checksummed by chunk; this avoids having to convert existing blobs up front, since they can be converted during defragmentation.
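A minimal sketch of the idea, assuming a hypothetical 1 MiB chunk size and using CRC32 from Python's `zlib` as a stand-in checksum (the actual hash and chunk size may differ): a partial read only has to verify the chunks it overlaps, not the whole record.

```python
import zlib

CHUNK_SIZE = 1 << 20  # hypothetical chunk size: 1 MiB


def chunk_checksums(data, chunk_size=CHUNK_SIZE):
    """Compute one CRC32 per chunk; the last chunk may be shorter."""
    return [zlib.crc32(data[off:off + chunk_size])
            for off in range(0, len(data), chunk_size)]


def verify_range(data, checksums, offset, size, chunk_size=CHUNK_SIZE):
    """Verify only the chunks that overlap [offset, offset + size)."""
    if size == 0:
        return True
    first = offset // chunk_size
    last = (offset + size - 1) // chunk_size
    for i in range(first, last + 1):
        chunk = data[i * chunk_size:(i + 1) * chunk_size]
        if zlib.crc32(chunk) != checksums[i]:
            return False
    return True
```

With this scheme the verification cost of a read is proportional to the size of the requested range rather than to the size of the whole record.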

Chunked checksums are implemented in #131 and reverbrain/elliptics#629