New headers and chunked checksums
Introduction
Each record has a header that is present both in the blob and in the index. The header is a binary dump of the eblob_disk_control object; it has a fixed size and is placed at the beginning of the record in the blob. The header contains the record's meta info: sizes, position, etc. Unless it is disabled, each record also has a footer, which is a binary dump of eblob_disk_footer; it likewise has a fixed size and is placed at the end of the record in the blob. The footer contains the checksum of the record.
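For orientation, here is a sketch of the two fixed-size on-disk structures as they appear in the eblob sources; the exact field set and names may differ between versions, so treat this as illustrative rather than authoritative:

```c
/* Sketch of the on-disk record structures (field layout approximate,
 * based on the eblob sources; verify against your eblob version). */
struct eblob_disk_control {
	struct eblob_key	key;        /* record key */
	uint64_t		flags;      /* BLOB_DISK_CTL_* record flags */
	uint64_t		data_size;  /* size of user data */
	uint64_t		disk_size;  /* size of the whole record on disk */
	uint64_t		position;   /* offset of the record within the blob */
} __attribute__ ((packed));

struct eblob_disk_footer {
	unsigned char		csum[EBLOB_ID_SIZE];  /* checksum of the whole record */
	uint64_t		offset;               /* offset of this footer */
} __attribute__ ((packed));
```

Because both structures are raw binary dumps with a fixed size, any change to them changes the on-disk format, which is exactly the problem described below.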
Problems
- If we ever decide to extend the header, we will have to convert all blobs to the new header format.
- Record checksumming time depends on record size, so it takes a long time for huge records.
Solutions
1. Extendable headers
We can use msgpack with fixed fields for header serialization (see the sketch below). If the header is later extended, blobs with the old header will remain readable, but all new writes will go into new blobs with the new header. Defragmentation can also convert blobs with old headers to the new format.
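A minimal sketch of what such serialization could look like with msgpack-c, assuming a hypothetical pack_header() helper and the field set of eblob_disk_control; packing the header as a msgpack array lets newer versions append fields that older readers simply ignore:

```c
#include <msgpack.h>
#include <stdint.h>

/* Hypothetical helper: serialize a header as a msgpack array. Old readers
 * decode only the leading fields they know about; new fields are appended
 * at the end, so the format stays backward-compatible. */
static void pack_header(msgpack_packer *pk,
			const void *key, size_t key_size,
			uint64_t flags, uint64_t data_size,
			uint64_t disk_size, uint64_t position)
{
	msgpack_pack_array(pk, 5);
	msgpack_pack_bin(pk, key_size);
	msgpack_pack_bin_body(pk, key, key_size);
	msgpack_pack_uint64(pk, flags);
	msgpack_pack_uint64(pk, data_size);
	msgpack_pack_uint64(pk, disk_size);
	msgpack_pack_uint64(pk, position);
}
```

A writer would feed this through a msgpack_sbuffer (msgpack_packer_init(&pk, &sbuf, msgpack_sbuffer_write)); since the serialized header is no longer fixed-size, its length would have to be stored alongside it.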
2. Chunked checksumming of huge records
We can split the record into chunks and checksum each chunk (see the sketch below). We can also add a new record flag marking records that are checksummed by chunk; this avoids having to convert current blobs immediately and lets defragmentation convert blobs along the way.
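A minimal sketch of per-chunk checksumming, assuming a hypothetical checksum_chunks() helper, an illustrative 1 MiB chunk size, and SHA-512 from OpenSSL as the digest (the actual chunk size and hash function are implementation choices):

```c
#include <openssl/sha.h>
#include <stdint.h>

#define CHUNK_SIZE (1ULL << 20)	/* illustrative 1 MiB chunk; a config knob in practice */

/* Sketch: checksum @size bytes of record data chunk by chunk, writing one
 * SHA-512 digest per chunk into @csums. The caller must provide room for
 * ((size + CHUNK_SIZE - 1) / CHUNK_SIZE) * SHA512_DIGEST_LENGTH bytes. */
static void checksum_chunks(const unsigned char *data, uint64_t size,
			    unsigned char *csums)
{
	for (uint64_t off = 0; off < size; off += CHUNK_SIZE) {
		uint64_t len = size - off < CHUNK_SIZE ? size - off : CHUNK_SIZE;

		SHA512(data + off, len, csums);
		csums += SHA512_DIGEST_LENGTH;
	}
}
```

With per-chunk digests, verifying a partial read only requires hashing the chunks that overlap the requested range, so verification cost becomes proportional to the read size rather than the record size.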
Chunked checksums are implemented in #131 and reverbrain/elliptics#629