khipu-io/khipu

Batch store records to kesque - reduce log file size massively

dcaoyuan opened this issue · 4 comments


Kafka stores each record's timestamp as a timestampDelta relative to the base timestamp in the record batch header, encoded as a variable-length long (varlong), so a record needs far fewer bytes than a full 8-byte Long. Thus, we need to build each MemoryRecords with as many messages as possible, so that a single records header covers as many messages as possible.
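To see why packing more records under one header pays off, here is a minimal Python sketch (not Kafka's code) of the zigzag varlong encoding used for timestampDelta. It is simplified: it counts a single 8-byte base timestamp per batch header, whereas the real header carries more fields, and the helper names are hypothetical.

```python
from typing import List


def zigzag(n: int) -> int:
    """Map a signed 64-bit value to unsigned so small magnitudes stay small."""
    return (n << 1) ^ (n >> 63)


def write_varlong(n: int) -> bytes:
    """Zigzag-encode n, then emit 7 bits per byte, low group first,
    setting the high bit on every byte except the last."""
    v = zigzag(n)
    out = bytearray()
    while v >= 0x80:
        out.append((v & 0x7F) | 0x80)
        v >>= 7
    out.append(v)
    return bytes(out)


def timestamp_bytes_fixed(timestamps: List[int]) -> int:
    # Every record carries a full 8-byte long timestamp.
    return 8 * len(timestamps)


def timestamp_bytes_batched(timestamps: List[int]) -> int:
    # One 8-byte base timestamp in the batch header (simplified), plus a
    # varlong timestampDelta per record relative to that base.
    base = timestamps[0]
    return 8 + sum(len(write_varlong(t - base)) for t in timestamps)


ts = [1_546_300_800_000 + i for i in range(1000)]  # 1,000 records, 1 ms apart
print(timestamp_bytes_fixed(ts), timestamp_bytes_batched(ts))  # 8000 1944
```

With 1,000 records one millisecond apart, each delta takes 1 or 2 bytes, so the timestamps shrink from 8,000 bytes to 1,944 bytes; the more records a batch holds, the better its fixed header cost is amortized.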

We can refer to how the Kafka producer batches messages:

  • maintains a ConcurrentMap<TopicPartition, Deque<RecordBatch>>
  • gets the Deque of the relevant TopicPartition
  • gets the last RecordBatch in the Deque; if that RecordBatch (a ByteBuffer bounded by batch.size) is not full, appends the value to it
  • if the last RecordBatch is null, no RecordBatch exists for the topic partition, so it allocates a new ByteBuffer
  • does double-checked locking on the last RecordBatch, in case another thread has created a RecordBatch in the meantime
  • if a RecordBatch now exists, tries appending the value to it
  • if the RecordBatch is still null, creates a MemoryRecords backed by the ByteBuffer
  • adds the MemoryRecords to a new RecordBatch
  • appends the value to the RecordBatch (inside the MemoryRecords, and eventually the ByteBuffer)
  • adds the RecordBatch to the Deque
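The steps above can be sketched in Python as follows. This is a simplified model, not Kafka's RecordAccumulator: Batch and Accumulator are hypothetical names, a batch is just a list bounded by a byte budget (standing in for batch.size), and a plain dict behind a lock replaces the ConcurrentMap.

```python
import threading
from collections import deque
from typing import Deque, Dict, List, Tuple


class Batch:
    """Stand-in for RecordBatch/ProducerBatch: records bounded by a byte budget."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.records: List[bytes] = []
        self.size = 0

    def try_append(self, value: bytes) -> bool:
        # Reject when full; an empty batch always accepts so large values still fit somewhere.
        if self.records and self.size + len(value) > self.capacity:
            return False
        self.records.append(value)
        self.size += len(value)
        return True


class Accumulator:
    """Per-partition deques of batches, appended with double-checked locking."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self._map_lock = threading.Lock()
        self._deques: Dict[str, Tuple[threading.Lock, Deque[Batch]]] = {}

    def _deque_for(self, partition: str) -> Tuple[threading.Lock, Deque[Batch]]:
        with self._map_lock:
            if partition not in self._deques:
                self._deques[partition] = (threading.Lock(), deque())
            return self._deques[partition]

    def append(self, partition: str, value: bytes) -> None:
        lock, dq = self._deque_for(partition)
        with lock:  # first try: the last batch may still have room
            if dq and dq[-1].try_append(value):
                return
        new_batch = Batch(self.batch_size)  # allocate outside the lock
        with lock:  # double check: another thread may have added a batch meanwhile
            if dq and dq[-1].try_append(value):
                return  # new_batch is dropped here (Kafka returns its buffer to a pool)
            new_batch.try_append(value)
            dq.append(new_batch)

    def drain(self, partition: str) -> List[Batch]:
        lock, dq = self._deque_for(partition)
        with lock:
            batches = list(dq)
            dq.clear()
        return batches


acc = Accumulator(batch_size=10)
for _ in range(5):
    acc.append("account-0", b"abcd")
print([len(b.records) for b in acc.drain("account-0")])  # [2, 2, 1]
```

Allocating outside the lock keeps the critical section short, which is why the second check is needed before the new batch is added.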

References:

  • org.apache.kafka.clients.producer.KafkaProducer
  • org.apache.kafka.clients.producer.internals.Sender
  • org.apache.kafka.clients.producer.internals.RecordAccumulator
  • org.apache.kafka.clients.producer.internals.ProducerBatch

In Khipu, state nodes will be appended in batches during both fast and regular sync. Since the storage/account/receipts logs and their index log files are the biggest, what I need is to implement batch appending of storage/account/receipts kvs.
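A minimal sketch of what batch appending storage/account/receipts kvs could look like: kvs are buffered per topic and each topic's pending kvs are handed to the log in a single call, either when a size limit is reached or at the end of a block. KvBatchWriter, append_batch and max_records are hypothetical names, not Khipu or kesque APIs.

```python
from typing import Callable, Dict, List, Tuple

KV = Tuple[bytes, bytes]


class KvBatchWriter:
    """Buffers kvs per topic and flushes each topic's kvs as one batch."""

    def __init__(self, append_batch: Callable[[str, List[KV]], None],
                 max_records: int = 4096) -> None:
        self.append_batch = append_batch  # hypothetical log-append callback
        self.max_records = max_records
        self.pending: Dict[str, List[KV]] = {}

    def put(self, topic: str, key: bytes, value: bytes) -> None:
        buf = self.pending.setdefault(topic, [])
        buf.append((key, value))
        if len(buf) >= self.max_records:  # batch full: write it out now
            self.flush(topic)

    def flush(self, topic: str) -> None:
        buf = self.pending.pop(topic, [])
        if buf:
            self.append_batch(topic, buf)

    def flush_all(self) -> None:
        # e.g. at the end of each block in fast or regular sync
        for topic in list(self.pending):
            self.flush(topic)


calls: List[Tuple[str, int]] = []
writer = KvBatchWriter(lambda topic, kvs: calls.append((topic, len(kvs))), max_records=3)
for i in range(4):
    writer.put("account", b"k%d" % i, b"v")
for i in range(2):
    writer.put("storage", b"k%d" % i, b"v")
writer.flush_all()
print(calls)  # [('account', 3), ('account', 1), ('storage', 2)]
```

Six puts become three appends; with a realistic max_records, a whole block's worth of kvs per topic would share one batch header.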

Before (at block 6285787)

23G	.khipu.eth/leveldb
457M	.khipu.eth/kesque.logs/receipts_idx-0
654M	.khipu.eth/kesque.logs/evmcode-0
655M	.khipu.eth/kesque.logs/td-0
50G	.khipu.eth/kesque.logs/body-0
3.8G	.khipu.eth/kesque.logs/header-0
13G	.khipu.eth/kesque.logs/account-0
457M	.khipu.eth/kesque.logs/td_idx-0
25G	.khipu.eth/kesque.logs/storage-0
38G	.khipu.eth/kesque.logs/receipts-0
4.4G	.khipu.eth/kesque.logs/account_idx-0
12G	.khipu.eth/kesque.logs/storage_idx-0
457M	.khipu.eth/kesque.logs/body_idx-0
262M	.khipu.eth/kesque.logs/evmcode_idx-0
457M	.khipu.eth/kesque.logs/header_idx-0
147G	.khipu.eth/kesque.logs
4.0K	.khipu.eth/keystore
169G	.khipu.eth/

After (at block 6314587)

23G	.khipu.eth/leveldb
459M	.khipu.eth/kesque.logs/receipts_idx-0
664M	.khipu.eth/kesque.logs/evmcode-0
658M	.khipu.eth/kesque.logs/td-0
50G	.khipu.eth/kesque.logs/body-0
3.8G	.khipu.eth/kesque.logs/header-0
9.2G	.khipu.eth/kesque.logs/account-0
459M	.khipu.eth/kesque.logs/td_idx-0
17G	.khipu.eth/kesque.logs/storage-0
39G	.khipu.eth/kesque.logs/receipts-0
949M	.khipu.eth/kesque.logs/account_idx-0
2.4G	.khipu.eth/kesque.logs/storage_idx-0
459M	.khipu.eth/kesque.logs/body_idx-0
14M	.khipu.eth/kesque.logs/evmcode_idx-0
459M	.khipu.eth/kesque.logs/header_idx-0
124G	.khipu.eth/kesque.logs
4.0K	.khipu.eth/keystore
147G	.khipu.eth/
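The relative savings can be worked out from the two listings. The short Python calculation below is only arithmetic on the du -h figures (which use powers of 1024 and are themselves rounded), and the After snapshot is 28,800 blocks later than the Before one, so the percentages are approximate.

```python
# du -h suffixes expressed in KiB (du -h uses powers of 1024)
UNITS = {"K": 1, "M": 1024, "G": 1024 ** 2}


def kib(size: str) -> float:
    return float(size[:-1]) * UNITS[size[-1]]


def reduction_pct(before: str, after: str) -> float:
    return round(100 * (1 - kib(after) / kib(before)), 1)


sizes = {  # name: (before, after), copied from the listings above
    "account-0": ("13G", "9.2G"),
    "account_idx-0": ("4.4G", "949M"),
    "storage-0": ("25G", "17G"),
    "storage_idx-0": ("12G", "2.4G"),
    "evmcode_idx-0": ("262M", "14M"),
    "kesque.logs": ("147G", "124G"),
    ".khipu.eth/": ("169G", "147G"),
}

for name, (b, a) in sizes.items():
    print(f"{name:15} {b:>5} -> {a:>5}  {reduction_pct(b, a):5.1f}% smaller")
```

The account and storage index logs shrink by roughly 79% and 80%, the account and storage logs by about 29% and 32%, and the whole .khipu.eth directory by about 13% (169G to 147G).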