AutoMQ/automq

Compaction: limit the number of blocks in the final object so that it does not exceed Integer.MAX_VALUE.

Closed this issue · 1 comment

Who is this for and what problem do they have today?

Why is solving this problem impactful?

The object format uses an int32 to record the number of data blocks.
=>
The index block's length must also stay below Integer.MAX_VALUE. Since each data block has a 40-byte index entry, the data block count must be below Integer.MAX_VALUE / 40 = 53,687,091.
=>
Considering the memory consumed by compaction and by reading an object's index, limit the number of data blocks in a single object to 100,000 (an index block of about 3.8MiB).
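A quick Java check of the arithmetic above. The 40-byte index entry and the 100,000 cap come from the issue; the class name and `splitIntoObjects` are hypothetical. Cutting compaction output into several objects is only one assumed way the cap could be enforced, not AutoMQ's actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockLimits {
    // Index entry size per data block, as stated in the issue (40 bytes).
    static final int INDEX_ENTRY_SIZE = 40;
    // Per-object cap proposed in the issue.
    static final int MAX_BLOCKS_PER_OBJECT = 100_000;

    // Largest data block count whose index block length still fits in an int32.
    static int maxBlocksByIndex() {
        return Integer.MAX_VALUE / INDEX_ENTRY_SIZE; // integer division rounds down
    }

    // Index block size in MiB for the given number of data blocks.
    static double indexSizeMiB(int blocks) {
        return (long) blocks * INDEX_ENTRY_SIZE / (1024.0 * 1024.0);
    }

    // Hypothetical helper: cut a compaction's output blocks into groups of at most
    // MAX_BLOCKS_PER_OBJECT, each group becoming a separate output object.
    static <T> List<List<T>> splitIntoObjects(List<T> blocks) {
        List<List<T>> objects = new ArrayList<>();
        for (int i = 0; i < blocks.size(); i += MAX_BLOCKS_PER_OBJECT) {
            int end = Math.min(i + MAX_BLOCKS_PER_OBJECT, blocks.size());
            objects.add(new ArrayList<>(blocks.subList(i, end)));
        }
        return objects;
    }

    public static void main(String[] args) {
        System.out.println(maxBlocksByIndex());                               // 53687091
        System.out.printf("%.2f MiB%n", indexSizeMiB(MAX_BLOCKS_PER_OBJECT)); // 3.81 MiB
        List<Integer> blocks = new ArrayList<>();
        for (int i = 0; i < 250_000; i++) blocks.add(i);
        System.out.println(splitIntoObjects(blocks).size());                 // 3
    }
}
```

With 250,000 blocks the sketch yields two full objects of 100,000 blocks and one of 50,000.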

=>

  • Stream object compaction must also be taken into account.
  • Consider the index memory usage of Stream Set Object compaction in a single-machine scenario with 5,000 partitions: with 5,000 partitions, the 100,000 data block limit fills up in only 3 minutes.
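The 3-minute figure implies a per-partition rate. Assuming every partition contributes data blocks at the same pace (an assumption, not stated in the issue), each partition adds a block roughly every 9 seconds. The class and method names below are hypothetical.

```java
public class FillRate {
    // Blocks each partition can contribute before the per-object limit is hit,
    // assuming blocks are spread evenly across partitions.
    static int blocksPerPartition(int blockLimit, int partitions) {
        return blockLimit / partitions;
    }

    // Average seconds between blocks for one partition, given the time to fill the limit.
    static double secondsPerBlock(int fillSeconds, int blockLimit, int partitions) {
        return (double) fillSeconds / blocksPerPartition(blockLimit, partitions);
    }

    public static void main(String[] args) {
        System.out.println(blocksPerPartition(100_000, 5_000));      // 20
        System.out.println(secondsPerBlock(3 * 60, 100_000, 5_000)); // 9.0
    }
}
```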

=>

  • Introducing data block merging might alleviate this issue.
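A rough illustration of what data block merging might look like. The `Block` record, its fields, and the greedy rule (merge neighbouring blocks of the same stream with contiguous offsets while the merged block stays under a size cap) are all assumptions for illustration, not AutoMQ's object format or algorithm. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockMerger {
    // Hypothetical block descriptor; field names are illustrative only.
    record Block(long streamId, long startOffset, long endOffset, int size) {}

    // Greedily merge neighbouring blocks of the same stream whose offset ranges
    // are contiguous, as long as the merged block stays within maxSize bytes.
    static List<Block> merge(List<Block> blocks, int maxSize) {
        List<Block> out = new ArrayList<>();
        Block cur = null;
        for (Block b : blocks) {
            if (cur != null
                    && cur.streamId() == b.streamId()
                    && cur.endOffset() == b.startOffset()
                    && (long) cur.size() + b.size() <= maxSize) {
                cur = new Block(cur.streamId(), cur.startOffset(), b.endOffset(), cur.size() + b.size());
            } else {
                if (cur != null) out.add(cur);
                cur = b;
            }
        }
        if (cur != null) out.add(cur);
        return out;
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
                new Block(1, 0, 10, 100),
                new Block(1, 10, 20, 100),
                new Block(1, 20, 30, 100),
                new Block(2, 0, 5, 50));
        // The first two stream-1 blocks merge (200 <= 250); the third would exceed the cap.
        System.out.println(merge(blocks, 250).size()); // 3
    }
}
```

Fewer, larger blocks shrink the index proportionally, which is why merging could relieve pressure on the 100,000-block cap.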

Additional notes

Generate 5,000 stream objects and magnify by 3 for the three copies (Controller image + Broker image + Controller processing layer): 15,000 in total.

  • s3streamobject occupies 1MiB

  • s3object occupies 1.5MiB

  • S3StreamsMetadataImage occupies 1.5MiB

  • S3ObjectsImage occupies 1.3MiB
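Adding up the four measurements above, assuming they are independent and additive (the issue does not say so explicitly), gives the total footprint for the test and an average per generated stream object. The class name is hypothetical.

```java
public class MetadataFootprint {
    // Measured sizes from the note above, in MiB.
    static final double[] MEASURED_MIB = {1.0, 1.5, 1.5, 1.3};

    // Total of all measurements, assuming they do not overlap.
    static double totalMiB() {
        double sum = 0;
        for (double m : MEASURED_MIB) sum += m;
        return sum;
    }

    // Average bytes per stream object, spread over the generated objects.
    static double bytesPerStreamObject(int streamObjects) {
        return totalMiB() * 1024 * 1024 / streamObjects;
    }

    public static void main(String[] args) {
        System.out.printf("total: %.1f MiB%n", totalMiB());
        System.out.printf("per stream object: %.0f bytes%n", bytesPerStreamObject(5_000));
    }
}
```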

Estimated metadata footprint for 1PB of data: 1 * 1024 * 1024 / 10 / 5000 * 3 ≈ 62.9MB
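The estimate above, written out step by step. The reading of each term is an assumption: 1PB expressed as 1024 * 1024 GB, 10GB of data per stream object, the 5,000-object batch from the test, and the final factor of 3 read as MB of metadata per 5,000 stream objects (it may instead reflect the 3× magnification; the issue does not spell it out). The class name is hypothetical.

```java
public class PetabyteEstimate {
    static double estimateMB() {
        double gbPerPB = 1024.0 * 1024.0;               // 1PB expressed in GB
        double gbPerStreamObject = 10;                  // assumed size of one stream object
        double objectsPerBatch = 5_000;                 // batch size used in the measurement
        double mbPerBatch = 3;                          // assumed metadata cost per batch
        double streamObjects = gbPerPB / gbPerStreamObject; // about 104,858 objects
        return streamObjects / objectsPerBatch * mbPerBatch;
    }

    public static void main(String[] args) {
        System.out.printf("%.1f MB%n", estimateMB()); // 62.9 MB
    }
}
```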