Compaction should limit the number of blocks in the final object so that it does not exceed Integer.MAX_VALUE.
Closed this issue · 1 comments
superhx commented
Who is this for and what problem do they have today?
Why is solving this problem impactful?
The format uses int32 to record the number of data blocks.
=>
The index block's length must also stay below Integer.MAX_VALUE, so the data block count must be less than Integer.MAX_VALUE / 40 (bytes per block index entry) = 53,687,091.
=>
Considering the memory consumed by compaction and by the index during object reads, limit the data block count of a single object to 100,000 (index block size ≈ 3.8 MiB).
=>
- Consider stream object compaction.
- Consider the index memory usage of stream set object compaction in a single-machine, 5,000-partition scenario: with 5,000 partitions, the 100,000 data block budget fills up in only about 3 minutes.
=>
- Perhaps introducing data block merging functionality can alleviate this issue.
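The arithmetic behind the hard ceiling and the chosen practical cap can be sketched as a small check. This is a hypothetical guard, not AutoMQ's actual code; `BLOCK_INDEX_ENTRY_SIZE` (40 bytes) and `MAX_DATA_BLOCKS` (100,000) are the figures from this issue:

```java
public class DataBlockLimit {
    // Each data block's entry in the index block occupies 40 bytes (per this issue).
    static final int BLOCK_INDEX_ENTRY_SIZE = 40;

    // Hard ceiling: the index block length itself must stay below Integer.MAX_VALUE.
    static final int HARD_MAX_DATA_BLOCKS = Integer.MAX_VALUE / BLOCK_INDEX_ENTRY_SIZE; // 53,687,091

    // Practical cap chosen to bound compaction/read memory:
    // 100,000 entries * 40 bytes = 4,000,000 bytes ≈ 3.8 MiB of index block.
    static final int MAX_DATA_BLOCKS = 100_000;

    static long indexBlockBytes(int dataBlockCount) {
        return (long) dataBlockCount * BLOCK_INDEX_ENTRY_SIZE;
    }

    static boolean withinLimit(int dataBlockCount) {
        return dataBlockCount <= MAX_DATA_BLOCKS;
    }

    public static void main(String[] args) {
        System.out.println(HARD_MAX_DATA_BLOCKS);            // 53687091
        System.out.println(indexBlockBytes(MAX_DATA_BLOCKS)); // 4000000 bytes ≈ 3.8 MiB
    }
}
```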
Additional notes
superhx commented
Generated 5,000 stream objects. The metadata is held in three places (Controller image + Broker image + Controller processing layer), a 3x amplification, so effectively 15,000 object entries:
- s3streamobject occupies 1 MiB
- s3object occupies 1.5 MiB
- S3StreamsMetadataImage occupies 1.5 MiB
- S3ObjectsImage occupies 1.3 MiB
For 1 PB of data, metadata is estimated to occupy 1 * 1024 * 1024 / 10 / 5000 * 3 = 62 MB.
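The back-of-envelope estimate above can be reproduced directly. The formula comes from the comment; the meaning of the intermediate factors (object size, per-partition divisor) is not spelled out in the original, so this sketch only verifies the arithmetic:

```java
public class MetadataEstimate {
    public static void main(String[] args) {
        // 1 * 1024 * 1024 / 10 / 5000 * 3, as written in the comment above.
        // The factor-of-3 is the Controller image + Broker image + processing
        // layer amplification; the other factors are taken as given.
        double estimateMB = 1.0 * 1024 * 1024 / 10 / 5000 * 3;
        System.out.println(estimateMB); // 62.91456, i.e. ≈ 62 MB
    }
}
```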