Why is diff not 0 when zone and file share the same lifetime?
Line 490 in 22f110f
Hi, guys! I have recently been reading the ZenFS source code. I understand the core idea of how zones are allocated for SSTs, but what I can't follow is why diff is not 0 when the zone and the file share the same lifetime.
Any explanation would be appreciated. Thanks in advance!
Hi @Haltz - the allocator is designed to prioritize filling up zones with data that has a shorter life span than the data already written in the zone, to ensure that the new data won't prolong the reclaim time. Only in cases where we don't have any choice (the active zone limit has been reached) do we pick zones with LIFETIME_DIFF_COULD_BE_WORSE.
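For readers following along, the scoring rule described in this reply can be sketched roughly as below. This is not the ZenFS source: the `Lifetime` enum is a stand-in for RocksDB's `Env::WriteLifeTimeHint`, the two penalty values are made up for illustration, and the handling of unhinted files and of files that outlive the zone's data is an assumption drawn from this discussion.

```cpp
#include <cassert>

// Stand-in for Env::WriteLifeTimeHint; a larger value means longer-lived data.
enum class Lifetime : unsigned { NotSet = 0, None, Short, Medium, Long, Extreme };

// Illustrative penalty values (assumed; the real constants may differ).
constexpr unsigned kDiffNotGood = 100;
constexpr unsigned kDiffCouldBeWorse = 50;

// Lower result = better fit for writing `file` data into a zone holding `zone` data.
unsigned LifetimeDiff(Lifetime zone, Lifetime file) {
  const unsigned z = static_cast<unsigned>(zone);
  const unsigned f = static_cast<unsigned>(file);
  assert(f <= static_cast<unsigned>(Lifetime::Extreme));

  // Files without a usable hint only match zones holding the same kind of data.
  if (file == Lifetime::NotSet || file == Lifetime::None) {
    return zone == file ? 0 : kDiffNotGood;
  }
  // The file is expected to die before the zone's data: it cannot delay reclaim.
  if (z > f) return z - f;
  // Same hint: acceptable, but only as a fallback.
  if (z == f) return kDiffCouldBeWorse;
  // Assumption: a file that outlives the zone's data would prolong reclaim.
  return kDiffNotGood;
}
```

Under these assumptions, a WLTH_SHORT file scores 2 against a WLTH_LONG zone but 50 against a WLTH_SHORT zone, which is why an equal lifetime does not produce a diff of 0.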
Thank you for explaining, @yhr. But I'm not sure I understand the benefit of this design, so I would appreciate some further explanation.
For SSTs with Env::WLTH_SHORT, the allocator always allocates an empty zone for them if possible, because the diff returned is LIFETIME_DIFF_COULD_BE_WORSE. If SSTs with Env::WLTH_SHORT consume all active zones, zenfs would have to write SSTs with a different WLTH to one selected active zone. Is there a chance that SSTs with various lifetimes get mixed in a zone, making it hard to achieve "the new data won't prolong the reclaim time"?
@Haltz: ZenFS will co-locate WLTH_SHORT data if all active zones are used (due to the allocator returning LIFETIME_DIFF_COULD_BE_WORSE). If ZenFS can't find a decent lifetime match (NOT_GOOD), it will finish one of the active zones to achieve data separation.
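The decision described in this reply could be sketched as follows. It is a simplified illustration rather than the actual allocator: `ChooseAction`, the `Action` enum, and the `can_open_another_zone` flag are hypothetical names, and the penalty values are assumed.

```cpp
#include <cassert>

// Illustrative penalty values (assumed; the real constants may differ).
constexpr unsigned kDiffNotGood = 100;
constexpr unsigned kDiffCouldBeWorse = 50;

enum class Action {
  ReuseBestActiveZone,      // write into the best-matching active zone
  OpenEmptyZone,            // still below the active zone limit
  FinishZoneThenOpenEmpty,  // free an active slot, then start a fresh zone
};

// best_diff: lowest lifetime diff found among the currently active zones.
// can_open_another_zone: false once the active zone limit has been reached.
Action ChooseAction(unsigned best_diff, bool can_open_another_zone) {
  assert(best_diff <= kDiffNotGood);

  // A zone whose data outlives the file is a good match: use it directly.
  if (best_diff < kDiffCouldBeWorse) return Action::ReuseBestActiveZone;
  // Otherwise prefer a fresh zone while the device still allows one.
  if (can_open_another_zone) return Action::OpenEmptyZone;
  // No decent match at all: finish an active zone to keep data separated.
  if (best_diff >= kDiffNotGood) return Action::FinishZoneThenOpenEmpty;
  // Same-lifetime match and no free slot: co-locate the data.
  return Action::ReuseBestActiveZone;
}
```

With these assumptions, `ChooseAction(kDiffCouldBeWorse, false)` reuses an active zone, which corresponds to the co-location case, while `ChooseAction(kDiffNotGood, false)` finishes a zone first.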
Sorry, I will check my GitHub more frequently. I still have a question, though. What do you mean by "co-locate WLTH_SHORT data"? Do you mean transferring WLTH_SHORT SSTs from other zones into one zone at once (I did not find the related code, please point it out if I'm wrong)? Or opening an empty zone by finishing an active zone, so that ZenFS can put them together during GC?
So, from what you said, can I assume that once the active zone limit has been reached, the data will end up in the zone with the closest lifetime, because LIFETIME_NOT_GOOD can only be returned when the file lifetime is none or not set? In this case, SSTables from different levels will share the same zone, because they always have a LIFETIME (> None or NOT_SET). I don't think the data separation is good in that case.
Thank you for being patient! I think we can close the issue now.