Sort Rows by Series when flushing to disk
cyriltovena opened this issue · 2 comments
Currently, when we flush all profiles to disk, we don't sort rows across row groups.
I believe we should be able to stream all row groups from phlare/pkg/phlaredb/profile_store.go (line 360 in b67c811).
This will improve data locality a ton, but I am a bit unsure how this will impact querying, as the order of querying will be:
- Timestamp first then SeriesID
And blocks will be stored strictly sorted by:
- SeriesID first then Timestamp
Currently the sorting is more like:
- Within a single row group strictly: Series first then Timestamp
- Across row groups, loosely timestamp ordered
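Since each row group is already sorted by series and then timestamp, getting to a strict global order is a k-way merge over the row groups. Below is a minimal sketch of that idea, not the Phlare implementation: `Row`, `cursor`, and `MergeRowGroups` are hypothetical names, and row groups are modelled as in-memory slices rather than Parquet readers.

```go
package main

import "container/heap"

// Row is a hypothetical, simplified stand-in for a profile row.
type Row struct {
	SeriesID  uint32
	Timestamp int64
}

// less orders rows by SeriesID first, then by Timestamp.
func less(a, b Row) bool {
	if a.SeriesID != b.SeriesID {
		return a.SeriesID < b.SeriesID
	}
	return a.Timestamp < b.Timestamp
}

// cursor tracks the read position inside one already-sorted row group.
type cursor struct {
	rows []Row
	pos  int
}

// cursorHeap is a min-heap of cursors keyed by the row each cursor points at.
type cursorHeap []*cursor

func (h cursorHeap) Len() int { return len(h) }
func (h cursorHeap) Less(i, j int) bool {
	return less(h[i].rows[h[i].pos], h[j].rows[h[j].pos])
}
func (h cursorHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *cursorHeap) Push(x interface{}) { *h = append(*h, x.(*cursor)) }
func (h *cursorHeap) Pop() interface{} {
	old := *h
	n := len(old)
	c := old[n-1]
	*h = old[:n-1]
	return c
}

// MergeRowGroups streams the rows of all row groups to emit in
// (SeriesID, Timestamp) order. Each input group must already be sorted.
func MergeRowGroups(groups [][]Row, emit func(Row)) {
	h := &cursorHeap{}
	for _, g := range groups {
		if len(g) > 0 {
			*h = append(*h, &cursor{rows: g})
		}
	}
	heap.Init(h)
	for h.Len() > 0 {
		c := (*h)[0] // cursor holding the smallest current row
		emit(c.rows[c.pos])
		c.pos++
		if c.pos < len(c.rows) {
			heap.Fix(h, 0) // cursor advanced: restore heap order
		} else {
			heap.Pop(h) // row group exhausted: drop its cursor
		}
	}
}
```

Only one row per row group needs to be held by the heap at a time, which is what would make streaming possible; a real version would read rows lazily from each row group instead of from slices.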
If a query's time range covers only part of the ~3 hours within a block, we can currently get away with reading only the pages that fall within that range (based on the pages' Min/Max). With this change, we could only select pages by their SeriesID min/max. I don't expect a major issue; I am just wondering.
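To make the trade-off concrete, here is a rough sketch of page pruning on min/max statistics. `PageStats`, `PagesInTimeRange`, and `PagesForSeries` are hypothetical names for illustration, not the Parquet library's or Phlare's API.

```go
package main

// PageStats is a hypothetical summary of one page's min/max statistics.
type PageStats struct {
	MinSeriesID, MaxSeriesID   uint32
	MinTimestamp, MaxTimestamp int64
}

// PagesInTimeRange returns the indexes of pages whose timestamp range
// overlaps the inclusive range [from, through].
func PagesInTimeRange(pages []PageStats, from, through int64) []int {
	var keep []int
	for i, p := range pages {
		if p.MaxTimestamp >= from && p.MinTimestamp <= through {
			keep = append(keep, i)
		}
	}
	return keep
}

// PagesForSeries returns the indexes of pages whose SeriesID range
// may contain the given series.
func PagesForSeries(pages []PageStats, id uint32) []int {
	var keep []int
	for i, p := range pages {
		if p.MinSeriesID <= id && id <= p.MaxSeriesID {
			keep = append(keep, i)
		}
	}
	return keep
}
```

With loosely time-ordered pages, `PagesInTimeRange` can skip most pages for a short range. Once rows are strictly sorted by series, each page tends to span the whole block's time range, so the time filter keeps nearly every page and only `PagesForSeries` stays selective.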
Fairly relevant to the change in #799
Maybe we need to address our query behaviour to use the order that is in the blocks as well.