risinglightdb/risinglight

storage: support range-filter scan by sort key

Closed this issue · 0 comments

skyzh commented

Our storage engine already supports sorting by sort key. To leverage the property in storage, we can provide a range_scan interface on storage, so as to support range filter scan for executors.

To support range_scan, we should be able to start scanning from a sort key in storage engine. This requires two steps:

  • record sort key in block index
  • support create iterator by sort key
  • add interface for executors' use

Currently, the first_key field is empty in block builder. We should record first_key in order to support seek. Note that #588 will add composite sort key support. For this issue, we may only support range filter scan by the first element in composite sort key.

pub fn finish_block(
&mut self,
block_type: BlockType,
column_data: &mut Vec<u8>,
block_data: &mut Vec<u8>,
stats: Vec<BlockStatistics>,
) {
self.indexes.push(BlockIndex {
offset: column_data.len() as u64,
length: block_data.len() as u64 + BLOCK_HEADER_SIZE as u64,
first_rowid: self.last_row_count as u32,
row_count: (self.row_count - self.last_row_count) as u32,
/// TODO(chi): support sort key
first_key: "".into(),
stats,
});

After that, we should be able to support create RowSetIterator by sort key.

pub async fn new(
rowset: Arc<DiskRowset>,
column_refs: Arc<[StorageColumnRef]>,
dvs: Vec<Arc<DeleteVector>>,
seek_pos: ColumnSeekPosition,
expr: Option<BoundExpr>,
) -> StorageResult<Self> {

The interface should have two new fields start_key: Option<Vec<u8>> and end_key: Option<Vec<u8>>. RowSetIterator should return rows only in this range. At the same time, the RowSetIterator should directly seek to the block of start_key instead of scanning from the beginning.

When everything's ready, we can finally expose range-filter scan interface to the compute layer. This requires us to modify:

fn scan<'a>(
&'a self,
begin_sort_key: Option<&'a [u8]>,
end_sort_key: Option<&'a [u8]>,
col_idx: &'a [StorageColumnRef],
is_sorted: bool,
reversed: bool,
expr: Option<BoundExpr>,
) -> Self::ScanResultFuture<'a> {