Chunked pipeline execution
Yifei-yang7 commented
So far the query input is formulated as a single table with each column stored as one contiguous GPU buffer, which is not viable when using libcudf with large data. Specifically, a string column larger than 2 GB cannot be expressed in the libcudf table format, since libcudf uses int32_t offsets rather than the uint64_t offsets used in sirius. The common practice is to chunk the input table into batches below a threshold (e.g., 2 GB) and process the table as multiple batches (a minimal batch-planning sketch follows the list below). The extra high-level implementation required would roughly be:
- Adjust the sirius table representation so it can be expressed as batches.
- For each meta pipeline, run multiple pipeline invocations, one per batch (see the execution sketch at the end of this comment).
- Add a merge/partition operation for pipeline breakers such as grouped aggregation, sort, hash table build, etc.
- (optional) Ideally the merge/partition operation should support input and output batches across multiple mediums such as gmem/dram/ssd, which can later enable spill support.
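For illustration, here is a minimal batch-planning sketch in C++. It assumes a sirius-style string column laid out as one character buffer plus uint64_t row offsets; the `plan_string_batches` name and `BatchRange` struct are hypothetical, not sirius's real types. It picks row boundaries so that each batch's character payload stays under the int32_t offset limit and can therefore be handed to libcudf.

```cpp
#include <cstdint>
#include <vector>

// Half-open row range [row_begin, row_end) making up one batch.
struct BatchRange { uint64_t row_begin; uint64_t row_end; };

// `offsets` has num_rows + 1 entries (Arrow/cudf-style layout); the byte
// size of row i is offsets[i + 1] - offsets[i].
std::vector<BatchRange> plan_string_batches(const std::vector<uint64_t>& offsets,
                                            uint64_t max_bytes = (1ull << 31) - 1)
{
  std::vector<BatchRange> batches;
  if (offsets.size() < 2) return batches;
  const uint64_t num_rows = offsets.size() - 1;
  uint64_t begin = 0;
  for (uint64_t row = 0; row < num_rows; ++row) {
    const uint64_t bytes_so_far = offsets[row + 1] - offsets[begin];
    if (bytes_so_far > max_bytes && row > begin) {
      // Close the current batch before this row so its payload stays <= max_bytes.
      batches.push_back({begin, row});
      begin = row;
    }
  }
  batches.push_back({begin, num_rows});
  // Note: a single row larger than max_bytes would still end up in its own
  // oversized batch and needs separate handling (not covered here).
  return batches;
}
```

Each planned batch could then be materialized as a libcudf string column whose int32_t offsets are rebased to start at zero for that batch.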
More details will come.
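In the meantime, to make the second and third bullets concrete, below is a hedged sketch of the proposed control flow. `Pipeline`, `PipelineBreaker`, `Batch`, and `run_meta_pipeline` are illustrative placeholders, not sirius's actual classes: each meta pipeline executes its streaming pipeline once per input batch, and the pipeline breaker merges or partitions the per-batch partial results before handing them to the next meta pipeline.

```cpp
#include <vector>

struct Batch {};  // placeholder for one sub-threshold chunk of a table

struct PipelineBreaker {
  // e.g. grouped aggregation: fold one batch of partial results into
  // the breaker's state (partial hash tables, sorted runs, ...).
  virtual void consume_partial(Batch partial) = 0;
  // Produce merged (or re-partitioned) output batches for the next
  // meta pipeline.
  virtual std::vector<Batch> finalize() = 0;
  virtual ~PipelineBreaker() = default;
};

struct Pipeline {
  // Streaming operators only; pipeline breakers sit outside.
  virtual Batch execute(Batch input) = 0;
  virtual ~Pipeline() = default;
};

std::vector<Batch> run_meta_pipeline(Pipeline& pipeline,
                                     PipelineBreaker& breaker,
                                     std::vector<Batch> const& input_batches)
{
  for (auto const& batch : input_batches) {
    // One pipeline invocation per batch instead of one per table.
    breaker.consume_partial(pipeline.execute(batch));
  }
  return breaker.finalize();
}
```

The `finalize()` step is also where the optional gmem/dram/ssd choice from the last bullet would plug in, since the merged or re-partitioned output batches are the natural unit to keep in GPU memory or spill to DRAM/SSD.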