Disk-based hash joins meta thread
marsupialtail opened this issue · 1 comment
marsupialtail commented
@savebuffer
Steps:
- Add C++ build infrastructure for PyArrow plugins.
- Support disk spilling by refactoring the approach in: https://github.com/apache/arrow/pull/13669/files#diff-8099df49024baabc838e5615bbf8403232678172e089828efe631b99f8adba54
- Modify the above to keep the hash table in memory but store only disk offsets in it, leaving the rows themselves on disk.
- Write a C++ plugin for random row lookups in the streaming disk-based hash join.
- Test and brag.
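Steps 2 through 4 can be sketched roughly as follows. This is a minimal illustration, not the planned implementation: it assumes fixed-size rows with an `int64_t` join key and uses plain `std::fstream` instead of Arrow's spilling code. `Row` and `DiskBuildSide` are hypothetical names, not part of Arrow or PyArrow. The build side appends rows to a spill file, the in-memory hash table records only each row's file offset, and each probe does random reads at those offsets.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <ios>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical fixed-size row: a join key plus one payload column.
struct Row {
  int64_t key;
  int64_t payload;
};

// Hypothetical build side: rows live in a spill file on disk, and the
// in-memory hash table maps each key to the file offsets of its rows.
class DiskBuildSide {
 public:
  explicit DiskBuildSide(const std::string& path)
      : path_(path), out_(path, std::ios::binary | std::ios::trunc) {}

  // Append the row to the spill file and remember only its offset in memory.
  void Insert(const Row& row) {
    index_.emplace(row.key, out_.tellp());
    out_.write(reinterpret_cast<const char*>(&row), sizeof(Row));
  }

  // Close the writer and reopen the spill file for random-access reads.
  void FinishBuild() {
    out_.close();
    in_.open(path_, std::ios::binary);
  }

  // Probe: one random disk read per matching build row.
  std::vector<Row> Lookup(int64_t key) {
    std::vector<Row> matches;
    auto range = index_.equal_range(key);
    for (auto it = range.first; it != range.second; ++it) {
      Row row;
      in_.clear();             // reset stream state before seeking
      in_.seekg(it->second);   // jump to the stored offset
      in_.read(reinterpret_cast<char*>(&row), sizeof(Row));
      matches.push_back(row);
    }
    return matches;
  }

 private:
  std::string path_;
  std::ofstream out_;
  std::ifstream in_;
  std::unordered_multimap<int64_t, std::streampos> index_;
};
```

With this layout, memory use grows with the number of build rows rather than their width, at the cost of one random read per match, which is why fast random row lookups (step 4) matter.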
marsupialtail commented
Step 1 is done.
One more step is needed first: add Bloom filters for the probe side.
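One possible shape for the probe-side Bloom filter is shown below. The thread does not specify the design. This sketch assumes the filter is built from the build-side keys and used to discard probe rows that cannot match before any disk lookup is issued. `BloomFilter` and `FilterProbeKeys` are hypothetical helpers, not Arrow's API. The sizes (1024 bits, 3 hash functions) are arbitrary, and bit positions come from double hashing over the splitmix64 finalizer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical Bloom filter over int64 join keys.
class BloomFilter {
 public:
  BloomFilter(std::size_t num_bits, int num_hashes)
      : bits_(num_bits, false), num_hashes_(num_hashes) {}

  void Add(int64_t key) {
    for (int i = 0; i < num_hashes_; ++i) bits_[Index(key, i)] = true;
  }

  // false: key is definitely absent; true: key may be present.
  bool MayContain(int64_t key) const {
    for (int i = 0; i < num_hashes_; ++i)
      if (!bits_[Index(key, i)]) return false;
    return true;
  }

 private:
  // splitmix64 finalizer, used as a fast 64-bit mixing function.
  static uint64_t Mix(uint64_t x) {
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
  }

  // Double hashing: position_i = (h1 + i * h2) mod num_bits.
  std::size_t Index(int64_t key, int i) const {
    uint64_t k = static_cast<uint64_t>(key);
    uint64_t h1 = Mix(k);
    uint64_t h2 = Mix(k + 0x9e3779b97f4a7c15ULL) | 1;  // odd step
    return static_cast<std::size_t>((h1 + static_cast<uint64_t>(i) * h2) %
                                    bits_.size());
  }

  std::vector<bool> bits_;
  int num_hashes_;
};

// Keep only probe keys that might match a build key; the rest never
// trigger a disk read. Survivors stay in their original order.
std::vector<int64_t> FilterProbeKeys(const std::vector<int64_t>& build_keys,
                                     const std::vector<int64_t>& probe_keys) {
  BloomFilter filter(1024, 3);
  for (int64_t k : build_keys) filter.Add(k);
  std::vector<int64_t> survivors;
  for (int64_t k : probe_keys)
    if (filter.MayContain(k)) survivors.push_back(k);
  return survivors;
}
```

A Bloom filter never drops a key that is present on the build side; it only lets through occasional false positives, which the hash-table lookup then rejects.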