marsupialtail/quokka

Disk-based hash joins meta thread

marsupialtail opened this issue · 1 comment

@savebuffer

Steps:

  1. Add C++ build infrastructure for PyArrow plugins.
  2. Support disk spilling via refactoring: https://github.com/apache/arrow/pull/13669/files#diff-8099df49024baabc838e5615bbf8403232678172e089828efe631b99f8adba54
  3. Modify the above so the hash table stays in memory but holds only disk offsets, not the rows themselves.
  4. Write a C++ plugin for random row lookups in the streaming disk-based hash join.
  5. Test and brag.
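
To make steps 3 and 4 concrete, here is a rough Python sketch of the idea, not the planned C++ plugin. It assumes build-side rows are spilled as JSON lines to a temporary file and that the in-memory table maps each join key to a list of byte offsets. The names `build_offset_index` and `probe` are hypothetical and only for illustration.

```python
import json
import tempfile
from collections import defaultdict


def build_offset_index(rows, key, spill_file):
    """Spill build-side rows to disk; keep only key -> [byte offsets] in memory."""
    index = defaultdict(list)
    for row in rows:
        offset = spill_file.tell()  # where this row starts in the spill file
        spill_file.write(json.dumps(row).encode("utf-8") + b"\n")
        index[row[key]].append(offset)
    spill_file.flush()
    return index


def probe(probe_rows, key, index, spill_file):
    """Stream probe rows; fetch matching build rows with random disk reads."""
    for probe_row in probe_rows:
        # .get avoids inserting empty entries into the defaultdict on a miss
        for offset in index.get(probe_row[key], ()):
            spill_file.seek(offset)
            build_row = json.loads(spill_file.readline())
            yield {**build_row, **probe_row}


if __name__ == "__main__":
    build = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    stream = [{"id": 2, "x": 20}, {"id": 3, "x": 30}]
    with tempfile.TemporaryFile() as f:
        idx = build_offset_index(build, "id", f)
        for joined in probe(stream, "id", idx, f):
            print(joined)
```

Memory use scales with the number of build-side keys and offsets rather than with row width, which is the point of step 3. The cost is one seek per matching row, which is why step 4 calls for a fast random-lookup path.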

Step 1 is done.

One more step is needed first: add Bloom filters for the probe side.
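
As a rough illustration of what that could look like, here is a minimal Python sketch. It assumes the filter is populated from build-side keys and then used to drop probe rows that cannot match before any disk lookup happens. That reading is an assumption, and `BloomFilter` and `filter_probe` are hypothetical names, not existing Quokka or Arrow APIs.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        data = repr(key).encode("utf-8")
        for seed in range(self.num_hashes):
            # A different salt per seed gives independent hash functions.
            digest = hashlib.blake2b(
                data, digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def filter_probe(probe_rows, key, bloom):
    """Yield only the probe rows whose key might exist on the build side."""
    for row in probe_rows:
        if row[key] in bloom:
            yield row


if __name__ == "__main__":
    bloom = BloomFilter()
    for k in (1, 2, 5):  # build-side keys (made-up values)
        bloom.add(k)
    stream = [{"id": 1}, {"id": 2}, {"id": 99}]
    print(list(filter_probe(stream, "id", bloom)))
```

Rows that pass the filter still have to go through the real join, since a Bloom filter can return false positives. Any row it rejects is guaranteed to have no match, so its disk lookup can be skipped.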