Rework Kafka -> COPY path
derekjn opened this issue · 2 comments
After pipelinedb/pipelinedb#1596 we can't rely on copy_iter_hook to pass messages into COPY. Some approaches we can use instead:
- Write batches to temp file(s) and pass paths to COPY (simple but potentially slow)
- Project rows and write directly to queues via ZMQ (fast but duplicates parsing/deserialization logic that COPY already performs)
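For reference, the first option could look roughly like the sketch below. It is only an illustration: `write_batch_to_temp_file`, `copy_statement`, and the relation name `kafka_stream` are hypothetical names, not anything in the codebase, and it assumes each Kafka message is a single line already in a format COPY accepts.

```python
import os
import tempfile


def write_batch_to_temp_file(messages, directory=None):
    """Write one batch of raw messages to a temp file, one per line.

    Assumes a message never contains an embedded newline; a real
    implementation would have to escape or reject those.
    """
    fd, path = tempfile.mkstemp(prefix="kafka_batch_", suffix=".copy", dir=directory)
    with os.fdopen(fd, "wb") as f:
        for msg in messages:
            # Normalize so every message ends with exactly one newline.
            f.write(msg.rstrip(b"\n") + b"\n")
    return path


def copy_statement(relation, path):
    """Build a COPY ... FROM 'file' statement for the batch file.

    `relation` is inserted as-is (hypothetical, trusted input); single
    quotes in the path are doubled, as SQL string literals require.
    """
    return "COPY {} FROM '{}'".format(relation, path.replace("'", "''"))
```

Usage would be something like `copy_statement("kafka_stream", write_batch_to_temp_file(batch))`, with the file removed once COPY has consumed it. The extra write and read of every batch is where the "potentially slow" part comes from.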
The second approach is probably ideal, as COPY deserialization logic is fairly straightforward and not likely to ever change.
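To give a sense of how much logic would be duplicated, here is a minimal sketch of parsing one line of COPY's default text format. It assumes tab-delimited fields, `\N` for NULL, and the single-character backslash escapes; octal and hex escapes are deliberately left out. The function names are hypothetical, and the ZMQ write itself is not shown.

```python
# Single-character escapes recognized by COPY's text format.
_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}


def _split_raw_fields(line, delimiter):
    """Split on unescaped delimiters, leaving backslash sequences intact."""
    fields, start, i = [], 0, 0
    while i < len(line):
        if line[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
        elif line[i] == delimiter:
            fields.append(line[start:i])
            i += 1
            start = i
        else:
            i += 1
    fields.append(line[start:])
    return fields


def _unescape(raw):
    """Resolve backslash escapes; an unknown escaped char stands for itself."""
    out, i = [], 0
    while i < len(raw):
        if raw[i] == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            out.append(_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)


def parse_copy_text_line(line, delimiter="\t"):
    """Return the fields of one text-format row; NULL fields become None."""
    return [
        None if raw == "\\N" else _unescape(raw)
        for raw in _split_raw_fields(line.rstrip("\n"), delimiter)
    ]
```

For example, `parse_copy_text_line("1\tfoo\\tbar\t\\N")` yields `["1", "foo\tbar", None]`. Each parsed row would then still need to be projected and serialized before being written to the queues.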
What about mmap'd files?
> What about mmap'd files?
I don't think there's any guarantee that they'd be faster than regular disk-backed files. Unless I'm mistaken, mmap maps a file's contents into the process's address space, but doesn't guarantee that all of the file's contents are kept in memory.
That being said, if we go with the first option we'd want to use mmap. I just don't think it's fundamentally different performance-wise, especially since we'd just be doing sequential writes to the temp file.
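To make the comparison concrete, here is a small sketch of both ways of writing a batch sequentially to a temp file. The helper names are hypothetical and this is not a benchmark; it only shows that the two paths produce the same file, with the mmap version needing the file sized up front.

```python
import mmap
import os
import tempfile


def write_batch_plain(path, payload):
    """Sequential write through the regular file API."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # force the data out to disk


def write_batch_mmap(path, payload):
    """Same write through a memory mapping of the file."""
    with open(path, "w+b") as f:
        if not payload:
            return  # an empty file cannot be mapped; nothing to write anyway
        f.truncate(len(payload))  # the file must have its final size before mapping
        with mmap.mmap(f.fileno(), len(payload)) as m:
            m[:] = payload
            m.flush()  # write dirty pages back to the file
```

In both cases the bytes pass through the OS page cache on their way to disk, which is consistent with the point above that a purely sequential write is unlikely to differ much between the two.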