apache/datafusion-python

Bindings to datafusion-proto

leoyvens opened this issue · 4 comments

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I would like to serialize a logical plan to the datafusion-proto format, from Python.

Describe the solution you'd like
Add Python bindings to datafusion-proto. My use case in particular would require the logical_plan_to_bytes function.

Describe alternatives you've considered
Use Substrait instead of datafusion-proto. However, Substrait has not yet reached parity with datafusion-proto, as tracked in apache/datafusion#8149.

This sounds like a very good addition. Do you think we would also want to expose Python protobuf objects? I suspect that might increase the complexity of our build, so if it isn't going to be a significant benefit, then just exposing the serialization / deserialization sounds straightforward.

Thank you for considering this for the next release!

For my case, I wouldn't need the protobuf-generated types exposed. However, logical_plan_to_bytes_with_extension_codec and logical_plan_from_bytes_with_extension_codec would be very nice to have, though they are a bit more complex than logical_plan_to_bytes.

This turns out to be more difficult than I expected. Right now, none of the LogicalExtensionCodec implementations implement try_encode or try_decode, so when I tested this I didn't get anything useful out of it. I'll see what other options we have, but we might have to implement something additional. I don't know if this will make it into DF42, since it's more work than I had originally anticipated.

I discovered that my problem is limited to tables that are created in memory, so I am pushing up the PR to expose this feature with that caveat.