Bindings to datafusion-proto
leoyvens opened this issue · 4 comments
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I would like to serialize a logical plan to the datafusion-proto format, from Python.
Describe the solution you'd like
Add Python bindings to datafusion-proto. In particular, my use case would require the logical_plan_to_bytes function.
Describe alternatives you've considered
Use Substrait instead of datafusion-proto. However, Substrait's parity with datafusion-proto is not there yet, as tracked in apache/datafusion#8149.
This sounds like a very good addition. Do you think we would also want to expose Python protobuf objects? I suspect that might increase the complexity of our build, so if it isn't going to be a significant benefit, then just exposing the serialization / deserialization sounds straightforward.
Thank you for considering this for the next release!
For my case I wouldn't need the protobuf-generated types exposed. However, logical_plan_to_bytes_with_extension_codec and logical_plan_from_bytes_with_extension_codec would be very nice to have, though they are a bit more complex than logical_plan_to_bytes.
This turns out to be more difficult than I expected. Right now, none of the LogicalExtensionCodec implementations implement try_encode or try_decode, so when I tested this I didn't get anything useful out of it. I'll see what other options we have, but we might have to implement something additional. I don't know if this will make it into DF42, since it's more work than I had originally anticipated.
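To illustrate why a codec without try_encode / try_decode gets in the way, here is a toy sketch in plain Python. It is not the datafusion-proto wire format or API: the plan representation (nested dicts), the "extension" flag, and the helpers plan_to_bytes / plan_from_bytes are all hypothetical, and only the method names try_encode and try_decode are borrowed from this thread. It assumes that built-in nodes serialize directly while anything custom is delegated to the codec.

```python
import json


class ExtensionCodec:
    """Toy stand-in for an extension codec with nothing implemented."""

    def try_encode(self, node):
        raise NotImplementedError("extension codec not provided")

    def try_decode(self, payload):
        raise NotImplementedError("extension codec not provided")


def plan_to_bytes(plan, codec):
    """Serialize a toy plan; nodes flagged as extensions go through the codec."""

    def encode(node):
        if node.get("extension"):
            # Hypothetical rule: custom nodes cannot be encoded without a codec.
            return {"ext": codec.try_encode(node)}
        return {
            "kind": node["kind"],
            "args": node["args"],
            "inputs": [encode(child) for child in node["inputs"]],
        }

    return json.dumps(encode(plan)).encode("utf-8")


def plan_from_bytes(data, codec):
    """Rebuild a toy plan from bytes produced by plan_to_bytes."""

    def decode(obj):
        if "ext" in obj:
            return codec.try_decode(obj["ext"])
        return {
            "kind": obj["kind"],
            "args": obj["args"],
            "inputs": [decode(child) for child in obj["inputs"]],
        }

    return decode(json.loads(data.decode("utf-8")))


codec = ExtensionCodec()

# A plan made only of built-in nodes round-trips without touching the codec.
file_scan = {"kind": "TableScan", "args": {"table": "t.parquet"}, "inputs": []}
plan = {"kind": "Filter", "args": {"predicate": "a > 1"}, "inputs": [file_scan]}
round_tripped = plan_from_bytes(plan_to_bytes(plan, codec), codec)

# A node flagged as an extension hits the unimplemented codec and fails.
custom_scan = {"kind": "TableScan", "extension": True, "args": {}, "inputs": []}
try:
    plan_to_bytes(custom_scan, codec)
    extension_failed = False
except NotImplementedError:
    extension_failed = True
```

In this toy model, serialization only breaks for plans that contain a node the built-in path cannot handle, which is one way to picture a failure that shows up for some table types and not others.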
I discovered that my problem is limited to tables that are created in memory, so I am pushing up the PR to expose this feature with that caveat.