pinterest/elixir-thrift

Support CompactBinaryProtocol

Opened this issue · 7 comments

Anyone thought about this? I'm doing some stuff with Parquet and that uses the compact protocol under the hood.

I'd have to dig deeper into the protocol, but from what I remember it's mostly the same as the binary protocol except some types are encoded differently, and some fields may not have fixed width. My swagger idea for implementation would be to use the binary protocol code generation and make modifications on top of that as needed.

l may take a stab at this at some point if no one else claims it.

Hi @dantswain, have you made any progress on this?

No, I haven't had time :( I'm still very interested in it.

We're very interested in it also. @dantswain do you have thoughts on how you would add this in?

The protocol is mostly the same as the binary protocol, so my idea was to try to use the code generation that we already use for the binary protocol and make modifications as necessary to support the various differences. It may need a two-pass sort of approach to handle the zig-zag encoding? If you have existing code that can generate compact protocol binaries, that would be super helpful bc you can use that to generate test cases.

If you want more info on how the code generation works, I'd be glad to chat about it.

Thank you. I might take you up on that. Currently, Java is putting messages onto the bus so we have binary stream example to use/test with. We're trying to get an Elixir project off the ground and this is one stumbling block.

Hey, so I've made a start at looking at how this might be done. It's nowhere near ready for a PR here, but I've made this to make it easier to follow the approach and get feedback. CultivateHQ#1 .

(I've got as far as serialising the test Bool struct. https://github.com/CultivateHQ/elixir-thrift/blob/compact-protocol/test/thrift/generator/compact_protocol_test.exs#L57-L82)

I'm not entirely sure about translating this test / behaviour to the compact protocol:

assert_serializes %I16{val: 65536}, <<6, 0, 1, 0, 0, 0>>, %I16{val: 0}

The binary version wraps i16 overflows on encoding, so 32768 becomes -32768 and 65536 is 0.

Should the compact protocol do the same, even though it's encoded in a zigzag varint?

I'd say yes, except that the Ruby implementation ignores the size constraint. (The Python range checks and errors if the number is outwith the -32768 to 32768).