Serialization (asBytes) compared with Java implementation
Closed this issue · 3 comments
It looks like the original Java implementation writes 2 doubles after the encoding code, but the Go port has nothing like that. Am I right to assume that the two encodings are incompatible and there is nothing to do about it?
Hello!
You're right to assume that a serialized MergingDigest is incompatible with the current Go implementation. However, the AVLTreeDigest deserializes properly (for the smallbytes version) as you can see here. The fact that both Java implementations have diverged without a bump in the version is likely unintentional.
And there are always many things that can be done about it, it just requires code 😄. I haven't had a need to implement anything other than that (and no PR has come with better/different implementations), so that's why it is how it is. Are you looking for something in particular? Is Java-Go compatibility something you deem useful?
You can also roll your own serialization using ForEachCentroid and custom deserialize anything by adding the centroids from the payload with AddWeighted.
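A rough sketch of what such a roll-your-own format could look like, using only the standard library. Everything named here (Centroid, EncodeCentroids, DecodeCentroids, shimMagic, and the byte layout itself) is made up for illustration and is not part of this library or of the Java format. The assumption is that the slice gets filled from the ForEachCentroid callback on the writing side, and that each decoded pair gets passed to AddWeighted on the reading side.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// Centroid is one (mean, count) pair, the unit a t-digest hands out
// when you iterate over it. Hypothetical type, defined only for this sketch.
type Centroid struct {
	Mean  float64
	Count uint64
}

// shimMagic tags payloads written by this hypothetical format so that a
// reader can reject anything else early. The value itself is arbitrary.
const shimMagic uint32 = 0x74446731

// centroidSize is the encoded size of one Centroid: 8 bytes mean + 8 bytes count.
const centroidSize = 16

// EncodeCentroids writes: magic (uint32), number of centroids (uint32),
// then one (float64 mean, uint64 count) pair per centroid, all big-endian.
func EncodeCentroids(cs []Centroid) ([]byte, error) {
	var buf bytes.Buffer
	header := []uint32{shimMagic, uint32(len(cs))}
	if err := binary.Write(&buf, binary.BigEndian, header); err != nil {
		return nil, err
	}
	for _, c := range cs {
		if err := binary.Write(&buf, binary.BigEndian, c); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

// DecodeCentroids reverses EncodeCentroids and validates the header
// before trusting the centroid count.
func DecodeCentroids(payload []byte) ([]Centroid, error) {
	r := bytes.NewReader(payload)
	header := make([]uint32, 2)
	if err := binary.Read(r, binary.BigEndian, header); err != nil {
		return nil, err
	}
	if header[0] != shimMagic {
		return nil, errors.New("unknown payload format")
	}
	// Refuse counts that do not match the bytes that are actually left.
	if uint64(header[1])*centroidSize != uint64(r.Len()) {
		return nil, errors.New("centroid count does not match payload size")
	}
	cs := make([]Centroid, header[1])
	for i := range cs {
		if err := binary.Read(r, binary.BigEndian, &cs[i]); err != nil {
			return nil, err
		}
	}
	return cs, nil
}

func main() {
	// With a real digest, this slice would be collected inside the
	// ForEachCentroid callback instead of being written by hand.
	in := []Centroid{{Mean: 1.5, Count: 2}, {Mean: 3.25, Count: 7}}

	payload, err := EncodeCentroids(in)
	if err != nil {
		panic(err)
	}

	out, err := DecodeCentroids(payload)
	if err != nil {
		panic(err)
	}

	// On the receiving side, each pair would be fed back into a fresh
	// digest with AddWeighted(mean, count).
	fmt.Println(len(payload), out)
}
```

Since the layout is just fixed-width big-endian numbers, writing the same thing from Java or Ruby should be straightforward, which is the whole point of owning the format instead of relying on the encoding header.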
I am looking into accepting t-digests from different languages (so I don't have to send raw numbers / data) and was hoping that t-digest has some standard binary format. It looks like it had one, but the original implementation diverged recently (which I did not know). It looks like Ruby uses a compatible format though.
Thanks for the response and for sharing this wonderful library.
It's not necessarily a recent change, but it's more recent than when I wrote the first version of this library. I assume this min/max pair was introduced to account for a bug in the old MergingDigest implementation.
Relying on the encoding header is going to be problematic, so my recommendation would be to roll your own shim to make sure all the languages you want to support write the same format. I don't expect the Java version to sync up soon: the author seems swamped with $stuff, so much so that the paper describes code that is not the current one, and the current one on master is not the most recent implementation (to my knowledge that's on branch issue-84). If you have better ideas on how to make this all compatible without breaking things, I'm happy to collaborate; serialization is something I haven't looked at seriously at all for this code.