[Feature Request] Utilize `encoding.BinaryMarshaller` and `encoding.BinaryUnmarshaller`
duckbrain opened this issue · 9 comments
If encoding.TextMarshaller
is used for types found with clover.NewDocumentOf
and encoding.TextUnmarshaller
with clover.Document.Unmarshal
, then standard library and custom types can be used for fields that aren't currently able to marshal/unmarshal correctly like a custom UUID type.
Hi, @duckbrain, thanks for the suggestion. However, using a textual encoding, such as encoding.TextMarshaller
, would have a severe impact on query performance, considering that, at query time, much time is wasted in document deserialization.
This is why, I switched from json
to msgpack
, which is a binary encoding and is thus more performant.
Supporting custom types is definetely a good idea, but I think this feature should not rely on some assumptions made about the underlying encoding.
In that case, encoding.Binary*
?
encoding/binary
only allows you to encode integer types. Anyway, the problem here is not about finding a better encoding for documents, since msgpack is already doing its job. Supporting custom types should not be tight to a particular encoding scheme, since it would be impossible to change it in the future. The way encoding libraries (among these, json and msgpack itself) allow the user to include custom types is by letting the user define custom Marshal()
and Unmarshal()
methods on the custom object, which serialize the custom type to byte slices.
I was referring to the encoding.BinaryMarshaller
and encoding.BinaryUnmarshaller
interfaces instead of the text-based interfaces in the same package.
They serve as a standard for encoding and decoding custom types into a binary format, not an encoding scheme, but instead can be used by encoding formats like gob.
Standard library types also implement these standards, so it could remove some special handling for certain types.
Oh, sorry about that, we were basically saying the same thing. Would you like to work on this feature?
After looking into it. msgpack is already using encoding.BinaryMarshaller and encoding.BinaryUnmarshaller, but document.Document
is normalizing everything into map[string]interface{}
and that's what is failing to handle the binary encoding.
It looks like internal.Convert
is using json
to copy values from one object to another. In general, an normal lookup into an object goes from binary encoded msgpack -> map[string]interface{}
-> json -> your object. To fix this Document would need to be able to use an object and reflect directly on it.
I don't see a way to do this without rewriting large swaths of the internals, and I'm not sure backwards compatibility can be preserved with the API.
After looking into it. msgpack is already using encoding.BinaryMarshaller and encoding.BinaryUnmarshaller, but
document.Document
is normalizing everything intomap[string]interface{}
and that's what is failing to handle the binary encoding.
I think it should be enough to leave values implementing the binary.BinaryMarshaller interface untouched when normilizing to map[string]interface{} in order to preserve custom types.
It looks like internal.Convert is using json to copy values from one object to another. In general, an normal lookup into an object goes from binary encoded msgpack -> map[string]interface{} -> json -> your object. To fix this Document would need to be able to use an object and reflect directly on it.
Yes, but this only holds if you want to convert a Document
to a custom struct object, using Unmarshal()
(I would like to optimize this in the future, by removing the json step, which is actually a way to simply convert between types).
Anyway, when documents are loaded from disk, they are simply unmarshalled from the msgpack
encoding.
I think it should be enough to leave values implementing the binary.BinaryMarshaller interface untouched when normilizing to map[string]interface{} in order to preserve custom types.
Right you are. For some reason, I was trying to Marshal them there, which was making it a lot harder than it needs to be. PR is up.
Anyway, when documents are loaded from disk, they are simply unmarshalled from the msgpack encoding.
I see how that works and the need for the Document type for building the indexes and having a way to pass the values around, but don't personally see when I would get a document out of the database and not immediately convert it into a type-safe object.
I would like to optimize this in the future, by removing the json step, which is actually a way to simply convert between types
The "simple" way to handle that of course is custom reflection or something like structmap.
On the idea of optimizing for what I see as the common case. I messed around with the idea deferring the unmarshalling until requested or needed to use Get
and the like. That's is not completely functional, nor suggested, but you're welcome to peruse/use.
https://github.com/duckbrain/clover/tree/feature/delay-decoding
Thank you for the PR, which I just merged in the v2 branch.
The "simple" way to handle that of course is custom reflection or something like structmap.
On the idea of optimizing for what I see as the common case. I messed around with the idea deferring the unmarshalling until requested or needed to use Get and the like. That's is not completely functional, nor suggested, but you're welcome to peruse/use.
https://github.com/duckbrain/clover/tree/feature/delay-decoding
These are interesting ideas, which I think should be handled in a separate issue.
I think this one is solved, so I'm closing it