apache/fury

[Java] Meta compression

chaokunyang opened this issue · 1 comments

Is your feature request related to a problem? Please describe.

Meta share mode reduces the meta cost of each serialization. It ensures that multiple objects of the same type write their meta only once, which saves space, and it improves performance because the meta binary can be written with a memory copy.

However, meta encoding is currently not compressed, so the space cost is larger in auto meta share mode.

In normal meta share mode, the meta is sent only once to each peer, so the space cost can be ignored. In auto meta share mode, however, the meta is sent every time serialization happens.

Meta Compression Proposal

This mode forbids streaming writes, since it needs to look back and update the offset after the whole object graph
has been written and meta collection is finished.
We plan to streamline meta writing in the future.
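
To illustrate why the look-back rules out streaming, here is a minimal sketch. The layout (a 4-byte offset slot, then object data, then meta) and the names `LookBackWriteSketch`, `write` and `readMetaOffset` are assumptions made for illustration, not Fury's actual wire format or API. The point is only that the offset slot at the start cannot be filled until everything after it has been written.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical layout: | int32 meta offset | object data | meta |
final class LookBackWriteSketch {
  static byte[] write(byte[] objectData, byte[] metaData) {
    ByteBuffer buf = ByteBuffer
        .allocate(4 + objectData.length + metaData.length)
        .order(ByteOrder.LITTLE_ENDIAN);
    int offsetSlot = buf.position();
    buf.putInt(-1);                     // placeholder, real value unknown yet
    buf.put(objectData);                // object graph is written first
    int metaStart = buf.position();     // only now is the meta position known
    buf.put(metaData);                  // collected meta is appended
    buf.putInt(offsetSlot, metaStart);  // look back and patch the placeholder
    return buf.array();
  }

  static int readMetaOffset(byte[] bytes) {
    return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
  }
}
```

A streaming writer would already have flushed the first four bytes by the time `metaStart` is known, which is why the patch step requires the whole buffer to stay in memory.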

Schema consistent

A class will be encoded as an enumerated string of its full class name.

Schema evolution

Class meta format:

| meta header: hash + num classes | current class meta | parent class meta | ... |

Meta header

The meta header is a 64-bit number encoded in little-endian order.

  • The lowest 4 bits, 0b0000~0b1110, are used to record num classes. 0b1111 is reserved to indicate that Fury needs to
    read more bytes for the length using Fury unsigned int encoding. If the current class doesn't have a parent class, or
    the parent class doesn't have fields to serialize, or we're in a context which serializes fields of the current class
    only (ObjectStreamSerializer#SlotInfo is an example), num classes will be 1.
  • The other 60 bits are used to store the murmur hash of flags + all layers of class meta.
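
The bit layout above can be sketched as follows. This is an illustration under stated assumptions, not Fury's implementation: the class and method names are hypothetical, the count is assumed to be stored directly in the low 4 bits, the hash is assumed to be computed elsewhere and truncated to 60 bits, and the extra length bytes that follow the 0b1111 marker are not covered.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for the 64-bit header: | 60-bit hash | 4-bit num classes |
final class MetaHeaderSketch {
  static final int NUM_CLASSES_BITS = 4;
  static final long NUM_CLASSES_MASK = 0b1111L;
  static final int EXTENDED_MARKER = 0b1111; // reserved: more length bytes follow

  // Keeps the low 60 bits of the hash; counts of 15 or more collapse to the marker.
  static long pack(long hash, int numClasses) {
    if (numClasses < 0) {
      throw new IllegalArgumentException("numClasses must be non-negative");
    }
    long low = Math.min(numClasses, EXTENDED_MARKER);
    return (hash << NUM_CLASSES_BITS) | low;
  }

  static int numClassesField(long header) {
    return (int) (header & NUM_CLASSES_MASK);
  }

  static boolean needsExtendedLength(long header) {
    return numClassesField(header) == EXTENDED_MARKER;
  }

  static long hashBits(long header) {
    return header >>> NUM_CLASSES_BITS;
  }

  static byte[] toLittleEndian(long header) {
    return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(header).array();
  }

  static long fromLittleEndian(byte[] bytes) {
    return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getLong();
  }
}
```

Because the header is little endian, the 4-bit count sits in the low nibble of the first byte on the wire, so a reader can see the number of layers before it has consumed the rest of the hash.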

Single layer class meta

| enumerated class name string | unsigned int: num fields | field info: type info + field name | next field info | ... |

The type info of a custom-type field is written as a one-byte flag instead of inlining its meta, because the field value
may be null, and Fury can skip writing the meta of this field type if an object of this type is serialized in the current
object graph.
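
As a rough sketch of the single-layer layout, the code below writes a class name, a field count, and one entry per field. Several simplifications are assumed here and are not taken from Fury: the class name is written inline as a length-prefixed UTF-8 string rather than as an enumerated string, the unsigned int encoding is assumed to be a LEB128-style varint, the type info is reduced to a single flag byte, and all names (`LayerMetaSketch`, `Field`, `writeVarUint`, `encode`) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical encoder for: | class name | num fields | type flag + field name | ... |
final class LayerMetaSketch {
  static final class Field {
    final int typeFlag; // one-byte flag standing in for the type info
    final String name;

    Field(int typeFlag, String name) {
      this.typeFlag = typeFlag;
      this.name = name;
    }
  }

  // Assumed varint: 7 payload bits per byte, high bit set while more bytes follow.
  static void writeVarUint(ByteArrayOutputStream out, int value) {
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
  }

  static void writeString(ByteArrayOutputStream out, String s) {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    writeVarUint(out, bytes.length);
    out.write(bytes, 0, bytes.length);
  }

  static byte[] encode(String className, List<Field> fields) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeString(out, className);
    writeVarUint(out, fields.size());
    for (Field f : fields) {
      out.write(f.typeFlag);      // flag only, the field type's own meta is not inlined
      writeString(out, f.name);
    }
    return out.toByteArray();
  }
}
```

Keeping the field's type info to a flag means the layer's size does not depend on how large the referenced type's meta is.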

Field order is left as an implementation detail and is not exposed in the specification; deserialization needs to
re-sort fields based on the Fury field comparator. This way, Fury can compute statistics over field names or types and
use a more compact encoding.
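
A small sketch of the re-sorting step on the reading side follows. The comparator used here (type name, then field name) is a stand-in chosen for illustration; the actual Fury field comparator is not described in this issue, and `FieldOrderSketch`, `FieldInfo` and `canonicalOrder` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical: the reader ignores wire order and re-sorts with a shared comparator.
final class FieldOrderSketch {
  static final class FieldInfo {
    final String typeName;
    final String name;

    FieldInfo(String typeName, String name) {
      this.typeName = typeName;
      this.name = name;
    }
  }

  // Stand-in comparator, NOT Fury's: order by type name, then by field name.
  static final Comparator<FieldInfo> COMPARATOR =
      Comparator.comparing((FieldInfo f) -> f.typeName).thenComparing(f -> f.name);

  static List<String> canonicalOrder(List<FieldInfo> wireOrder) {
    List<FieldInfo> sorted = new ArrayList<>(wireOrder); // leave the input untouched
    sorted.sort(COMPARATOR);
    List<String> names = new ArrayList<>();
    for (FieldInfo f : sorted) {
      names.add(f.name);
    }
    return names;
  }
}
```

Since both sides derive the order from the same comparator, the writer is free to emit field infos in whatever order compresses best.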

Class name will be written as an unsigned id if the class is registered.

Field name will be written as an unsigned id if the field is marked with an ID by an annotation.

Additional context

#80 #202

@bigteech The JavaScript implementation can start the cross-language schema compatibility work after this issue is finished.