linkedin/avro-util

Allow user-specified functional transforms to be applied during deserialization.

j-tyler opened this issue · 0 comments

As an extension of issue 397, allow users to specify functional transforms that are, ideally, applied during deserialization rather than afterward. This lets users choose what is optimal for their use case. Why is this valuable?

  1. Records retrieved from data storage are often full of duplicate, low-cardinality strings. Interning these as they are written into the record can reduce memory usage.
  2. Records that end up cached should ideally use immutable collections. Letting users build these during deserialization avoids the allocation overhead of converting them afterward.
  3. As specified in issue 397, use of specific libraries like fastutil could be a simple deserialization transform that the user configures.
    ...

I'm sure users would have other reasons as well. Currently, my project takes Avro records and re-processes them after deserialization, which leads to wasted memory allocation.
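
To make the request concrete, here is a rough sketch of the kind of hook I have in mind. Everything in it is hypothetical: `DeserializationTransformSketch`, `StringInterner`, `IMMUTABLE_LISTS`, and `readRecord` are illustrative names, not existing avro-util or Avro APIs, and `readRecord` is only a stand-in for a real decoder. It uses plain `java.util.function.Function` so that transforms can be composed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch only: none of these names exist in avro-util.
class DeserializationTransformSketch {

    // Transform 1: return one shared instance for equal strings, so that
    // low-cardinality duplicates do not each hold their own copy.
    // Not thread-safe; a real implementation would need to decide on that.
    static final class StringInterner implements Function<Object, Object> {
        private final Map<String, String> pool = new HashMap<>();

        @Override
        public Object apply(Object value) {
            if (value instanceof String) {
                String s = (String) value;
                String existing = pool.putIfAbsent(s, s);
                return existing != null ? existing : s;
            }
            return value; // non-strings pass through unchanged
        }
    }

    // Transform 2: wrap a copy of any list in an unmodifiable view, so cached
    // records cannot be mutated. Other values pass through unchanged.
    static final Function<Object, Object> IMMUTABLE_LISTS = value ->
            value instanceof List
                    ? Collections.unmodifiableList(new ArrayList<Object>((List<?>) value))
                    : value;

    // Stand-in for a decoder: applies the user-supplied transform to each
    // field value at the moment it is placed into the record.
    static Map<String, Object> readRecord(Map<String, Object> rawFields,
                                          Function<Object, Object> transform) {
        Map<String, Object> record = new HashMap<>();
        for (Map.Entry<String, Object> field : rawFields.entrySet()) {
            record.put(field.getKey(), transform.apply(field.getValue()));
        }
        return record;
    }
}
```

A user would configure something like `new StringInterner().andThen(IMMUTABLE_LISTS)` once and pass it to the deserializer. In a real implementation the transform would presumably be invoked from inside the generated or generic decoding code, so the intermediate mutable values are never handed to the caller.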