Encoders based on [Bytes]IO instead of string joining?

Question

janpipek opened this issue 4 years ago · 0 comments

Hi Ilya,

I finally found some time to follow up on the streaming support we introduced last year (in #174). I started thinking about how best to inject a BinaryIO into the encoding process. I can give it a try, but before changing any code I realized one (rather positive) thing:

For huge objects, encoding is much less problematic than decoding. Some copying of strings is necessary at each level of the structure, but at the lower levels only very small strings are copied. In theory, this makes the total asymptotic time something like O(total_size * depth_of_hierarchy) instead of O(total_size^2), which was the case for the decoder.
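
As a rough illustration of that estimate, here is a minimal sketch of a toy model, not the library's actual encoder. `encode_counting` and the `stats` dict are hypothetical names. It joins the child encodings at every level and counts how many bytes each join copies. Tags and length headers are left out, so it only models the copying:

```python
from typing import Dict, Union

# A node is either a leaf payload (bytes) or a list of child nodes.
Node = Union[bytes, list]


def encode_counting(node: Node, stats: Dict[str, int]) -> bytes:
    """Toy join-based encoder that records how many bytes the joins copy."""
    if isinstance(node, bytes):
        return node
    body = b"".join(encode_counting(child, stats) for child in node)
    # The join copies every byte of every child once at this level.
    stats["copied"] += len(body)
    return body


flat = [b"x"] * 1000                         # depth 1, 1000 bytes in total
nested = [[b"x"] * 100 for _ in range(10)]   # depth 2, 1000 bytes in total

flat_stats = {"copied": 0}
nested_stats = {"copied": 0}
flat_out = encode_counting(flat, flat_stats)
nested_out = encode_counting(nested, nested_stats)
```

Under these assumptions the flat structure copies 1,000 bytes and the two-level one copies 2,000, so the copying grows with total size times depth rather than with the square of the total size.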

In practice, however, I could not even make a toy example scale worse than linearly: encoding a sequence of 1,000 sequences of 1,000 objects each cost about the same as one sequence of 1,000,000 objects, which in turn cost about 100 times more than one sequence of 10,000 objects. Some time might be saved by writing to a single output stream, but at the expense of passing that stream object all around the encoding process. What are your thoughts?
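
To make the trade-off concrete, here is a minimal sketch of the two styles side by side. It uses a made-up format (one tag byte plus a fixed 4-byte big-endian length), not the library's real encoding rules, and `encode_join`, `encode_stream`, `LEAF` and `SEQ` are hypothetical names. The stream variant assumes a seekable BinaryIO so that the length can be patched in after the children are written:

```python
import io
import struct
from typing import BinaryIO, Union

# A node is either a leaf payload (bytes) or a list of child nodes.
Node = Union[bytes, list]

LEAF, SEQ = b"\x00", b"\x01"  # made-up tag bytes for this toy format


def encode_join(node: Node) -> bytes:
    """Each level returns bytes; the parent joins them (one copy per level)."""
    if isinstance(node, bytes):
        return LEAF + struct.pack(">I", len(node)) + node
    body = b"".join(encode_join(child) for child in node)
    return SEQ + struct.pack(">I", len(body)) + body


def encode_stream(node: Node, out: BinaryIO) -> None:
    """Same format, but every level writes into one shared seekable stream."""
    if isinstance(node, bytes):
        out.write(LEAF + struct.pack(">I", len(node)) + node)
        return
    out.write(SEQ)
    length_pos = out.tell()
    out.write(b"\x00\x00\x00\x00")  # placeholder for the body length
    for child in node:
        encode_stream(child, out)   # the stream is passed down explicitly
    end_pos = out.tell()
    out.seek(length_pos)
    out.write(struct.pack(">I", end_pos - length_pos - 4))  # patch the length
    out.seek(end_pos)


data = [[b"ab", b"c"], [b""], b"xyz"]
buf = io.BytesIO()
encode_stream(data, buf)
same_output = buf.getvalue() == encode_join(data)
```

Both variants produce the same bytes here. The stream version avoids the per-level joins, but every call needs the extra `out` argument, and with a variable-length length field the placeholder trick would not work as written.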

Best regards,
Jan