The integer compressor should compress multiple integer sequences first
KarlSchimpf opened this issue · 0 comments
KarlSchimpf commented
The current compressor is somewhat complex because it treats singleton integer patterns the same way as multiple integer sequence patterns.
The problem is that singletons yield considerably smaller savings, because each one is only replaced by an abbreviation value. Hence, a singleton pattern may "shrink" the width (slightly), but it doesn't remove values from the stream (as multiple integer sequence patterns do).
We should schedule multiple integer sequences first. Then we should choose which of the remaining singletons should be converted to a pattern. This does two things:
- It allows us to still encode single integers using abbreviations (size based on frequency of use), and the assignment of Huffman encoding values can be merged with multiple integer sequences.
- It simplifies the selection of multiple integer sequence patterns.
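To make the proposal concrete, here is a minimal Python sketch of the two-phase selection. It is not the compressor's actual code: the function names, the savings heuristic (values removed per use times number of uses), and the `max_len`, `limit`, and `min_count` thresholds are all assumptions chosen for illustration.

```python
from collections import Counter

def count_ngrams(stream, max_len):
    """Count every contiguous run of 2..max_len integers (overlaps included)."""
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(stream) - n + 1):
            counts[tuple(stream[i:i + n])] += 1
    return counts

def select_sequences(stream, max_len=3, min_count=2, limit=8):
    """Phase 1: pick multi-integer patterns, ranked by estimated values removed."""
    counts = count_ngrams(stream, max_len)
    # Hypothetical heuristic: each use replaces len(p) values with one abbreviation.
    ranked = sorted(
        (p for p, c in counts.items() if c >= min_count),
        key=lambda p: (-(len(p) - 1) * counts[p], p),
    )
    return ranked[:limit]

def apply_patterns(stream, patterns):
    """Replace pattern matches (longest first) with ('abbr', pattern) tokens."""
    by_len = sorted(patterns, key=len, reverse=True)
    out, i = [], 0
    while i < len(stream):
        for p in by_len:
            if tuple(stream[i:i + len(p)]) == p:
                out.append(("abbr", p))
                i += len(p)
                break
        else:
            # No pattern starts here, so the integer stays in the stream.
            out.append(stream[i])
            i += 1
    return out

def select_singletons(rewritten, min_count=3):
    """Phase 2: among integers left in the stream, pick the frequent ones."""
    counts = Counter(v for v in rewritten if isinstance(v, int))
    return [(v,) for v, c in sorted(counts.items()) if c >= min_count]

def select_patterns(stream, limit=8, min_single=3):
    """Run phase 1, rewrite the stream, then run phase 2 on what remains."""
    sequences = select_sequences(stream, limit=limit)
    rewritten = apply_patterns(stream, sequences)
    singles = select_singletons(rewritten, min_count=min_single)
    return sequences, singles, rewritten
```

Because singletons are counted only after the sequence patterns have been applied, an integer that mostly occurs inside a chosen sequence no longer competes for an abbreviation. The usage counts of both kinds of abbreviation in `rewritten` could then feed a single Huffman assignment, as suggested above.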