ulikunitz/xz

Equivalent of FastBytes?

zfLQ2qx2 opened this issue · 2 comments

The 7zip SDK has something they call "FastBytes", which seems to be a mechanism to limit how much time is spent looking for the best sequence to add to the dictionary. I don't see an equivalent here. How do you get around that?

In the current release I have a constant limit, maxMatches=16, that caps the number of positions searched. (I also add a number of short distances, which from my recent experience was misguided.) Look at the lzma/hashtable.go file to investigate the implementation. Note that I wrote the code without a lot of experience in the field.
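
To make the idea of a fixed candidate limit concrete, here is a minimal sketch in Go. It is not the code from lzma/hashtable.go: the `finder` type, `hash4`, `minLen`, and the chained table layout are all assumptions made for illustration. Only the constant maxMatches=16 comes from the comment above.

```go
package main

import "fmt"

// maxMatches caps how many candidate positions are examined per lookup.
// The value 16 is taken from the comment above; the rest is hypothetical.
const maxMatches = 16

// minLen is an assumed minimum match length for this sketch.
const minLen = 4

// hash4 maps the first 4 bytes of p to a 16-bit table index.
func hash4(p []byte) uint32 {
	v := uint32(p[0]) | uint32(p[1])<<8 | uint32(p[2])<<16 | uint32(p[3])<<24
	return (v * 2654435761) >> 16
}

// matchLen returns the length of the common prefix of a and b.
func matchLen(a, b []byte) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// finder is a hypothetical chained hash table: head[h] holds the most
// recent position with hash h, and prev[pos] links to the one before it.
type finder struct {
	data []byte
	head []int
	prev []int
}

func newFinder(data []byte) *finder {
	f := &finder{data: data, head: make([]int, 1<<16), prev: make([]int, len(data))}
	for i := range f.head {
		f.head[i] = -1 // -1 marks an empty chain
	}
	return f
}

// insert registers pos so that later lookups can find it.
func (f *finder) insert(pos int) {
	if pos+minLen > len(f.data) {
		return
	}
	h := hash4(f.data[pos:])
	f.prev[pos] = f.head[h]
	f.head[h] = pos
}

// longest walks the chain for pos, newest candidate first, and stops after
// limit candidates. It reports the best distance and length found and how
// many candidates were actually visited.
func (f *finder) longest(pos, limit int) (dist, length, visited int) {
	if pos+minLen > len(f.data) {
		return 0, 0, 0
	}
	c := f.head[hash4(f.data[pos:])]
	for c >= 0 && visited < limit {
		visited++
		if n := matchLen(f.data[c:], f.data[pos:]); n >= minLen && n > length {
			dist, length = pos-c, n
		}
		c = f.prev[c]
	}
	return
}

// demo builds a highly repetitive input so that the chain for position 100
// is longer than maxMatches, then performs one bounded lookup.
func demo() (dist, length, visited int) {
	data := make([]byte, 200)
	for i := range data {
		data[i] = "abcd"[i%4]
	}
	f := newFinder(data)
	for i := 0; i < 100; i++ {
		f.insert(i)
	}
	return f.longest(100, maxMatches)
}

func main() {
	dist, length, visited := demo()
	fmt.Println("dist:", dist, "len:", length, "visited:", visited)
}
```

The limit trades compression for time: on repetitive input the chain can grow without bound, and the cap keeps each lookup at a fixed worst-case cost.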

I have done a lot of experiments on that recently. What I found is that a hash to a linear list, as in the current implementation, doesn't compress much better than two hashes with different input lengths, but it is much slower. Right now I have very fast implementations where the whole search mechanism is done in a single loop without function calls, but this code can't reach the compression rates of the original xz. I'm currently working on a tree implementation that can compete with the bt4 match finder of the original xz implementation. I have also added parallel compression and decompression modes. But I want to achieve the compression rates of xz before the next release.
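
As a rough illustration of the "two hashes with different input lengths" approach, the sketch below keeps two single-slot tables, one keyed on 3 bytes and one on 6 bytes, and does the whole search in one loop. It is not the unreleased code mentioned above. The input lengths, `tableBits`, the hash multipliers, the greedy skip, and the names `findMatches` and `match` are all assumptions.

```go
package main

import "fmt"

const (
	shortLen  = 3  // assumed input length of the short hash
	longLen   = 6  // assumed input length of the long hash
	tableBits = 15 // assumed table size: 1<<15 slots per table
)

// match describes one match found at pos, pointing dist bytes back.
type match struct{ pos, dist, length int }

// findMatches scans data once. Each table stores only the most recent
// position per hash value (as position+1, so 0 means empty); there is no
// chain to walk, which is where the speed comes from.
func findMatches(data []byte) []match {
	shortTab := make([]int32, 1<<tableBits)
	longTab := make([]int32, 1<<tableBits)
	var out []match
	for i := 0; i+longLen <= len(data); {
		// Both hashes are computed inline from the bytes at i.
		s := uint32(data[i]) | uint32(data[i+1])<<8 | uint32(data[i+2])<<16
		l := uint64(s) | uint64(data[i+3])<<24 | uint64(data[i+4])<<32 | uint64(data[i+5])<<40
		hs := (s * 2654435761) >> (32 - tableBits)
		hl := uint32((l * 0x9E3779B97F4A7C15) >> (64 - tableBits))

		// At most two candidates per position: long hash first, then short.
		best, bestLen := -1, 0
		for _, c := range [2]int{int(longTab[hl]) - 1, int(shortTab[hs]) - 1} {
			if c < 0 {
				continue
			}
			// Verify the candidate byte by byte; hashes can collide.
			n := 0
			for i+n < len(data) && data[c+n] == data[i+n] {
				n++
			}
			if n >= shortLen && n > bestLen {
				best, bestLen = c, n
			}
		}
		shortTab[hs], longTab[hl] = int32(i+1), int32(i+1)

		if best >= 0 {
			out = append(out, match{i, i - best, bestLen})
			i += bestLen // greedy: skipped positions are not inserted
		} else {
			i++
		}
	}
	return out
}

// demo runs the finder on a small input with one repeated 6-byte phrase.
func demo() []match {
	return findMatches([]byte("abcdefXabcdefYabcdef"))
}

func main() {
	for _, m := range demo() {
		fmt.Printf("pos=%d dist=%d len=%d\n", m.pos, m.dist, m.length)
	}
}
```

With only two candidates per position the work per byte is constant, but older and possibly longer matches are forgotten as soon as a slot is overwritten, which is consistent with such a scheme not reaching the compression rates of a bt4-style tree.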

I assume that the answer has been comprehensive.