GrzegorzBlok/FastRsyncNet

New version of Adler32?

Closed this issue · 5 comments

First of all, thanks for a really great library!

I just noticed that Octodiff has recently integrated a new version of Adler32 that, according to the commit, is more standard. They have also marked the old implementation as obsolete.

Should this commit perhaps me cherry-picked into FastRsyncNet as well?

Hi, I'm glad you find the library useful.
I'm well aware of the commit made to the Octodiff regarding Adler32 implementation. In fact it was made due to the issue that I submitted :) You may see it here : OctopusDeploy#16 for more explanation regarding the Adler32 standard variant. I'm going to introduce standard Adler32 implementation, soon. It's going to be available along with the current one.
I want to keep the non-standard implementation due to backward compatibility reasons, but also it's faster and it's little "wider" in terms of number of bits effectively used.

Thanks for the added info - I don't know how I missed the link to the original issue. :)

I have ported Adler32 implementation from Octodiff:
fb12437

Note that the "new" Adler32 implementation is slower than the previous one:

BenchmarkDotNet=v0.10.10, OS=Windows 10 Redstone 2 [1703, Creators Update] (10.0.15063.608)
Processor=Intel Core i7-4700MQ CPU 2.40GHz (Haswell), ProcessorCount=8
Frequency=2338333 Hz, Resolution=427.6551 ns, Timer=TSC
  [Host]     : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2558.0
  DefaultJob : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2558.0


                            Method |      N |           Mean |         Error |        StdDev |
---------------------------------- |------- |---------------:|--------------:|--------------:|
   Adler32RollingCalculateChecksum |    128 |       127.4 ns |     0.6079 ns |     0.5687 ns |
 Adler32RollingV2CalculateChecksum |    128 |       609.3 ns |     2.0996 ns |     1.9640 ns |
   Adler32RollingCalculateChecksum |  16974 |    15,324.5 ns |    65.0046 ns |    57.6249 ns |
 Adler32RollingV2CalculateChecksum |  16974 |    80,504.9 ns |   355.8678 ns |   332.8789 ns |
   Adler32RollingCalculateChecksum | 356879 |   321,294.6 ns | 1,666.7223 ns | 1,477.5058 ns |
 Adler32RollingV2CalculateChecksum | 356879 | 1,699,248.1 ns | 8,053.0523 ns | 7,532.8294 ns |

Ouch - that's a pretty substantial perf hit. Is there any reason to use the new implementation in an application where interoperability is not an issue?

Adler32 is just one element of whole rsync algorithm. You may want to take a look at the signature benchmark when evaluating rolling checksum performance effect (see SignaturexxHash vs SignaturexxHashAdler32V2). Benchmark is done on the data that resides in memory. When you run rsync in real scenario you usually deal with file system or network streams. This further decreases the importance of the algorithm efficiency.
Anyway, I think that it's not the optimal implementation of standard Adler32 algorithm and will try to improve it.

                   Method | BaseFileSize | ChunkSize |         Mean |       Error |      StdDev |
------------------------- |------------- |---------- |-------------:|------------:|------------:|
          SignaturexxHash |          128 |       128 |     8.915 us |   0.0533 us |   0.0416 us |
 SignaturexxHashAdler32V2 |          128 |       128 |     9.673 us |   0.0543 us |   0.0481 us |
            SignatureSha1 |          128 |       128 |     9.821 us |   0.0925 us |   0.0865 us |
             SignatureMd5 |          128 |       128 |     9.926 us |   0.1115 us |   0.0988 us |
  OctodiffSignaturexxHash |          128 |       128 |     1.294 us |   0.0134 us |   0.0125 us |
          SignaturexxHash |          128 |      2048 |     8.506 us |   0.0984 us |   0.0920 us |
 SignaturexxHashAdler32V2 |          128 |      2048 |     8.981 us |   0.0836 us |   0.0741 us |
            SignatureSha1 |          128 |      2048 |    10.334 us |   0.1097 us |   0.1026 us |
             SignatureMd5 |          128 |      2048 |    10.159 us |   0.0874 us |   0.0730 us |
  OctodiffSignaturexxHash |          128 |      2048 |     1.293 us |   0.0112 us |   0.0105 us |
          SignaturexxHash |          128 |     31744 |    10.620 us |   0.2051 us |   0.3007 us |
 SignaturexxHashAdler32V2 |          128 |     31744 |     9.805 us |   0.1925 us |   0.2060 us |
            SignatureSha1 |          128 |     31744 |    12.067 us |   0.1285 us |   0.1003 us |
             SignatureMd5 |          128 |     31744 |    12.099 us |   0.2566 us |   0.3761 us |
  OctodiffSignaturexxHash |          128 |     31744 |     1.296 us |   0.0105 us |   0.0099 us |
          SignaturexxHash |        16974 |       128 |   140.336 us |   1.2767 us |   1.1943 us |
 SignaturexxHashAdler32V2 |        16974 |       128 |   170.283 us |   2.1461 us |   2.0075 us |
            SignatureSha1 |        16974 |       128 |   341.534 us |   3.6022 us |   3.3695 us |
             SignatureMd5 |        16974 |       128 |   336.902 us |   1.0130 us |   0.9475 us |
  OctodiffSignaturexxHash |        16974 |       128 |    52.790 us |   0.1959 us |   0.1736 us |
          SignaturexxHash |        16974 |      2048 |   100.568 us |   0.6387 us |   0.5333 us |
 SignaturexxHashAdler32V2 |        16974 |      2048 |   165.552 us |   0.5237 us |   0.4373 us |
            SignatureSha1 |        16974 |      2048 |   117.439 us |   0.7930 us |   0.7418 us |
             SignatureMd5 |        16974 |      2048 |   115.743 us |   0.8604 us |   0.7628 us |
  OctodiffSignaturexxHash |        16974 |      2048 |    53.134 us |   0.2733 us |   0.2282 us |
          SignaturexxHash |        16974 |     31744 |   100.423 us |   1.1249 us |   0.9972 us |
 SignaturexxHashAdler32V2 |        16974 |     31744 |   166.975 us |   1.5149 us |   1.4170 us |
            SignatureSha1 |        16974 |     31744 |   104.409 us |   0.5123 us |   0.4541 us |
             SignatureMd5 |        16974 |     31744 |   103.962 us |   0.2075 us |   0.1839 us |
  OctodiffSignaturexxHash |        16974 |     31744 |    52.787 us |   0.1750 us |   0.1552 us |
          SignaturexxHash |       356879 |       128 | 2,777.521 us |  15.6666 us |  14.6545 us |
 SignaturexxHashAdler32V2 |       356879 |       128 | 3,301.038 us |  17.7195 us |  15.7079 us |
            SignatureSha1 |       356879 |       128 | 7,100.220 us | 134.8899 us | 126.1761 us |
             SignatureMd5 |       356879 |       128 | 6,790.532 us |  41.6502 us |  38.9596 us |
  OctodiffSignaturexxHash |       356879 |       128 | 1,095.267 us |   4.9712 us |   4.1512 us |
          SignaturexxHash |       356879 |      2048 | 1,929.465 us |  23.7443 us |  22.2104 us |
 SignaturexxHashAdler32V2 |       356879 |      2048 | 3,303.896 us |  19.1901 us |  17.9504 us |
            SignatureSha1 |       356879 |      2048 | 2,235.923 us |  16.1581 us |  14.3238 us |
             SignatureMd5 |       356879 |      2048 | 2,223.528 us |   7.4972 us |   6.6461 us |
  OctodiffSignaturexxHash |       356879 |      2048 | 1,115.721 us |  22.2082 us |  21.8114 us |
          SignaturexxHash |       356879 |     31744 | 1,884.238 us |   8.6153 us |   8.0588 us |
 SignaturexxHashAdler32V2 |       356879 |     31744 | 3,296.442 us |  19.7002 us |  18.4276 us |
            SignatureSha1 |       356879 |     31744 | 1,949.209 us |  28.2938 us |  26.4660 us |
             SignatureMd5 |       356879 |     31744 | 1,951.113 us |   9.1978 us |   8.1536 us |
  OctodiffSignaturexxHash |       356879 |     31744 | 1,099.746 us |   8.9334 us |   7.9192 us |