Experimental speed-oriented DEFLATE implementation, based on AVX-512

Primary language: C++. License: MIT.

SIMDflate

This is an experimental speed-oriented DEFLATE compression implementation (with zlib/gzip wrapper support) written largely using AVX-512 instructions. It aims to be faster than other DEFLATE compressors at the expense of some drawbacks:

  • The implementation is focused on text compression only
  • Design restricts achievable compression, meaning it’s only comparable with the fastest compression levels of existing implementations
  • Current implementation doesn’t support any speed/size tradeoff options
  • Requires an x86-64 CPU with “Ice Lake level” AVX-512 support
  • Design is not easily portable to other ISAs

This code serves more as a demonstration of what can be achieved if we disregard compatibility concerns, and perhaps as a showcase of what can be done with AVX-512. The limitations mean that this isn’t really a general-purpose compressor, but this might be improved.
It’s currently very early in development, which means that it isn’t well geared for production use, lacks features/functionality, likely has bugs (it has not been extensively tested), is poorly documented, etc.

Non-goals of this project

  • Achieving ‘maximum’ or high compression
  • Consistent or reproducible output across all CPUs (i.e. different features/techniques may be enabled/used depending on CPU/arch)
  • Platform portability
  • Decompression support

Required AVX-512 support / Compatible CPUs

This implementation makes extensive use of relatively new AVX-512 instructions introduced in Intel’s Ice Lake microarchitecture. In fact, it uses all AVX-512 subsets supported on Ice Lake (as well as the BMI1 and BMI2 instruction sets) except DQ and VAES, or put another way, it uses the following AVX-512 subsets: F, BW, CD, VL, VBMI, VBMI2, BITALG, VPOPCNTDQ, GFNI, VPCLMULQDQ, IFMA, VNNI.
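As a rough illustration of what this requirement means in practice, the subsets listed above can be checked at runtime through CPUID leaf 7 (subleaf 0). The sketch below is not taken from SIMDflate's source: the names `bit`, `kRequiredEbx`, `kRequiredEcx`, `has_required_features` and `cpu_has_required_features` are hypothetical, and it assumes GCC or Clang (for `<cpuid.h>` and `__get_cpuid_count`). It also only checks CPU feature bits; a complete check would additionally confirm via XGETBV that the OS saves the AVX-512 register state, which is omitted here.

```cpp
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang only; assumed toolchain for this sketch
#endif

// Helper: mask with only bit n set.
constexpr std::uint32_t bit(unsigned n) { return std::uint32_t{1} << n; }

// Feature bits reported in EBX of CPUID leaf 7, subleaf 0.
constexpr std::uint32_t kRequiredEbx =
    bit(3)  |  // BMI1
    bit(8)  |  // BMI2
    bit(16) |  // AVX512F
    bit(21) |  // AVX512IFMA
    bit(28) |  // AVX512CD
    bit(30) |  // AVX512BW
    bit(31);   // AVX512VL

// Feature bits reported in ECX of CPUID leaf 7, subleaf 0.
constexpr std::uint32_t kRequiredEcx =
    bit(1)  |  // AVX512VBMI
    bit(6)  |  // AVX512VBMI2
    bit(8)  |  // GFNI
    bit(10) |  // VPCLMULQDQ
    bit(11) |  // AVX512VNNI
    bit(12) |  // AVX512BITALG
    bit(14);   // AVX512VPOPCNTDQ

// Hypothetical helper: true if every required bit is set in the given
// EBX/ECX values from CPUID leaf 7, subleaf 0.
constexpr bool has_required_features(std::uint32_t ebx, std::uint32_t ecx) {
    return (ebx & kRequiredEbx) == kRequiredEbx &&
           (ecx & kRequiredEcx) == kRequiredEcx;
}

// Hypothetical helper: queries the running CPU. Returns false on non-x86
// targets or when leaf 7 is unavailable. Does NOT check OS support for
// the ZMM register state (XGETBV), which a real check would need.
inline bool cpu_has_required_features() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) return false;
    return has_required_features(ebx, ecx);
#else
    return false;
#endif
}
```

A program could call `cpu_has_required_features()` once at startup and fall back to another DEFLATE implementation when it returns false.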

Don’t worry if the above is confusing - the following is a list of compatible processors at time of writing:

SIMDflate is not compatible with AVX-512 as implemented on Skylake, Cascade Lake, Cooper Lake, Cannon Lake or CNS.