SIMDflate

This is an experimental, speed-oriented DEFLATE compression implementation (with zlib/gzip wrapper support) written largely using AVX-512 instructions. It aims to be faster than other DEFLATE compressors, at the cost of the following drawbacks:

The implementation is focused on text compression only
Design restricts achievable compression, so it is only comparable with the fastest compression levels of existing implementations
Current implementation doesn’t support any speed/size tradeoff options
Requires an x86-64 CPU with “Ice Lake level” AVX-512 support
Design is not easily portable to other ISAs

This code serves more as a demonstration of what can be achieved if compatibility concerns are disregarded, and perhaps as a showcase of what can be done with AVX-512. These limitations mean that it isn’t really a general-purpose compressor, though this might be improved.
It's currently very early in development, which means that it isn’t well suited to production use, lacks features/functionality, likely has bugs (it has not been extensively tested), and is poorly documented.

Non-goals of this project

Achieving ‘maximum’ or high compression
Consistent or reproducible output across all CPUs (i.e. different features/techniques may be enabled/used depending on CPU/arch)
Platform portability
Decompression support

Required AVX-512 support / Compatible CPUs

This implementation makes extensive use of relatively new AVX-512 instructions introduced in Intel’s Ice Lake microarchitecture. In fact, it uses every AVX-512 subset supported on Ice Lake except DQ and VAES, as well as the BMI1 and BMI2 instruction sets. Put another way, it uses the following AVX-512 subsets: F, BW, CD, VL, VBMI, VBMI2, BITALG, VPOPCNTDQ, GFNI, VPCLMULQDQ, IFMA, VNNI.

Don’t worry if the above is confusing - the following is a list of compatible processors at time of writing:

Intel Ice Lake, Tiger Lake and Rocket Lake
- 10th generation Core mobile ‘G’ processors
- 11th generation Core processors
- Xeon Scalable 3rd generation (non-Cooperlake), including workstation class Xeons
Intel desktop Alder Lake (12th generation Core) may have unofficial support
- Expected to be unavailable on Raptor Lake (13th gen Core) and probably Meteor Lake (14th gen Core)
- Likely will be available on Alder Lake-X and later Intel high-end desktop or workstation platforms
Intel Sapphire Rapids (4th gen Xeon Scalable) or later P-core based Xeons
- Given that Intel is continuing to port EVEX instructions to VEX for Sierra Forest (Crestmont cores?), it's likely to be unsupported on future E-core based Xeons
AMD Zen 4 (most Ryzen 7000 and 4th generation EPYC processors, including Zen 4c variants) or later
- As these weren’t available at the time of writing, SIMDflate has been primarily developed/optimised on Intel’s AVX-512 implementation

SIMDflate is not compatible with AVX-512 implemented on Skylake/Cascadelake/Cooperlake, Cannonlake or CNS.
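
The subset list above can also be checked programmatically. The following is a minimal sketch, not SIMDflate's actual detection code: the function and macro names are hypothetical, and it assumes the standard CPUID leaf 7 (sub-leaf 0) bit positions and the XCR0 state bits that an OS must enable for AVX-512 registers. The core check is a pure function over register values, so it can be exercised on any machine. The helper that reads the real registers is only compiled for GCC/Clang on x86-64.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 7, sub-leaf 0: required feature bits in EBX. */
#define EBX_BMI1        (1u << 3)
#define EBX_BMI2        (1u << 8)
#define EBX_AVX512F     (1u << 16)
#define EBX_AVX512IFMA  (1u << 21)
#define EBX_AVX512CD    (1u << 28)
#define EBX_AVX512BW    (1u << 30)
#define EBX_AVX512VL    (1u << 31)

/* CPUID leaf 7, sub-leaf 0: required feature bits in ECX. */
#define ECX_AVX512VBMI      (1u << 1)
#define ECX_AVX512VBMI2     (1u << 6)
#define ECX_GFNI            (1u << 8)
#define ECX_VPCLMULQDQ      (1u << 10)
#define ECX_AVX512VNNI      (1u << 11)
#define ECX_AVX512BITALG    (1u << 12)
#define ECX_AVX512VPOPCNTDQ (1u << 14)

/* Hypothetical names: the combined masks for the subsets listed above. */
#define REQUIRED_EBX (EBX_BMI1 | EBX_BMI2 | EBX_AVX512F | EBX_AVX512IFMA | \
                      EBX_AVX512CD | EBX_AVX512BW | EBX_AVX512VL)
#define REQUIRED_ECX (ECX_AVX512VBMI | ECX_AVX512VBMI2 | ECX_GFNI | \
                      ECX_VPCLMULQDQ | ECX_AVX512VNNI | ECX_AVX512BITALG | \
                      ECX_AVX512VPOPCNTDQ)

/* XCR0 bits the OS must set: SSE (1), AVX (2), opmask (5),
   ZMM_Hi256 (6) and Hi16_ZMM (7). */
#define XCR0_AVX512_STATE 0xE6u

/* Pure check: true only if every required CPU feature bit is present
   and the OS saves/restores the full AVX-512 register state. */
bool has_icelake_level_avx512(uint32_t ebx7, uint32_t ecx7, uint64_t xcr0)
{
    return (ebx7 & REQUIRED_EBX) == REQUIRED_EBX
        && (ecx7 & REQUIRED_ECX) == REQUIRED_ECX
        && (xcr0 & XCR0_AVX512_STATE) == XCR0_AVX512_STATE;
}

#if defined(__x86_64__) && defined(__GNUC__)
#include <cpuid.h>

/* Reads the real registers on the running CPU (GCC/Clang only). */
bool cpu_has_icelake_level_avx512(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* XGETBV is only usable if the OS has set OSXSAVE (leaf 1, ECX bit 27). */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
        return false;

    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));

    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;

    return has_icelake_level_avx512(ebx, ecx, ((uint64_t)hi << 32) | lo);
}
#endif
```

A program could call `cpu_has_icelake_level_avx512()` once at start-up and fall back to a conventional DEFLATE library if it returns false. A Skylake-X class CPU, for example, reports F, CD, BW and VL but none of the required ECX bits, so the check fails there, as expected from the incompatibility note above.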
