101arrowz/fzstd

docs: comparison to fflate

Closed this issue · 6 comments

bdon commented

Hi, thanks for the library and fflate. I've been using fflate in my open source project with great success.

Is there a good head-to-head benchmark for how the single-threaded decompression time compares between fflate and fzstd for the same compressed sample data? If you think it would be a worthy addition to the docs, I can find sample datasets and attempt to write up a comparison.

Empirically, fzstd is a bit slower than fflate. It's surprising, but I think it's mainly because I based this implementation off of a relatively basic C implementation rather than optimizing it myself. If you would like to create an actual performance comparison, I'd be happy to include it in the docs here.

bdon commented

Hi @101arrowz - I'm not confident I can come up with a representative benchmark that would show the performance difference between fzstd/fflate effectively. For now that is very useful information, so I'm going to stick with fflate instead - thanks for the great library!

@101arrowz what data did you use for the fflate benchmarks?

I did a simple benchmark, and the performance isn't great:

zstd-wasm compress: 37.023ms
fzstd decompress: 19.315ms
zstd-wasm decompress: 5.051ms
fflate compress: 11.665ms
fflate decompress: 2.197ms

zstd-wasm compress: 16.253ms
fzstd decompress: 30.854ms
zstd-wasm decompress: 0.399ms
fflate compress: 31.121ms
fflate decompress: 7.95ms

zstd-wasm compress: 15.413ms
fzstd decompress: 46.074ms
zstd-wasm decompress: 5.006ms
fflate compress: 108.39ms
fflate decompress: 62.482ms

zstd-wasm compress: 39.991ms
fzstd decompress: 236.635ms
zstd-wasm decompress: 31.952ms
fflate compress: 589.8ms
fflate decompress: 167.85ms

import * as fs from "node:fs"
import * as url from "node:url"

import { compress, decompress } from "@dweb-browser/zstd-wasm"

import * as fzstd from "../src/index"

import * as fflate from "fflate"

for (let i = 0; i < 4; i++) {
  // Input grows 10x per round: 26 bytes * 1024 * 10^i (roughly 26 KB up to 26 MB).
  const data = "Lorem ipsum dolor sit amet".repeat(1024 * (10 ** i))
  const encoded = new TextEncoder().encode(data)

  // Compress once with zstd-wasm; both Zstandard decoders read the same buffer.
  console.time("zstd-wasm compress")
  const compressed = compress(encoded, 18)
  console.timeEnd("zstd-wasm compress")

  console.time("fzstd decompress")
  const decompressed = fzstd.decompress(compressed)
  console.timeEnd("fzstd decompress")
  const decoded = new TextDecoder().decode(decompressed)

  console.time("zstd-wasm decompress")
  const decompressed2 = decompress(compressed)
  console.timeEnd("zstd-wasm decompress")
  const decoded2 = new TextDecoder().decode(decompressed2)

  // Same input through fflate (DEFLATE) for comparison.
  const data_buf = fflate.strToU8(data)
  console.time("fflate compress")
  const fflate_compressed = fflate.compressSync(data_buf, { level: 9, mem: 12 })
  console.timeEnd("fflate compress")

  console.time("fflate decompress")
  const fflate_decompressed = fflate.decompressSync(fflate_compressed)
  console.timeEnd("fflate decompress")
  const fflate_decoded = fflate.strFromU8(fflate_decompressed)

  console.log()
  //console.log(decoded, decoded2, fflate_decoded)
}
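
Each figure above comes from a single `console.time` call, so the early rounds in particular may include one-off costs such as JIT warm-up. A minimal sketch of a more repeatable measurement follows. The helper names `median` and `bench` are hypothetical and not part of fzstd, fflate, or zstd-wasm, and the run counts are arbitrary defaults.

```typescript
// Hypothetical helper: median of a list of timings (in ms).
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2
}

// Hypothetical helper: call fn a few times untimed so the engine can
// optimize it, then return the median of the timed runs in milliseconds.
function bench(fn: () => unknown, runs = 15, warmup = 5): number {
  for (let i = 0; i < warmup; i++) fn()
  const timings: number[] = []
  for (let i = 0; i < runs; i++) {
    const start = performance.now()
    fn()
    timings.push(performance.now() - start)
  }
  return median(timings)
}
```

In the script above, this could replace a `console.time` pair with something like `bench(() => fzstd.decompress(compressed))`. The median is used instead of the mean so that a single GC pause does not skew the result.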

That seems about right for fzstd; this library is better used for its lightweight size than its performance (which is still OK for small inputs). Still, the delta versus WebAssembly is way bigger than it was a few years ago. I guess WASM JIT implementations have gotten more optimized since then. That's good for the web!

I'm a bit surprised that fflate is significantly slower than zstd-wasm here, but I think that may have to do with this being unrealistically compressible data. Relative to WebAssembly implementations, both fflate and fzstd tend to perform better on real data than on extremely low-entropy or extremely high-entropy (i.e. random) data. In practice the delta should be smaller and primarily due to Zstandard being better suited to high-performance implementations than DEFLATE.
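
To check the point about entropy, the benchmark could be run on inputs at both extremes plus something in between. The sketch below builds three such buffers. The function names and word list are hypothetical, and `textLike` is only a synthetic stand-in: actual files (JSON, HTML, source code) would be more representative.

```typescript
// Small deterministic PRNG (mulberry32) so every run sees the same bytes.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0
  return () => {
    a = (a + 0x6d2b79f5) >>> 0
    let t = a
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Illustrative vocabulary; any word list would do.
const WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur",
  "adipiscing", "elit", "sed", "do", "eiusmod", "tempor"]

// Extremely low entropy: one phrase repeated, as in the benchmark above.
function lowEntropy(size: number): Uint8Array {
  const unit = "Lorem ipsum dolor sit amet"
  const out = new Uint8Array(size)
  for (let i = 0; i < size; i++) out[i] = unit.charCodeAt(i % unit.length)
  return out
}

// Middle ground: words drawn at random from the vocabulary.
function textLike(size: number, seed = 1): Uint8Array {
  const rand = mulberry32(seed)
  const out = new Uint8Array(size)
  let pos = 0
  while (pos < size) {
    const word = WORDS[Math.floor(rand() * WORDS.length)] + " "
    for (let i = 0; i < word.length && pos < size; i++) {
      out[pos++] = word.charCodeAt(i)
    }
  }
  return out
}

// Extremely high entropy: pseudo-random bytes, essentially incompressible.
function randomBytes(size: number, seed = 1): Uint8Array {
  const rand = mulberry32(seed)
  const out = new Uint8Array(size)
  for (let i = 0; i < size; i++) out[i] = Math.floor(rand() * 256)
  return out
}
```

Each buffer can be passed to the compressors in place of `encoded` / `data_buf`, so the three libraries are compared on the same bytes at each entropy level.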