facebook/zstd

[Question] Decompression is ~3x slower than compression?

DUOLabs333 opened this issue · 7 comments

I'm investigating switching from lz4 to zstd as the compression algorithm used in one of my projects (at its core, I'm sending compressed strings between two computers). It works fine, but while profiling my program with perf record --call-graph dwarf, I noticed that decompression takes up ~15% of the CPU time, while compression takes roughly 5%. (I noticed a similar asymmetry with lz4, but it was closer to 2x.) Is this expected?

Nope, it's not expected.

Are you sure the measured compression and decompression have the same workload?

What do you mean by "workload"? Every time the client sends a compressed string, the server must decompress it, and vice versa.

Many applications have way more decompression events than compression events.
It depends on the application, and can't be guessed.

Sorry, I'm not sure I understand what you mean by "event". Does it mean calls to the respective API-facing functions, or calls to internal compression/decompression routines?

@Cyan4973 is asking whether the volume of compression and decompression is the same. It sounds like the answer is yes: every byte of traffic is compressed once and decompressed once in your use case, yeah? This isn't true for lots of use cases. Many are write once and read many, which can produce the skewed CPU time you're observing.
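To make the volume argument concrete, here is a minimal back-of-the-envelope sketch. The throughput constants are hypothetical placeholders, not zstd benchmark results; the only assumption is that decompression is faster per byte than compression, so a profile can only show decompression dominating if more bytes flow through it.

```python
def cpu_seconds(num_bytes, bytes_per_second):
    """CPU time needed to push num_bytes through a codec at a given throughput."""
    return num_bytes / bytes_per_second


def decompress_to_compress_ratio(bytes_compressed, bytes_decompressed,
                                 compress_bps, decompress_bps):
    """Ratio of decompression CPU time to compression CPU time."""
    t_compress = cpu_seconds(bytes_compressed, compress_bps)
    t_decompress = cpu_seconds(bytes_decompressed, decompress_bps)
    return t_decompress / t_compress


# Hypothetical throughputs (uncompressed bytes per second), for illustration only.
COMPRESS_BPS = 400e6
DECOMPRESS_BPS = 1200e6

# Same volume in both directions: decompression costs less than compression.
balanced = decompress_to_compress_ratio(100e6, 100e6, COMPRESS_BPS, DECOMPRESS_BPS)

# Nine times more data decompressed than compressed: the ratio flips.
skewed = decompress_to_compress_ratio(100e6, 900e6, COMPRESS_BPS, DECOMPRESS_BPS)

print(f"balanced volume: decompression/compression = {balanced:.2f}")
print(f"skewed volume:   decompression/compression = {skewed:.2f}")
```

With these made-up numbers, a 3x skew toward decompression requires roughly nine times more decompressed than compressed data, which is why the question about volume comes first.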

When you say compression is 5% of the workload and decompression is 15%, are these comparable overall workloads? Is it possible that the process that is doing the compression is doing a lot more work in general and so the compression is a smaller overall share despite being more work in absolute terms than decompression?
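The point about shares versus absolute work can be illustrated with a few lines of arithmetic. All of the timings below are invented for illustration; they only show that a percentage in a profile is relative to the total CPU time of the process being profiled.

```python
def share_of_profile(codec_seconds, total_seconds):
    """Fraction of a process's total CPU time spent inside the codec."""
    return codec_seconds / total_seconds


# Hypothetical process A: 2.0 s of compression inside a busy 40 s profile.
compress_seconds, process_a_total = 2.0, 40.0
# Hypothetical process B: 1.5 s of decompression inside a lighter 10 s profile.
decompress_seconds, process_b_total = 1.5, 10.0

compress_share = share_of_profile(compress_seconds, process_a_total)
decompress_share = share_of_profile(decompress_seconds, process_b_total)

print(f"compression:   {compress_seconds} s, {compress_share:.0%} of its profile")
print(f"decompression: {decompress_seconds} s, {decompress_share:.0%} of its profile")
```

Here decompression shows up as three times the share even though it uses less CPU time in absolute terms, so comparing seconds (or samples) rather than percentages is the safer check.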

Or are these processes running on significantly different hardware? Or are they being compiled differently (one might be a debug build while the other is an optimized build)?

Oh, OK. I hadn't thought about it before, but in that case it's possible that the server sends more data back to the client than the client sends to the server. I only profiled the client side, not the server (the client is a Linux VM, and the server is a MacBook running macOS Big Sur).

Thinking about this further, this is most likely the reason.
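One way to confirm this would be to count the bytes that pass through each direction on the client. The sketch below is only an illustration: it uses zlib from the Python standard library as a stand-in codec, and the `CodecStats`, `send`, and `receive` names are hypothetical, not part of zstd or of the project discussed here. The same counters could wrap whatever zstd binding is actually in use.

```python
import time
import zlib  # stand-in codec; swap in the real zstd binding


class CodecStats:
    """Running totals for one direction (compress or decompress)."""

    def __init__(self):
        self.calls = 0
        self.raw_bytes = 0   # uncompressed bytes handled
        self.seconds = 0.0   # wall-clock time spent in the codec

    def record(self, raw_len, elapsed):
        self.calls += 1
        self.raw_bytes += raw_len
        self.seconds += elapsed


compress_stats = CodecStats()
decompress_stats = CodecStats()


def send(payload: bytes) -> bytes:
    """Compress an outgoing message and account for it."""
    start = time.perf_counter()
    frame = zlib.compress(payload)
    compress_stats.record(len(payload), time.perf_counter() - start)
    return frame


def receive(frame: bytes) -> bytes:
    """Decompress an incoming message and account for it."""
    start = time.perf_counter()
    payload = zlib.decompress(frame)
    decompress_stats.record(len(payload), time.perf_counter() - start)
    return payload


# Simulated client: a small request goes out, a large response comes back.
request = b"GET item 42\n" * 10
response = b"row of data\n" * 10000

send(request)
receive(zlib.compress(response))  # the server side is simulated inline

print("compressed raw bytes:  ", compress_stats.raw_bytes)
print("decompressed raw bytes:", decompress_stats.raw_bytes)
```

If the decompressed byte count on the client is several times the compressed byte count, the asymmetry in the profile follows from traffic volume rather than from the codec itself.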