multi-threading decompression
YoelShoshan opened this issue · 1 comments
Hi!
Thanks for sharing this repo :)
I'm wondering — since this is thread safe, is there an example of using multi-threading to decode a single file faster?
(I assume it would be done by decoding each part separately in its own worker thread.)
@YoelShoshan, you're welcome and thank you for the question.
lz4 (the underlying C library) doesn't currently support multi-threaded compression (unless I'm mistaken), i.e. you cannot use multiple threads to speed up a single compression. You'd have to split the stream/file into multiple parts yourself, e.g. something along the lines of this.
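For illustration, a minimal sketch of the split-and-compress-in-parallel idea. It uses `zlib` from the standard library as a stand-in codec so the snippet runs anywhere; you could swap in `lz4framed.compress`/`lz4framed.decompress` if the library is installed. The chunk size and worker count are arbitrary choices, not recommendations:

```python
import zlib  # stand-in codec; swap in lz4framed.compress / decompress if available
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB per part (arbitrary)

def split(data, size=CHUNK_SIZE):
    """Split a byte string into fixed-size parts."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def compress_parts(data, workers=4):
    """Compress each part in its own thread; part order is preserved by map()."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, split(data)))

def decompress_parts(parts, workers=4):
    """Reverse operation: decompress every part and re-join the stream."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, parts))
```

Note that each part becomes an independent compressed unit, so the output is a list of frames rather than a single valid frame — both sides need to agree on this container format.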
However, you can use lz4framed to decompress multiple streams/files in parallel and see an improvement (since the Python GIL is released) if the files are large enough.
Since the lz4 format is by design very fast to decode, you're unlikely to see very large gains from parallel decompression. I made a couple of gists to illustrate this: one which (a) keeps the result in memory and another which (b) simply counts the size of the output.
With a local test file I got the following results (using a 500MB file multiple times via symbolic links and the "keep result in memory" gist from above):
| Operation | Threads | Time taken (seconds) |
|---|---|---|
| Compress | 1 | 3.06 |
| Compress | 4 | 1.61 |
| Decompress | 1 | 3.59 |
| Decompress | 4 | 3.19 |
So in summary:
- The methods are thread safe, as you rightly point out (though that mainly applies to the low-level methods)
- Compression & decompression are multi-threadable in the sense that the Python GIL is released for large enough input chunks. How much benefit this brings depends on what Python-level operations are performed on the resulting data