Investigate performing encoding in a separate thread
Breakthrough opened this issue · 3 comments
Investigate using threading or multiprocessing to split up the different parts of the processing pipeline and make better use of multiple CPU cores. In general, video decoding followed by encoding takes up the most processing time in the pipeline. Encoding is more CPU-intensive per frame, but there are typically fewer frames to encode than the input video contains, so it consumes less CPU time overall when processing a given input video.
Just did a quick benchmark to test this with a 1080p video at 60 FPS containing roughly 3000 frames (50% of them with motion). The results I obtained were:
Scan-Only:
- Single-Threaded: 47.3 FPS
- Multi-Threaded: 54.0 FPS (+15%)
Including Video Output:
- Single-Threaded: 41.5 FPS
- Multi-Threaded: 51.6 FPS (+24%)
This is using the threading module, not multiprocessing (frames are too large to transfer across processes without shared memory, which is only available in Python 3.8+).
I used 3 threads: one for decoding the video, one for the motion detection algorithm, and one for the video encoding. This seems to be a worthwhile avenue to pursue, and it also helps clean up the control flow through better separation of concerns.
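For anyone who wants the shape of the three-thread layout without opening the gist, here is a minimal sketch. It is not the benchmark code: the frames are dummy byte strings instead of decoded video, and the names run_pipeline, has_motion and max_queued are hypothetical, made up for illustration. The stages are connected by bounded queue.Queue objects so the decoder cannot run arbitrarily far ahead of the slower stages.

```python
import queue
import threading

_DONE = object()  # sentinel passed down the pipeline to signal end of stream


def _decode(num_frames, out_q):
    # Stand-in for the video decoder: emits (index, frame) pairs.
    for i in range(num_frames):
        frame = bytes([i % 256]) * 16  # dummy data, not a real video frame
        out_q.put((i, frame))
    out_q.put(_DONE)


def _detect(in_q, out_q, has_motion):
    # Forwards only the frames that the (hypothetical) detector flags.
    while True:
        item = in_q.get()
        if item is _DONE:
            out_q.put(_DONE)
            return
        index, frame = item
        if has_motion(index, frame):
            out_q.put(item)


def _encode(in_q, written):
    # Stand-in for the encoder: records which frame indices were "written".
    while True:
        item = in_q.get()
        if item is _DONE:
            return
        written.append(item[0])


def run_pipeline(num_frames, has_motion, max_queued=8):
    # Bounded queues apply backpressure between the stages.
    decoded = queue.Queue(maxsize=max_queued)
    events = queue.Queue(maxsize=max_queued)
    written = []
    threads = [
        threading.Thread(target=_decode, args=(num_frames, decoded)),
        threading.Thread(target=_detect, args=(decoded, events, has_motion)),
        threading.Thread(target=_encode, args=(events, written)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return written


if __name__ == "__main__":
    # Pretend every other frame contains motion.
    print(run_pipeline(10, lambda index, frame: index % 2 == 0))
```

Each stage has exactly one producer and one consumer, so frame order is preserved and a single sentinel is enough to shut the pipeline down. Whether threads give a speedup in a real pipeline depends on the decode and encode calls releasing the GIL while they do their native work; the dummy stages here will not show any gain.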
Here is the benchmark code:
https://gist.github.com/Breakthrough/8aed9a77fd8b9a60fb37e984e33ea596
If anyone would like to test this on their own system, I'd be glad to see what kind of improvement you're seeing from that script.
Interestingly, it looks like writing the same benchmark in Rust yields a further worthwhile performance gain. So far I've only ported the single-threaded version of the Python benchmark, but even an unoptimized debug-mode build yields 53 FPS (versus 41.5 FPS in Python). If the multithreaded Rust version proves to be significantly faster, then it might be worth considering rewriting DVR-Scan v2.0 in Rust.
Edit: Final results for Rust, including video output (percentages are relative to the single-threaded Python result of 41.5 FPS):
- Single-Threaded: 50.9 FPS (+23%)
- Multi-Threaded: 62.2 FPS (+50%)
This is now completed in the v1.5 branch. I'm now getting roughly 60 FPS when processing a 1080p video, which is a pretty significant improvement. Feel free to grab a pre-built .whl from the AppVeyor build job to test it out.