Low GPU usage / Performance
acsignal opened this issue · 1 comment
I am running the fitting step on your demo video using the following command:
python -m deepvog --fit ./demo.mp4 ./demo_eyeball_model.json -m -b 32
In your paper (Section 3.1.5, Inference Speed), you report that the program runs at 130 Hz with a batch size of 32. However, when I run it on your demo files (even without visualisation), I only average around 15 Hz.
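The 130 Hz and 15 Hz figures are only comparable if they time the same thing. Below is a minimal sketch of a harness that times only a batched forward pass over batches already held in memory. `forward_pass_fps` is a hypothetical helper, not part of DeepVOG. It discards the first batch(es) as warm-up, on the assumption that the first call includes one-off setup costs.

```python
import time
from typing import Callable, Iterable, Sized


def forward_pass_fps(predict_batch: Callable[[Sized], object],
                     batches: Iterable[Sized],
                     warmup_batches: int = 1) -> float:
    """Frames per second spent inside predict_batch only.

    The first `warmup_batches` batches are run but not timed, because the
    first call typically includes one-off costs (e.g. graph construction,
    CUDA/cuDNN initialisation) rather than steady-state inference.
    """
    frames = 0
    elapsed = 0.0
    for i, batch in enumerate(batches):
        if i < warmup_batches:
            predict_batch(batch)  # warm-up call, deliberately not timed
            continue
        start = time.perf_counter()
        predict_batch(batch)
        elapsed += time.perf_counter() - start
        frames += len(batch)
    if frames == 0 or elapsed <= 0.0:
        raise ValueError("not enough timed batches after warm-up")
    return frames / elapsed
```

For a Keras model, `predict_batch` could be something like `model.predict_on_batch`. Decode the batches into NumPy arrays beforehand so that video decoding is not counted as inference time.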
I am using a machine with the following specs:
CPU - Intel Xeon, 12 cores, 2.5 GHz
OS - Windows 10
GPU - Nvidia GeForce RTX 2080 Ti
RAM - 64GB
Python - 3.6.1
tensorflow-gpu - 1.15.0
CUDA - 10.0
cuDNN - 7.6.5
Is there anything obvious I am missing that could explain the poor performance?
Thanks for the message! The speed test in Section 3.1.5 of the paper refers to the inference speed of the FCNN, i.e. the rate at which segmentations can be computed in the forward pass of the neural network. We stated this in the caption of Figure 8, but you are right that the main text reads as if "DeepVOG" as a whole achieves this speed, which is too optimistic. DeepVOG has additional overheads (even without visualisation), e.g. loading libraries and models, decoding video frames with ffmpeg, fitting the eyeball model, writing results, and unloading all models and data. Admittedly, the package as a whole leaves much room for code optimization. Most importantly, the video could be read with ffmpeg in parallel through a subprocess, ideally in the form of a Keras data generator. If you are familiar with these kinds of optimizations and would like to contribute, we would warmly welcome it. Some of these optimizations are already on our agenda, but they won't be implemented before the new year.
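Reading the video in parallel, as suggested above, could look roughly like the following sketch. It is not DeepVOG code. `ffmpeg_gray_frames` and `prefetch_batches` are hypothetical helpers, and the sketch assumes grayscale input, a frame size known in advance, and `ffmpeg` on the PATH. ffmpeg decodes in its own process and writes raw frames to a pipe. A background thread groups those frames into batches so that the next batch is ready when the GPU finishes the current one.

```python
import queue
import subprocess
import threading
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")
_DONE = object()  # end-of-stream marker placed on the queue


def ffmpeg_gray_frames(path: str, width: int, height: int) -> Iterator[bytes]:
    """Yield raw 8-bit grayscale frames decoded by an ffmpeg subprocess.

    Assumes `ffmpeg` is on PATH and the frame size is known in advance.
    Each item is width*height bytes; convert with, e.g.,
    numpy.frombuffer(buf, numpy.uint8).reshape(height, width).
    """
    cmd = ["ffmpeg", "-loglevel", "error", "-i", path,
           "-f", "rawvideo", "-pix_fmt", "gray", "-"]
    frame_bytes = width * height  # "gray" is one byte per pixel
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        while True:
            buf = proc.stdout.read(frame_bytes)
            if len(buf) < frame_bytes:  # end of stream (or truncated frame)
                break
            yield buf
    finally:
        proc.stdout.close()
        proc.wait()


def prefetch_batches(frames: Iterable[T], batch_size: int = 32,
                     max_pending: int = 4) -> Iterator[List[T]]:
    """Batch `frames` in a background thread so decoding overlaps inference.

    At most `max_pending` finished batches are buffered. Errors raised while
    producing frames are re-raised in the consuming thread.
    """
    q = queue.Queue(maxsize=max_pending)
    errors: List[BaseException] = []

    def producer() -> None:
        batch: List[T] = []
        try:
            for frame in frames:
                batch.append(frame)
                if len(batch) == batch_size:
                    q.put(batch)
                    batch = []
            if batch:
                q.put(batch)  # final, possibly smaller batch
        except BaseException as exc:
            errors.append(exc)
        finally:
            q.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            if errors:
                raise errors[0]
            return
        yield item
```

This is a plain Python generator rather than a `keras.utils.Sequence`, because a streaming decoder does not provide the random access that `Sequence.__getitem__` implies. Threads are enough here: ffmpeg runs in a separate process, and blocking pipe reads release the GIL. If the consumer stops early, the producer thread can remain blocked on the full queue. A production version would need a way to cancel it.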
To be fair, though, I am not sure why your setup only yields such a low frame rate, especially given your GPU, which is much more powerful than the ones we are using. The inference speeds reported in the paper were achieved with a GTX 1080 Ti. I just ran the demo.mp4 video on my laptop (GTX 1070 mobile, 4-core Intel i7, 32 GB RAM), and even there I achieve 23.2 fps, including all of the overheads mentioned above (also without visualization). The demo.mp4 video has a relatively small number of frames (1112). On longer videos (e.g. one I tried with 5867 frames), the overhead makes up a smaller proportion of the total run time, and the frame rate is slightly higher (28.8 fps on the longer video). That is still not comparable to the actual inference speed of the FCNN (which I haven't measured on my machine yet), but it is higher than 15 Hz. I am not sure what is "wrong" in your case (if anything). Maybe it's a driver issue? Given your specs and code setup, I do not see anything obviously "wrong", sorry...
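The observation that overhead matters less on longer videos can be made concrete with a simple two-parameter model. The model assumes that end-to-end time is a fixed one-off cost plus a constant per-frame cost. This is an assumption, not something measured in this thread. The helpers below are hypothetical and fit that model to the two laptop runs quoted in the reply: 1112 frames at 23.2 fps and 5867 frames at 28.8 fps.

```python
def fit_fixed_overhead(n1: int, fps1: float, n2: int, fps2: float):
    """Fit total_time = overhead_s + n_frames / steady_fps to two runs.

    Assumes a constant per-frame cost and a constant one-off cost; with only
    two runs (on different videos) this is a rough estimate.
    """
    t1, t2 = n1 / fps1, n2 / fps2  # end-to-end wall-clock times in seconds
    if t2 == t1:
        raise ValueError("runs must differ in total time")
    steady_fps = (n2 - n1) / (t2 - t1)
    overhead_s = t1 - n1 / steady_fps
    return overhead_s, steady_fps


def predicted_fps(n_frames: int, overhead_s: float, steady_fps: float) -> float:
    """End-to-end frame rate predicted by the fitted model."""
    return n_frames / (overhead_s + n_frames / steady_fps)


overhead_s, steady_fps = fit_fixed_overhead(1112, 23.2, 5867, 28.8)
print(f"overhead ~ {overhead_s:.1f} s, steady state ~ {steady_fps:.1f} fps")
```

With these numbers, the fit gives roughly 11.5 s of fixed overhead and a steady-state rate of about 30.5 fps. Under this model, even an arbitrarily long video on that laptop would top out around 30 fps end to end. That would suggest per-frame work outside the network, such as frame decoding, rather than one-off start-up costs accounts for most of the gap to the FCNN-only figure. The two runs used different videos, so treat this as a rough estimate only.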