If I create more than one model instance, does this support parallel execution on GPU?
xxxpsyduck opened this issue · 1 comment
xxxpsyduck commented
I want to create 5 instances of the same model and execute them in parallel on the GPU. I tried with TVM and TensorRT but failed, so I wonder if ncnn supports this feature?
atanmarko commented
AFAIK, with the ncnn Vulkan backend you should be able to do that from different processes on Linux, but you cannot run separate inferences from different threads within one process, since the Vulkan context would go wild. ncnn on CPU should work. I have not really tested multiple parallel instances of the ncnn CUDA implementation; I would expect it to work if you run each inference from a separate application (but I may be wrong).