spacewalk01/tensorrt-yolov9

How to run inference on multiple GPUs?

fungtion opened this issue · 5 comments

Hi, can the engine model perform inference on multiple GPUs?

Hi,

From the FAQ: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#faq

Q: How do I use TensorRT on multiple GPUs?

A: Each ICudaEngine object is bound to a specific GPU when it is instantiated, either by the builder or on deserialization. To select the GPU, use cudaSetDevice() before calling the builder or deserializing the engine. Each IExecutionContext is bound to the same GPU as the engine from which it was created. When calling execute() or enqueue(), ensure that the thread is associated with the correct device by calling cudaSetDevice() if necessary.

from: NVIDIA/TensorRT#322

Create an inference engine instance for each GPU and call cudaSetDevice(gpu_id) for each device.
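
In outline, that looks like the sketch below (untested, just a sketch; it assumes you already have an nvinfer1::IRuntime*, e.g. from nvinfer1::createInferRuntime, and loadFile is a hypothetical helper that reads the serialized engine into memory):

#include <NvInfer.h>
#include <cuda_runtime.h>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read a serialized engine file into memory.
std::vector<char> loadFile(const std::string& path)
{
    std::ifstream f(path, std::ios::binary | std::ios::ate);
    std::vector<char> blob(f.tellg());
    f.seekg(0);
    f.read(blob.data(), blob.size());
    return blob;
}

// Make the target GPU current, then deserialize; the resulting engine
// (and any IExecutionContext created from it) stays bound to that GPU.
nvinfer1::ICudaEngine* loadEngineOnDevice(nvinfer1::IRuntime* runtime,
                                          const std::string& path,
                                          int device_id)
{
    cudaSetDevice(device_id);
    std::vector<char> blob = loadFile(path);
    return runtime->deserializeCudaEngine(blob.data(), blob.size());
}

When you later call execute()/enqueue() on a context created from that engine, make sure the calling thread has the same device current (call cudaSetDevice again if needed), as the FAQ above says.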

Can you give an example? I tried to set cudaSetDevice(1), but it always runs on gpu:0, not gpu:1.

Make sure to call it at the beginning of each function that uses CUDA/the GPU, such as:

Yolov9::Yolov9(string engine_path)
{
    cudaSetDevice(1);
    // Read the engine file
    ifstream engineStream(engine_path, ios::binary);
    ...

void Yolov9::predict(Mat& image, vector<Detection>& output)
{
    cudaSetDevice(1);
    // Preprocessing data on the GPU
    cuda_preprocess(image.ptr(), image.cols, image.rows, gpu_buffers[0], model_input_w, model_input_h, cuda_stream);
    ...
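
If you would rather not hardcode the device, one option (just a sketch, not something in this repo) is to pass a device id into the constructor, keep it as a member such as device_id_, and call cudaSetDevice(device_id_) at the top of every method that touches the GPU. Then one Yolov9 instance per GPU can run side by side (the two-argument constructor here is an assumption, not the repo's API):

#include <opencv2/opencv.hpp>
#include <thread>
#include <vector>

int main()
{
    // Hypothetical: a Yolov9 constructor that also takes a device id and
    // calls cudaSetDevice(device_id_) inside every GPU-touching method.
    Yolov9 model_gpu0("yolov9.engine", /*device_id=*/0);
    Yolov9 model_gpu1("yolov9.engine", /*device_id=*/1);

    cv::Mat frame0 = cv::imread("image0.jpg");
    cv::Mat frame1 = cv::imread("image1.jpg");
    std::vector<Detection> det0, det1;  // Detection is the repo's output struct

    // Each thread sets its own current device inside predict(), so the two
    // models can run on separate GPUs concurrently.
    std::thread t0([&]{ model_gpu0.predict(frame0, det0); });
    std::thread t1([&]{ model_gpu1.predict(frame1, det1); });
    t0.join();
    t1.join();
    return 0;
}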

Thanks, I will try it.