pytorch/serve

Serve multiple models with both CPU and GPU

hungtrieu07 opened this issue · 3 comments

Hi guys, I have a question: Can I serve several models (about 5-6 models) using both CPU and GPU inference?

Hi @hungtrieu07, yes, TorchServe supports multi-model endpoints.

You can refer to this https://github.com/pytorch/serve/pull/3040/files#diff-b70d3a47c15879d308451b54821682f1d63518db732881b434c4110d9ca7a767R44
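As a rough sketch, you can register several models on one TorchServe instance through the management API; the model names and .mar files below are placeholders, and per-model CPU/GPU placement is configured in each model's own config (e.g. `deviceType` in its model-config.yaml):

```python
import requests

# TorchServe management API (port 8081 by default).
MANAGEMENT = "http://localhost:8081"

# Placeholder archives that would live in your model store.
MODELS = [
    ("detector", "detector.mar"),      # e.g. a GPU-bound model
    ("classifier", "classifier.mar"),  # e.g. a CPU-bound model
]

for name, mar in MODELS:
    # Register the model and spin up one worker synchronously.
    resp = requests.post(
        f"{MANAGEMENT}/models",
        params={
            "url": mar,
            "model_name": name,
            "initial_workers": 1,
            "synchronous": "true",
        },
    )
    print(name, resp.status_code, resp.text)
```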

Hi @agunapal, I'm writing a Python app with PyQt5, something like a surveillance camera app. My pipeline looks like this:
The program reads frames from an RTSP link or a video file ---> sends each frame to the inference API using the Python requests library ---> gets the results from the TorchServe API response ---> processes the results (draws bounding boxes on the frame) ---> converts the processed frame from a NumPy array to a QPixmap to display in the app.
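Roughly, one frame round trip looks like this (the endpoint name and the response schema are simplified placeholders for my actual handler):

```python
import cv2
import requests
from PyQt5.QtGui import QImage, QPixmap

INFERENCE_URL = "http://localhost:8080/predictions/detector"  # placeholder name

def infer(frame):
    # Encode the frame as JPEG and POST it to the TorchServe inference API.
    ok, buf = cv2.imencode(".jpg", frame)
    resp = requests.post(INFERENCE_URL, data=buf.tobytes())
    # Assumes the handler returns a JSON list of boxes; adapt to your handler.
    for box in resp.json():
        cv2.rectangle(frame, (int(box["x1"]), int(box["y1"])),
                      (int(box["x2"]), int(box["y2"])), (0, 255, 0), 2)
    return frame

def to_pixmap(frame):
    # Convert the annotated BGR NumPy array into a QPixmap for the Qt widget
    # (meant to run inside the GUI thread of the running Qt app).
    h, w, _ = frame.shape
    data = frame.tobytes()  # keep a reference while QImage wraps the buffer
    qimg = QImage(data, w, h, 3 * w, QImage.Format_RGB888).rgbSwapped()
    return QPixmap.fromImage(qimg)

cap = cv2.VideoCapture("rtsp://...")  # or a video file path
ok, frame = cap.read()
if ok:
    pixmap = to_pixmap(infer(frame))
cap.release()
```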

For each camera, I have 2 queues: one that stores the original frames and one that stores the processed frames. But it's too slow; processing one frame takes about 4-5 seconds. What strategy can I use in this situation?
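For reference, a simplified version of my per-camera queue setup (reusing `infer` and `cap` from the sketch above):

```python
import threading
from queue import Queue

raw_frames = Queue(maxsize=30)        # original frames from the camera
processed_frames = Queue(maxsize=30)  # annotated frames, ready for display

def reader(cap):
    # Producer: read frames and drop them when the queue is full,
    # so a slow consumer never stalls the stream.
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if not raw_frames.full():
            raw_frames.put(frame)

def worker():
    # Consumer: one blocking inference round trip per frame, which is
    # where the 4-5 seconds per frame is currently spent.
    while True:
        processed_frames.put(infer(raw_frames.get()))

threading.Thread(target=reader, args=(cap,), daemon=True).start()
threading.Thread(target=worker, daemon=True).start()
```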

Hi @hungtrieu07, I would start off with a couple of things