pytorch/serve

Serve multiple models with both CPU and GPU

hungtrieu07 opened this issue · 3 comments

Hi guys, I have a question: Can I serve several models (about 5-6 models) using both CPU and GPU inference?

Hi @hungtrieu07, yes, TorchServe supports multi-model endpoints.

You can refer to this https://github.com/pytorch/serve/pull/3040/files#diff-b70d3a47c15879d308451b54821682f1d63518db732881b434c4110d9ca7a767R44
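As a rough sketch, you can register several models on one TorchServe instance through the management API; the model names and .mar files below are placeholders, and per-model CPU/GPU placement is configured in each model's own config (e.g. `deviceType` in its model-config.yaml):

```python
import requests

# TorchServe management API (port 8081 by default).
MANAGEMENT = "http://localhost:8081"

# Placeholder archives that would live in your model store.
MODELS = [
    ("detector", "detector.mar"),      # e.g. a GPU-bound model
    ("classifier", "classifier.mar"),  # e.g. a CPU-bound model
]

for name, mar in MODELS:
    # Register the model and spin up one worker synchronously.
    resp = requests.post(
        f"{MANAGEMENT}/models",
        params={
            "url": mar,
            "model_name": name,
            "initial_workers": 1,
            "synchronous": "true",
        },
    )
    print(name, resp.status_code, resp.text)
```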

Hi @agunapal, I'm writing a Python app with PyQt5, something like a surveillance camera app. My pipeline looks like this:
The program reads frames from an RTSP link or a video file ---> sends each frame to the inference API using the Python requests library ---> gets the results from the TorchServe API response ---> processes the results (draws bounding boxes on the frame) ---> converts the processed frame from a NumPy array to a QPixmap to display in the app.
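Roughly, one frame round trip looks like this (the endpoint name and the response schema are simplified placeholders for my actual handler):

```python
import cv2
import requests
from PyQt5.QtGui import QImage, QPixmap

INFERENCE_URL = "http://localhost:8080/predictions/detector"  # placeholder name

def infer(frame):
    # Encode the frame as JPEG and POST it to the TorchServe inference API.
    ok, buf = cv2.imencode(".jpg", frame)
    resp = requests.post(INFERENCE_URL, data=buf.tobytes())
    # Assumes the handler returns a JSON list of boxes; adapt to your handler.
    for box in resp.json():
        cv2.rectangle(frame, (int(box["x1"]), int(box["y1"])),
                      (int(box["x2"]), int(box["y2"])), (0, 255, 0), 2)
    return frame

def to_pixmap(frame):
    # Convert the annotated BGR NumPy array into a QPixmap for the Qt widget
    # (meant to run inside the GUI thread of the running Qt app).
    h, w, _ = frame.shape
    data = frame.tobytes()  # keep a reference while QImage wraps the buffer
    qimg = QImage(data, w, h, 3 * w, QImage.Format_RGB888).rgbSwapped()
    return QPixmap.fromImage(qimg)

cap = cv2.VideoCapture("rtsp://...")  # or a video file path
ok, frame = cap.read()
if ok:
    pixmap = to_pixmap(infer(frame))
cap.release()
```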

For each camera, I have 2 queues: one that stores the original frames and one that stores the processed frames. But it's too slow; processing one frame takes about 4-5 seconds. What strategy can I use in this situation?
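For reference, a simplified version of my per-camera queue setup (reusing `infer` and `cap` from the sketch above):

```python
import threading
from queue import Queue

raw_frames = Queue(maxsize=30)        # original frames from the camera
processed_frames = Queue(maxsize=30)  # annotated frames, ready for display

def reader(cap):
    # Producer: read frames and drop them when the queue is full,
    # so a slow consumer never stalls the stream.
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if not raw_frames.full():
            raw_frames.put(frame)

def worker():
    # Consumer: one blocking inference round trip per frame, which is
    # where the 4-5 seconds per frame is currently spent.
    while True:
        processed_frames.put(infer(raw_frames.get()))

threading.Thread(target=reader, args=(cap,), daemon=True).start()
threading.Thread(target=worker, daemon=True).start()
```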

Hi @hungtrieu07, I would start off with a couple of things