
Torch Batcher

Serve batched PyTorch inference requests through Redis. Throughput scales linearly with the number of workers per device and across multiple devices.
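
At its core the pattern is a Redis-backed queue: clients push serialized requests onto a list, and each worker pops up to a full batch at a time, runs a single forward pass, and writes each result back under a per-request key. Below is a minimal sketch of that worker loop; the key names (requests, result:<id>), the stand-in model, and the pickle wire format are illustrative assumptions, not the repo's actual protocol.

    import pickle

    import redis
    import torch

    r = redis.Redis()
    model = torch.nn.Linear(128, 10).eval()  # stand-in model; device placement (.cuda()) elided
    MAX_BATCH = 32

    with torch.no_grad():
        while True:
            # Block until at least one request arrives, then drain up to a full batch.
            items = [r.blpop("requests")[1]]
            while len(items) < MAX_BATCH:
                nxt = r.lpop("requests")
                if nxt is None:
                    break
                items.append(nxt)
            reqs = [pickle.loads(raw) for raw in items]  # each item: (request_id, tensor)
            batch = torch.stack([t for _, t in reqs])    # one forward pass for the whole batch
            out = model(batch)
            for (rid, _), row in zip(reqs, out):
                r.rpush(f"result:{rid}", pickle.dumps(row))  # the client blpops this key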

Dependencies

The steps below assume the following are installed:

  • PyTorch
  • Redis (redis-server, plus the redis Python client used by the workers)
  • supervisor (supervisord)
  • nvidia-cuda-mps-control (optional, needed only for linear scaling)

Usage

  • For linear scaling, start nvidia-cuda-mps-control; see Section 2.1.1, GPU Utilization, of the CUDA MPS documentation for details.

    nvidia-cuda-mps-control -d # Start the MPS control daemon
    
    # To exit MPS after stopping the server:
    nvidia-cuda-mps-control # Enters the MPS command prompt
    quit # Enter this command at the prompt to quit
  • Start Redis

    redis-server --save "" --appendonly no # Disable persistence; Redis serves only as an in-memory queue
  • Start batch serving (a sketch of a possible supervisor.conf follows this list)

    supervisord -c supervisor.conf # Start 3 workers on a single GPU
  • Start the batch benchmark (see the client sketch after this list)

    python3 bench_batched.py
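
A supervisor.conf along these lines would produce the three workers per GPU referenced above. The program name, the serve_batched.py entry point, and the device pinning are assumptions for illustration, not the repo's actual config.

    [supervisord]
    nodaemon=true

    [program:torch_batcher_worker]
    ; serve_batched.py is a hypothetical worker entry point
    command=python3 serve_batched.py
    numprocs=3                                       ; 3 workers on a single GPU
    process_name=%(program_name)s_%(process_num)02d
    environment=CUDA_VISIBLE_DEVICES="0"             ; pin all workers to device 0
    autorestart=true

Scaling across devices would then amount to adding another [program:...] section with a different CUDA_VISIBLE_DEVICES.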
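
On the client side, a benchmark such as bench_batched.py only needs to enqueue a request and block on its result key. A minimal sketch matching the worker loop above (same hypothetical key names and wire format):

    import pickle
    import uuid

    import redis
    import torch

    r = redis.Redis()

    def infer(x: torch.Tensor) -> torch.Tensor:
        rid = uuid.uuid4().hex
        r.rpush("requests", pickle.dumps((rid, x)))  # enqueue for any available worker
        _, payload = r.blpop(f"result:{rid}")        # block until a worker serves the batch
        return pickle.loads(payload)

    print(infer(torch.randn(128)).shape)  # torch.Size([10]) with the sketch model above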