
Torch Batcher

Serve batched PyTorch inference requests through Redis. Throughput scales linearly with the number of workers per device and across multiple devices.
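
At its core the pattern is a Redis-backed queue: clients push serialized requests onto a list, and each worker pops up to a full batch at a time, runs a single forward pass, and writes each result back under a per-request key. Below is a minimal sketch of that worker loop; the key names (requests, result:<id>), the stand-in model, and the pickle wire format are illustrative assumptions, not the repo's actual protocol.

    import pickle

    import redis
    import torch

    r = redis.Redis()
    model = torch.nn.Linear(128, 10).eval()  # stand-in model; device placement (.cuda()) elided
    MAX_BATCH = 32

    with torch.no_grad():
        while True:
            # Block until at least one request arrives, then drain up to a full batch.
            items = [r.blpop("requests")[1]]
            while len(items) < MAX_BATCH:
                nxt = r.lpop("requests")
                if nxt is None:
                    break
                items.append(nxt)
            reqs = [pickle.loads(raw) for raw in items]  # each item: (request_id, tensor)
            batch = torch.stack([t for _, t in reqs])    # one forward pass for the whole batch
            out = model(batch)
            for (rid, _), row in zip(reqs, out):
                r.rpush(f"result:{rid}", pickle.dumps(row))  # the client blpops this key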

Dependencies

The steps below assume the following are installed:

  • PyTorch
  • Redis (redis-server, plus the redis Python client used by the workers)
  • supervisor (supervisord)
  • nvidia-cuda-mps-control (optional, needed only for linear scaling)

Usage

  • For linear scaling, start nvidia-cuda-mps-control; see Section 2.1.1, GPU Utilization, of the CUDA MPS documentation for details.

    nvidia-cuda-mps-control -d # Start the MPS control daemon
    
    # To exit MPS after stopping the server:
    nvidia-cuda-mps-control # Enters the MPS command prompt
    quit # Enter this command at the prompt to quit
  • Start Redis

    redis-server --save "" --appendonly no # Disable persistence; Redis serves only as an in-memory queue
  • Start batch serving (a sketch of a possible supervisor.conf follows this list)

    supervisord -c supervisor.conf # Start 3 workers on a single GPU
  • Start the batch benchmark (see the client sketch after this list)

    python3 bench_batched.py
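
A supervisor.conf along these lines would produce the three workers per GPU referenced above. The program name, the serve_batched.py entry point, and the device pinning are assumptions for illustration, not the repo's actual config.

    [supervisord]
    nodaemon=true

    [program:torch_batcher_worker]
    ; serve_batched.py is a hypothetical worker entry point
    command=python3 serve_batched.py
    numprocs=3                                       ; 3 workers on a single GPU
    process_name=%(program_name)s_%(process_num)02d
    environment=CUDA_VISIBLE_DEVICES="0"             ; pin all workers to device 0
    autorestart=true

Scaling across devices would then amount to adding another [program:...] section with a different CUDA_VISIBLE_DEVICES.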
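
On the client side, a benchmark such as bench_batched.py only needs to enqueue a request and block on its result key. A minimal sketch matching the worker loop above (same hypothetical key names and wire format):

    import pickle
    import uuid

    import redis
    import torch

    r = redis.Redis()

    def infer(x: torch.Tensor) -> torch.Tensor:
        rid = uuid.uuid4().hex
        r.rpush("requests", pickle.dumps((rid, x)))  # enqueue for any available worker
        _, payload = r.blpop(f"result:{rid}")        # block until a worker serves the batch
        return pickle.loads(payload)

    print(infer(torch.randn(128)).shape)  # torch.Size([10]) with the sketch model above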