- How to clone
- Start services
- Stop services
- Testing the API
- perf_analyzer 📈 results
- ⚡ Triton server PORT details
- Some useful requests
- ☢️ Important
## How to clone

☠️ Make sure that the submodule at `converter/yolov5` is also cloned; otherwise fetch it manually (a command for that follows the code block below).

```bash
git clone --recursive https://github.com/TalhaUsuf/yolov5-triton.git
cd yolov5-triton
git lfs pull
```
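If the repository was cloned without `--recursive`, the submodule can be fetched afterwards from inside the clone:

```bash
# initialize and fetch the converter/yolov5 submodule in an existing clone
git submodule update --init --recursive
```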
## Start services

- 💀 `docker` / `docker compose` must be installed on the system
- 💀 An NVIDIA GPU must be present

```bash
docker-compose up -d --build
```

The app will be running 🏃‍♂️ on http://localhost:8005/
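Once the containers are up, a quick sanity check can confirm the stack responds:

```bash
# list the compose services and their state
docker-compose ps

# the app should answer on port 8005 (any HTTP response means it is up)
curl -i http://localhost:8005/
```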
## Stop services

```bash
docker-compose down
```
## Testing the API

Open http://localhost:8005/ in a browser. If the following sample model is used for testing, class names don't need to be specified (a hedged request sketch follows the list):

- Sample Images:
- Sample Model:
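A request against the app might look like the sketch below. The endpoint path and form field name here are assumptions, not the repo's confirmed routes; if the app is FastAPI-based, the interactive docs at http://localhost:8005/docs list the real ones:

```bash
# hypothetical upload endpoint; check http://localhost:8005/docs for the actual route
curl -X POST 'http://localhost:8005/detect' \
     -F 'file=@sample.jpg'
```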
## perf_analyzer 📈 results

All timing columns are in microseconds (µs), as reported by perf_analyzer.

| ensemble part | Concurrency | Inferences/Second | Client Send | Network+Server Send/Recv | Server Queue | Server Compute Input | Server Compute Infer | Server Compute Output | Client Recv | p50 latency | p90 latency | p95 latency | p99 latency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Yolo-postprocess-B1 | 1 | 7.4983 | 3649 | 4738 | 348 | 1275 | 122293 | 182 | 7 | 143654 | 158760 | 163193 | 172620 |
| Yolo-postprocess-B2 | 1 | 6.7764 | 7379 | 15042 | 358 | 2204 | 272046 | 1183 | 10 | 310661 | 354668 | 357366 | 374222 |
| Yolo-detection-B1 | 1 | 20.8297 | 1642 | 3318 | 91 | 659 | 34114 | 1857 | 6253 | 38402 | 100416 | 101235 | 124060 |
| Yolo-detection-B2 | 1 | 25.8853 | 3685 | 6090 | 123 | 1313 | 42735 | 9961 | 13117 | 67164 | 108679 | 130117 | 149820 |
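For reference, numbers like these are typically produced with an invocation of the following shape; the model name below is an assumption based on the table labels, not the exact command used for these results:

```bash
# measure a model over gRPC at concurrency 1, reporting the 95th percentile
perf_analyzer -m yolov5 \
    -u localhost:8001 -i grpc \
    --concurrency-range 1:1 \
    --percentile=95
```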
## ⚡ Triton server PORT details

| Service | Port |
|---|---|
| GRPC InferenceService | 0.0.0.0:8001 |
| HTTP Service | 0.0.0.0:8000 |
| Metrics Service | 0.0.0.0:8002 |
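The HTTP and metrics endpoints can be probed directly once the server is up (these are standard Triton endpoints):

```bash
# HTTP endpoint: readiness probe
curl -v http://localhost:8000/v2/health/ready

# Metrics endpoint: Prometheus-format metrics
curl http://localhost:8002/metrics
```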
## Some useful requests

- 🟩 Get yolov5 model config:
  - `<host>` is the URL where the Triton server is running
  - `<version>` is the version of the model

```bash
curl --location --request GET 'http://<host>:8000/v2/models/yolov5/versions/<version>/config'
```
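The closely related metadata endpoint (part of the standard Triton/KServe HTTP API) returns the model's input and output tensors:

```bash
# fetch input/output metadata for the yolov5 model
curl --location --request GET 'http://<host>:8000/v2/models/yolov5/versions/<version>'
```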
- 🟩 Get inference-ready models:
  - `<host>` is the URL where the Triton server is running

```bash
curl --location --request POST 'http://<host>:8000/v2/repository/index' \
--header 'Content-Type: application/json' \
--data-raw '{
    "ready" : true
}'
```
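If the server is started with `--model-control-mode=explicit`, models can also be loaded and unloaded through the same repository API (with the default `none` mode these calls are rejected):

```bash
# ask Triton to (re)load the yolov5 model from the repository
curl --location --request POST 'http://<host>:8000/v2/repository/models/yolov5/load'

# unload it again
curl --location --request POST 'http://<host>:8000/v2/repository/models/yolov5/unload'
```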
## ☢️ Important

- The model version policy is set to `latest: 1`, so only the latest version of each model will be loaded.
- The ensemble model will use the latest version of the model for inference.
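In a Triton `config.pbtxt`, that policy is expressed with the standard model-configuration syntax shown below (a sketch of the setting the note above describes, not a copy of this repo's config files):

```protobuf
# keep only the most recent model version loaded
version_policy: { latest { num_versions: 1 }}
```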