Triton ships a generated Go gRPC client (based upon the protobuf definitions), which can be found here. It should also be possible to generate an HTTP client based on the proto definitions.
In this case we will utilize the model developed in the previous blog post, Triton with post and pre processing.
Since the Go gRPC client is not exported and instead has to be generated, I have copied it in here to make things easier. When trying to generate the client, the generation code seems to be out of date and no longer working, but I made a PR (not yet reviewed) with suggested changes. For now I will assume we have it generated locally.
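As a rough sketch of how the generated client is used, establishing a connection and checking that the server is live looks something like this. The import path is a placeholder for wherever you generated the package locally; the service and message names come from Triton's grpc_service.proto:

package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"

	triton "example.com/tritonclient" // placeholder: the locally generated package
)

func main() {
	// Triton serves gRPC on port 8001 by default
	conn, err := grpc.Dial("localhost:8001", grpc.WithInsecure())
	if err != nil {
		log.Fatalf("failed to connect to triton: %v", err)
	}
	defer conn.Close()

	client := triton.NewGRPCInferenceServiceClient(conn)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// sanity check that the server is up before sending inference requests
	live, err := client.ServerLive(ctx, &triton.ServerLiveRequest{})
	if err != nil {
		log.Fatalf("ServerLive failed: %v", err)
	}
	log.Printf("server live: %v", live.Live)
}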
Get the example image:
wget https://raw.githubusercontent.com/triton-inference-server/server/main/qa/images/mug.jpg -O "mug.jpg"
Then run the client:
go run main.go
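The core of main.go is reading the image and sending it as raw bytes to the model. Here is a hedged sketch building on the connection code above; the model name, input tensor name, and datatype are assumptions that must match the config from the previous post:

// infer sends the image at path to Triton as a raw byte tensor.
// Model and tensor names below are assumed, not taken from the actual config.
func infer(ctx context.Context, client triton.GRPCInferenceServiceClient, path string) error {
	raw, err := os.ReadFile(path)
	if err != nil {
		return err
	}

	req := &triton.ModelInferRequest{
		ModelName: "ensemble_model", // assumed name of the pre/post-processing ensemble
		Inputs: []*triton.ModelInferRequest_InferInputTensor{
			{
				Name:     "INPUT", // assumed input tensor name
				Datatype: "UINT8",
				Shape:    []int64{int64(len(raw))},
			},
		},
		// raw_input_contents carries the tensor data as raw bytes
		RawInputContents: [][]byte{raw},
	}

	resp, err := client.ModelInfer(ctx, req)
	if err != nil {
		return err
	}
	// the postprocessing step in the ensemble produces the final output tensors
	log.Printf("outputs: %v", resp.Outputs)
	return nil
}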
This error seems to happen when you don't have enough memory for the automatic downcast to work:
[08/17/2022-06:53:37] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:368: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Make sure your graphics card is not low on memory and, of course, has enough memory to hold your model.
Install the Python dependencies (needed by the Python pre- and post-processing model from the previous post):
pip install numpy pillow torchvision opencv-python bson
apt-get update
apt-get install python3-opencv
To run the client, start a container from the gocv/opencv image:
docker run --net=host -it -v $(pwd):/home gocv/opencv /bin/bash
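The gocv/opencv image ships Go with the OpenCV bindings preinstalled. If the client does any image handling locally, it would look roughly like this; the gocv usage here is purely illustrative, my assumption rather than the actual client code:

package main

import (
	"image"
	"log"

	"gocv.io/x/gocv"
)

func main() {
	// illustrative only: decode and resize the test image with gocv
	img := gocv.IMRead("mug.jpg", gocv.IMReadColor)
	if img.Empty() {
		log.Fatal("could not read mug.jpg")
	}
	defer img.Close()

	resized := gocv.NewMat()
	defer resized.Close()
	gocv.Resize(img, &resized, image.Pt(224, 224), 0, 0, gocv.InterpolationLinear)
	log.Printf("resized to %dx%d", resized.Cols(), resized.Rows())
}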