Repo to test conversion and deployment of the SOTA LayoutLM model
- To use the model with Triton, the model file must be in one of two formats. The base model here is in PyTorch, so before deployment it must first be converted to one of the following:
  - LibTorch
  - ONNX
This tutorial focuses on converting the model to ONNX and uses it on the Seq Labelling task; the LibTorch path is on hold for now.
NOTE: The tokenizer that comes with the model cannot be converted to ONNX, therefore the pipeline has to be separated into 2 components: the tokenizer (run on the client side) and the ONNX model.
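As a rough illustration of that split, tokenization stays in plain Python on the client side and only the resulting integer tensors are fed to the exported graph. This is a minimal sketch assuming the microsoft/layoutlm-base-uncased tokenizer and a 512-token sequence length; the real preprocessing in this repo also builds the bbox tensor from the actual word boxes.

```python
# Minimal sketch of the client-side tokenizer component (assumption:
# microsoft/layoutlm-base-uncased, max sequence length 512).
import numpy as np
from transformers import LayoutLMTokenizer

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")

words = ["Invoice", "Number", ":", "12345"]  # toy example, not repo data
encoding = tokenizer(
    words,
    is_split_into_words=True,
    padding="max_length",
    truncation=True,
    max_length=512,
    return_tensors="np",
)

# The ONNX graph additionally expects a bbox tensor aligned with the tokens;
# an all-zeros placeholder stands in here for the real word boxes.
bbox = np.zeros((1, 512, 4), dtype=np.int64)

input_ids = encoding["input_ids"].astype(np.int64)
attention_mask = encoding["attention_mask"].astype(np.int64)
token_type_ids = encoding["token_type_ids"].astype(np.int64)
```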
Model Name | Link |
---|---|
LayoutLM-Seq Labelling ONNX | https://drive.google.com/file/d/1AC8zF2Ic395ybMyXzCmi0xbO7zj0VxFo/view |
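One way to pull the linked checkpoint down from Google Drive is with the gdown tool (not part of this repo); the output path below assumes the Triton model-repository layout described further down.

```bash
pip install gdown
gdown 'https://drive.google.com/uc?id=1AC8zF2Ic395ybMyXzCmi0xbO7zj0VxFo' -O deployment/layoutlm_onnx/1/model.onnx
```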
- Clone the repo
```bash
git clone https://github.com/edwin-19/LayoutLm-Deployment.git
```
- Pull the NVIDIA Triton Docker image; you can change 21.07 to any other version
- Depending on your CUDA version, you need to check for compatibility here
```bash
docker pull nvcr.io/nvidia/tritonserver:21.07-py3 # pulling latest at the moment
```
- Download and preprocess the data
```bash
./preprocess.sh
```
- Convert the available model to ONNX
```bash
python convert_to_onnx.py
```
Note: from here you can move the converted model over to the Triton model repository by specifying the specific path
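For orientation, the export step boils down to something like the sketch below. The checkpoint path is a placeholder, and the tensor names, sequence length, and opset are assumptions, not necessarily what convert_to_onnx.py actually uses.

```python
# Hedged sketch of exporting a fine-tuned LayoutLM sequence-labelling model
# to ONNX; "path/to/finetuned-layoutlm" is a placeholder checkpoint path.
import torch
from transformers import LayoutLMForTokenClassification

model = LayoutLMForTokenClassification.from_pretrained("path/to/finetuned-layoutlm")
model.config.return_dict = False  # simpler tracing: outputs become a tuple
model.eval()

seq_len = 512
dummy_inputs = (
    torch.zeros((1, seq_len), dtype=torch.long),     # input_ids
    torch.zeros((1, seq_len, 4), dtype=torch.long),  # bbox
    torch.ones((1, seq_len), dtype=torch.long),      # attention_mask
    torch.zeros((1, seq_len), dtype=torch.long),     # token_type_ids
)

torch.onnx.export(
    model,
    dummy_inputs,
    "deployment/layoutlm_onnx/1/model.onnx",
    input_names=["input_ids", "bbox", "attention_mask", "token_type_ids"],
    output_names=["logits"],
    dynamic_axes={name: {0: "batch"} for name in
                  ["input_ids", "bbox", "attention_mask", "token_type_ids", "logits"]},
    opset_version=11,
)
```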
- Run Triton Server
- Remember to change the -v flag to point to the path of your Triton model repository
- When using a volume, you have to specify an absolute path
```bash
# Run on CPU
docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/mnt/882eb9b0-1111-4b8f-bfc3-bb89bc24c050/pytorch/layoutlm/deployment:/model nvcr.io/nvidia/tritonserver:21.07-py3 tritonserver --model-repository=/model

# Or with GPU
docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/mnt/882eb9b0-1111-4b8f-bfc3-bb89bc24c050/pytorch/layoutlm/deployment:/model nvcr.io/nvidia/tritonserver:21.07-py3 tritonserver --model-repository=/model
```
- Run test script
- Runs against the Triton server at localhost:8000

```bash
python test_trion_client.py
```
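As a reference point, a Triton HTTP request for this model looks roughly like the sketch below, assuming the model is served under the name layoutlm_onnx with the tensor names from the export sketch above; the repo's test_trion_client.py may differ in detail.

```python
# Minimal Triton HTTP client sketch with dummy tensors (names and shapes are
# assumptions that must match the deployed ONNX model).
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

seq_len = 512
tensors = {
    "input_ids": np.zeros((1, seq_len), dtype=np.int64),
    "bbox": np.zeros((1, seq_len, 4), dtype=np.int64),
    "attention_mask": np.ones((1, seq_len), dtype=np.int64),
    "token_type_ids": np.zeros((1, seq_len), dtype=np.int64),
}

inputs = []
for name, arr in tensors.items():
    infer_input = httpclient.InferInput(name, list(arr.shape), "INT64")
    infer_input.set_data_from_numpy(arr)
    inputs.append(infer_input)

outputs = [httpclient.InferRequestedOutput("logits")]
result = client.infer(model_name="layoutlm_onnx", inputs=inputs, outputs=outputs)
print(result.as_numpy("logits").shape)  # (batch, 512, num_labels)
```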
NOTES:
- Please follow the model repository structure shown below
- The model file must be named "model" with whatever format extension is specified (e.g. model.onnx)
- If you have a new version, add a numbered subdirectory for that version
```
deployment/
└── layoutlm_onnx
    ├── 1
    │   └── model.onnx
    └── config.pbtxt
```
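A config.pbtxt for this layout could look roughly like the following sketch; the input/output names, dims, and label count are assumptions and must match the tensors actually exported to model.onnx.

```
name: "layoutlm_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ 512 ]
  },
  {
    name: "bbox"
    data_type: TYPE_INT64
    dims: [ 512, 4 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    dims: [ 512 ]
  },
  {
    name: "token_type_ids"
    data_type: TYPE_INT64
    dims: [ 512 ]
  }
]
output [
  {
    name: "logits"
    data_type: TYPE_FP32
    dims: [ 512, 13 ]  # adjust 13 to the number of labels in your fine-tuned model
  }
]
```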
The other option is to test directly with a Python script.
Note: this option uses randomly generated inputs, so the predictions are not meaningful; it only checks that the model runs.
- First download or convert the model to ONNX
- Run the following python script

```bash
python demo_onnx_runtime.py
```
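The idea behind that script is roughly the following sketch with random tensors, assuming the tensor names from the export sketch above; it is not a copy of demo_onnx_runtime.py, and since the inputs are random the predictions are meaningless.

```python
# Hedged onnxruntime sanity check with random inputs (tensor names and the
# model path are assumptions).
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "deployment/layoutlm_onnx/1/model.onnx",
    providers=["CPUExecutionProvider"],
)

seq_len = 512
# LayoutLM expects bbox coordinates in [0, 1000] with x0 <= x1 and y0 <= y1.
lo = np.random.randint(0, 500, size=(1, seq_len, 2))
hi = lo + np.random.randint(0, 500, size=(1, seq_len, 2))
feed = {
    "input_ids": np.random.randint(0, 30522, size=(1, seq_len)).astype(np.int64),
    "bbox": np.concatenate([lo, hi], axis=-1).astype(np.int64),
    "attention_mask": np.ones((1, seq_len), dtype=np.int64),
    "token_type_ids": np.zeros((1, seq_len), dtype=np.int64),
}

logits = session.run(["logits"], feed)[0]
preds = logits.argmax(axis=-1)
print(preds.shape)  # (1, 512): one label id per token
```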
- Data parser
- Create inference script for demo
- Convert model from torch to ONNX
- Load model to Triton Inference Server
- Write Triton inference script for deployment
- Write onnxruntime inference script
- Write py file for conversion
- Write py file for deployment
- Wrap up in a FastAPI/Flask app for a more complete repo
- Classification model to ONNX
- Set up deployment in Triton
- You can check the original repo's code here