LayoutLM - Deployment

Repo to test conversion and deployment of the SOTA LayoutLM model

How it works

  • To serve the model with Triton, it must be in one of two supported formats. The base model here is in PyTorch, so for deployment it first has to be converted to one of:
    • LibTorch
    • ONNX

This tutorial focuses on converting the model to ONNX and uses the Seq Labelling task; the other format is on hold for now

NOTE: The tokenizer that comes with the model cannot be converted to ONNX, therefore the pipeline has to be separated into two components, as sketched below
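
A rough sketch of that split, assuming the Hugging Face LayoutLMTokenizer runs on the client side while only plain tensors are fed to the exported ONNX model (the example words and boxes are made up):

```python
# Hypothetical client-side half of the pipeline: tokenization stays in Python,
# the ONNX model only ever sees numeric tensors.
from transformers import LayoutLMTokenizer

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")

words = ["Invoice", "Number", ":", "12345"]            # example OCR words
boxes = [[57, 12, 114, 30], [120, 12, 180, 30],
         [182, 12, 190, 30], [200, 12, 260, 30]]       # matching bounding boxes

encoding = tokenizer(" ".join(words), padding="max_length",
                     truncation=True, max_length=512, return_tensors="np")

# encoding["input_ids"], encoding["attention_mask"] and encoding["token_type_ids"]
# are sent to the ONNX model together with a bbox tensor aligned to the sub-tokens
# (each word's box repeated for every sub-token, padded with [0, 0, 0, 0]).
```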

Model Download

| Model Name | Link |
| --- | --- |
| LayoutLM-Seq Labelling ONNX | https://drive.google.com/file/d/1AC8zF2Ic395ybMyXzCmi0xbO7zj0VxFo/view |

How to run

Run using Triton Inference Server

  1. Clone the repo
git clone https://github.com/edwin-19/LayoutLm-Deployment.git

  2. Pull the NVIDIA Triton Docker image; you can change 21.07 to any other version

  • Depending on your CUDA version, you need to check for compatibility here
docker pull nvcr.io/nvidia/tritonserver:21.07-py3 # pulling latest at the moment
  3. Download and preprocess the data
./preprocess.sh
  4. Convert the available model to ONNX
python convert_to_onnx.py

Note: after conversion you can move the generated model over to the Triton model repository by specifying the correct path
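
For reference, a minimal sketch of what the conversion could look like, assuming a fine-tuned LayoutLMForTokenClassification checkpoint from transformers; the checkpoint path, output path, tensor names and opset version are illustrative:

```python
# Hypothetical sketch of an ONNX export for LayoutLM sequence labelling.
import torch
from transformers import LayoutLMForTokenClassification

MAX_SEQ_LEN = 512
model = LayoutLMForTokenClassification.from_pretrained("path/to/finetuned_layoutlm")  # placeholder path
model.eval()

# Dummy inputs only used to trace the graph; shapes follow LayoutLM's forward()
dummy_input_ids = torch.zeros(1, MAX_SEQ_LEN, dtype=torch.long)
dummy_bbox = torch.zeros(1, MAX_SEQ_LEN, 4, dtype=torch.long)
dummy_attention_mask = torch.ones(1, MAX_SEQ_LEN, dtype=torch.long)
dummy_token_type_ids = torch.zeros(1, MAX_SEQ_LEN, dtype=torch.long)

torch.onnx.export(
    model,
    (dummy_input_ids, dummy_bbox, dummy_attention_mask, dummy_token_type_ids),
    "deployment/layoutlm_onnx/1/model.onnx",
    input_names=["input_ids", "bbox", "attention_mask", "token_type_ids"],
    output_names=["logits"],
    dynamic_axes={name: {0: "batch"} for name in
                  ["input_ids", "bbox", "attention_mask", "token_type_ids", "logits"]},
    opset_version=11,
)
```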

  5. Run Triton Server
  • Remember to change the -v argument to point to the path of your model repository
  • When mounting a volume, you have to specify an absolute path
# Run on CPU
docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/mnt/882eb9b0-1111-4b8f-bfc3-bb89bc24c050/pytorch/layoutlm/deployment:/model nvcr.io/nvidia/tritonserver:21.07-py3 tritonserver --model-repository=/model 

# Or with GPU
docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/mnt/882eb9b0-1111-4b8f-bfc3-bb89bc24c050/pytorch/layoutlm/deployment:/model nvcr.io/nvidia/tritonserver:21.07-py3 tritonserver --model-repository=/model 
  6. Run the test script
  • Runs against the server at localhost:8000
python test_trion_client.py
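
A minimal sketch of what such a client call could look like, assuming the HTTP endpoint on port 8000 and the tensor names from the conversion sketch above (the zero tensors stand in for real tokenized inputs):

```python
# Hypothetical Triton HTTP client call for the layoutlm_onnx model.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

seq_len = 512
inputs = []
for name, shape in [("input_ids", (1, seq_len)), ("bbox", (1, seq_len, 4)),
                    ("attention_mask", (1, seq_len)), ("token_type_ids", (1, seq_len))]:
    tensor = httpclient.InferInput(name, shape, "INT64")
    tensor.set_data_from_numpy(np.zeros(shape, dtype=np.int64))  # replace with real tokenized inputs
    inputs.append(tensor)

result = client.infer(model_name="layoutlm_onnx", inputs=inputs,
                      outputs=[httpclient.InferRequestedOutput("logits")])
print(result.as_numpy("logits").shape)
```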

NOTES:

  • Please follow the directory structure shown below
  • The model file must be named "model" with the extension of the specified format (e.g. model.onnx)
  • If you have a new version, add a numbered directory for that version
deployment/
└── layoutlm_onnx
    ├── 1
    │   └── model.onnx
    └── config.pbtxt
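
For reference, a minimal config.pbtxt sketch matching the layout above; the tensor names, dims and max_batch_size are assumptions and must match the exported ONNX graph:

```
name: "layoutlm_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  { name: "input_ids", data_type: TYPE_INT64, dims: [ 512 ] },
  { name: "bbox", data_type: TYPE_INT64, dims: [ 512, 4 ] },
  { name: "attention_mask", data_type: TYPE_INT64, dims: [ 512 ] },
  { name: "token_type_ids", data_type: TYPE_INT64, dims: [ 512 ] }
]
output [
  { name: "logits", data_type: TYPE_FP32, dims: [ 512, -1 ] }  # -1: number of labels
]
```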

Run using onnxruntime

The other option is to test directly with a Python script

Note: This option uses randomly generated numbers as input, so the predictions are not very meaningful

  1. First download or convert the model to ONNX

  2. Run the following Python script

python demo_onnx_runtime.py
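
A minimal sketch of what this script might do, assuming the model path from the structure above and randomly generated inputs (the input and output names must match the exported graph):

```python
# Hypothetical onnxruntime smoke test with random inputs.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("deployment/layoutlm_onnx/1/model.onnx")

seq_len = 512
# Random but well-formed boxes: x0 <= x1, y0 <= y1, coordinates below 1000
top_left = np.random.randint(0, 500, size=(1, seq_len, 2), dtype=np.int64)
size = np.random.randint(0, 500, size=(1, seq_len, 2), dtype=np.int64)
bbox = np.concatenate([top_left, top_left + size], axis=-1)

feed = {
    "input_ids": np.random.randint(0, 30000, size=(1, seq_len), dtype=np.int64),
    "bbox": bbox,
    "attention_mask": np.ones((1, seq_len), dtype=np.int64),
    "token_type_ids": np.zeros((1, seq_len), dtype=np.int64),
}

logits = session.run(["logits"], feed)[0]
print(logits.shape)  # (1, seq_len, num_labels)
```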

TODO

  • Data parser
  • Create inference script for demo
  • Convert model from torch to onnx
  • Load model into Triton Inference Server
  • Write Triton inference script for deployment
  • Write onnxruntime inference script

TODO Clean up

  • Write py file for conversion
  • Write py file for deployment

Future Improvements

  • Wrap up in a FastAPI/Flask API for a more complete repo
  • Classification Model to ONNX
  • Set up deployment in Triton

Reference