Baseline - Unifying Vision, Text, and Layout for Universal Document Processing (CVPR 2023 Highlight)
The goal of the 2023 MIRIDIH Corporate Collaboration Project is to use MIRIDIH's dataset to recommend and generate an optimal design layout for a given user query. This repository contains the code that generates the design layout, specifically the bounding box tokens for each sentence in the query. It covers the following tasks:
- Preprocess raw XML data to extract text, image, and layout data
- Fine-tune the Vision-Text-Layout (VTL) Transformer, based on the Universal Document Processing (UDOP) model, for design layout generation
- Run inference with the VTL model to generate bounding box tokens for each sentence in the query
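As a rough illustration of what "bounding box tokens" can look like, the sketch below quantizes a pixel-space box into four discrete layout tokens and maps them back. The `<loc_N>` token naming, the vocabulary size of 500, and the function names `bbox_to_tokens` and `tokens_to_bbox` are assumptions made for this example; the tokenizer used in this repository may differ.

```python
import re
from typing import List, Sequence, Tuple


def bbox_to_tokens(
    bbox: Sequence[float],
    page_width: float,
    page_height: float,
    vocab_size: int = 500,  # assumed layout vocabulary size
) -> List[str]:
    """Quantize a pixel bbox (x0, y0, x1, y1) into four layout tokens."""
    x0, y0, x1, y1 = bbox
    # Normalize each coordinate to [0, 1] relative to the page size.
    normalized = (x0 / page_width, y0 / page_height,
                  x1 / page_width, y1 / page_height)
    tokens = []
    for value in normalized:
        clamped = min(max(value, 0.0), 1.0)
        # Map to a bin index; 1.0 falls into the last bin.
        index = min(int(clamped * vocab_size), vocab_size - 1)
        tokens.append(f"<loc_{index}>")  # assumed token format
    return tokens


def tokens_to_bbox(
    tokens: Sequence[str],
    page_width: float,
    page_height: float,
    vocab_size: int = 500,
) -> Tuple[float, float, float, float]:
    """Map four layout tokens back to the lower edge of each bin, in pixels."""
    indices = [int(re.fullmatch(r"<loc_(\d+)>", t).group(1)) for t in tokens]
    x0, y0, x1, y1 = (i / vocab_size for i in indices)
    return (x0 * page_width, y0 * page_height,
            x1 * page_width, y1 * page_height)
```

Quantization is lossy: decoding recovers a coordinate only up to the bin width, which is `page_width / vocab_size` pixels horizontally.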
The Vision-Text-Layout Transformer is based on UDOP, a model proposed in Unifying Vision, Text, and Layout for Universal Document Processing (CVPR 2023 Highlight). UDOP is a foundation Document AI model which unifies text, image, and layout modalities together with varied task formats, including document understanding and generation. We have modified and fine-tuned the UDOP model for design layout generation. The model architecture is as follows:
The repository has two branches:
- main: contains the code for customizing HuggingFace's implementation of the UDOP model
- custom: contains the code for customizing Microsoft's implementation of the UDOP model
.
├── LICENSE
├── README.md
├── config/                     # Train/Inference configuration files
│   ├── inference.yaml
│   ├── predict.yaml
│   └── train.yaml
├── core/                       # Main UDOP/DataClass source code
│   ├── common/
│   ├── datasets/
│   ├── models/
│   └── trainers/
├── data/                       # Custom dataset folder
│   ├── images/
│   │   └── image_{idx}.png
│   └── json_data/
│       └── processed_{idx}.pickle
├── main.py                     # Main script to run training/inference
├── models                      # Trained models saved to this folder
├── sweep.py                    # Script to run hyperparameter sweep
├── test                        # Save visualizations during inference
├── requirements.txt
├── udop-unimodel-large-224     # Pretrained UDOP 224 model
│   ├── config.json
│   ├── pytorch_model.bin
│   ├── special_tokens_map.json
│   ├── spiece.model
│   └── tokenizer_config.json
├── udop-unimodel-large-512     # Pretrained UDOP 512 model
│   ├── config.json
│   ├── pytorch_model.bin
│   ├── special_tokens_map.json
│   ├── spiece.model
│   └── tokenizer_config.json
└── utils                       # Utilities
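Before training, it can help to check that every processed sample in data/ has a matching image. The sketch below pairs files by index under two assumptions: `{idx}` is an integer, and the same index links `image_{idx}.png` to `processed_{idx}.pickle`. The helper name `pair_samples` is hypothetical and not part of this repository.

```python
import re
from pathlib import Path
from typing import List, Tuple


def pair_samples(data_dir) -> List[Tuple[int, Path, Path]]:
    """Return (idx, image_path, pickle_path) for samples that have both files."""
    data_dir = Path(data_dir)
    pattern = re.compile(r"processed_(\d+)\.pickle$")
    pairs = []
    for pickle_path in (data_dir / "json_data").glob("processed_*.pickle"):
        match = pattern.search(pickle_path.name)
        if match is None:
            continue  # skip files that do not follow the assumed naming
        idx = int(match.group(1))
        image_path = data_dir / "images" / f"image_{idx}.png"
        if image_path.exists():
            pairs.append((idx, image_path, pickle_path))
    # Sort by index so the order is deterministic across file systems.
    return sorted(pairs, key=lambda item: item[0])


if __name__ == "__main__":
    samples = pair_samples("data")
    print(f"Found {len(samples)} complete samples")
```

Pickle files without a matching image are silently skipped here; a stricter variant could raise an error instead.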
conda create -n UDOP python=3.8 # You can also use another environment.
pip install -r requirements.txt
Set up the folder structure as shown above and modify the YAML files in config/ to customize training and inference.
python main.py config/train.yaml
python main.py config/inference.yaml