Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal
Code Release Here
Code is rehosted as part of the i-code project
- Release Model (Encoder + Text decoder)
- Release Most Scripts
- Vision Decoder / Weights (Because of ethical concerns about fake document generation, we plan to release this functionality as an Azure API)
- Demos
UDOP unifies vision, text, and layout through a Vision-Text-Layout Transformer and a set of unified generative pretraining tasks: a vision task, a text task, a layout task, and a mixed task. We show the task prompts (left) and task targets (right) for all self-supervised objectives (joint text-layout reconstruction, visual text recognition, layout modeling, and masked autoencoding) and for two example supervised objectives (question answering and layout analysis).
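
To make the prompt/target idea concrete, here is a minimal sketch of how a joint text-layout reconstruction pair could be assembled from OCR words and bounding boxes. It is an illustration under stated assumptions, not the released implementation: the sentinel token `<text_0>`, the layout token format `<loc_N>`, the task prefix string, the bin count, and the helper names `layout_tokens`, `union_box`, and `make_text_layout_example` are all hypothetical.

```python
from typing import List, Sequence, Tuple

# (x0, y0, x1, y1), normalized to [0, 1]. This box convention is an assumption.
Box = Tuple[float, float, float, float]


def layout_tokens(box: Box, bins: int = 500) -> str:
    """Quantize each normalized coordinate into one of `bins` discrete tokens.

    The `<loc_N>` token format is hypothetical.
    """
    ids = [min(int(c * bins), bins - 1) for c in box]
    return "".join(f"<loc_{i}>" for i in ids)


def union_box(boxes: Sequence[Box]) -> Box:
    """Smallest box that covers every box in `boxes`."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def make_text_layout_example(
    words: List[str],
    boxes: List[Box],
    start: int,
    end: int,
    task_prefix: str = "joint text-layout reconstruction.",  # hypothetical prefix
    bins: int = 500,
) -> Tuple[str, str]:
    """Mask words[start:end] and build a (prompt, target) string pair.

    The prompt keeps the unmasked words and replaces the masked span with a
    sentinel. The target repeats the sentinel, then the masked words, then the
    layout tokens of the span's covering box.
    """
    if not (0 <= start < end <= len(words)) or len(words) != len(boxes):
        raise ValueError("invalid span or mismatched words/boxes")
    sentinel = "<text_0>"  # hypothetical sentinel token
    prompt = " ".join([task_prefix] + words[:start] + [sentinel] + words[end:])
    span = " ".join(words[start:end])
    target = f"{sentinel} {span} {layout_tokens(union_box(boxes[start:end]), bins)}"
    return prompt, target
```

Calling `make_text_layout_example(words, boxes, 1, 3)` on a four-word line masks the second and third words, so the model would have to generate both the missing text and where it sits on the page. The other objectives listed above could follow the same pattern with a different prefix and target, though the exact formats are not specified here.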