NielsRogge/Transformers-Tutorials

UDOP different models

arvisioncode opened this issue · 2 comments

Good morning @NielsRogge !

As I understand it, the UDOP model can be used for different tasks such as docvqa, classification or information extraction.
Looking at your notebooks on this model, I see that the inference one loads the Hugging Face model microsoft/udop-large and uses it for question answering.

My question is: are there pretrained UDOP models for different tasks? I haven't found any on Hugging Face.

I have seen that one prompt in the notebook is for classifying the image, but I assumed there would be a separate, task-specific model for that. Does such a model exist, or is there another one I should use?

Thank you so much

Hi,

Microsoft released 3 pre-trained UDOP models: https://huggingface.co/collections/microsoft/udop-65e625124aee97415b88b513. They were all pre-trained in a general way, so that they can be fine-tuned for tasks like docvqa, classification or information extraction. The best performing model is microsoft/udop-large-512-300k, since it uses the highest image resolution (512x512) and was pre-trained the longest.
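For illustration, here is a minimal sketch of how one checkpoint can serve several tasks by changing only the text prompt. The task prefixes in `TASK_PREFIXES` are assumptions modelled on the prompts in the inference notebook, and the helper names (`build_prompt`, `load_udop`, `run_prompt`) are hypothetical. It also assumes you pass your own OCR words and boxes (`apply_ocr=False`). Since the checkpoints are only pre-trained, outputs will likely be weak until the model is fine-tuned on your task.

```python
# Assumed task prefixes, modelled on the prompts used in the inference notebook.
TASK_PREFIXES = {
    "docvqa": "Question answering.",
    "classification": "document classification.",
}


def build_prompt(task, text=""):
    """Prepend the (assumed) task prefix to an optional question or text."""
    prefix = TASK_PREFIXES[task]
    return f"{prefix} {text}".strip()


def load_udop(checkpoint="microsoft/udop-large-512-300k"):
    """Load processor and model for one of the pre-trained checkpoints."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import UdopForConditionalGeneration, UdopProcessor

    # apply_ocr=False: we supply words and boxes ourselves.
    processor = UdopProcessor.from_pretrained(checkpoint, apply_ocr=False)
    model = UdopForConditionalGeneration.from_pretrained(checkpoint)
    return processor, model


def run_prompt(processor, model, image, prompt, words, boxes):
    """Generate text for one document image and one task prompt."""
    encoding = processor(image, prompt, words, boxes=boxes, return_tensors="pt")
    predicted_ids = model.generate(**encoding)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

The same `processor` and `model` can then be reused with `build_prompt("docvqa", "What is the date?")` or `build_prompt("classification")`; only the prompt changes, not the checkpoint.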

Perfect! Thank you very much for your response!
I have seen that you have added new notebooks for fine-tuning on different tasks, starting from those base models.
Would it be possible for you to create a new one for fine-tuning on docvqa?