NielsRogge/Transformers-Tutorials

UDOP - Fine tuning with bad metrics

arvisioncode opened this issue · 2 comments

I have obtained a fine-tuned model on FUNSD following the steps in your notebook; the only change introduced is the base model: "microsoft/udop-large-512-300k"

Train configuration:

training_args = TrainingArguments(output_dir="test",
                                  max_steps=3000,
                                  warmup_ratio=0.1,
                                  per_device_train_batch_size=1,
                                  per_device_eval_batch_size=1,
                                  gradient_accumulation_steps=8,
                                  eval_accumulation_steps=8,
                                  learning_rate=5e-5,
                                  evaluation_strategy="steps",
                                  eval_steps=100,
                                  load_best_model_at_end=True,
                                  metric_for_best_model="f1")
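For context, `max_steps=3000` with this batch configuration already implies many passes over FUNSD. A quick back-of-envelope check (assuming the standard FUNSD training split of 149 documents):

```python
# Rough epoch count implied by the TrainingArguments above,
# assuming the standard FUNSD training split of 149 documents.
funsd_train_size = 149
per_device_batch = 1
grad_accum = 8
max_steps = 3000

effective_batch = per_device_batch * grad_accum   # 8 samples per optimizer step
samples_seen = max_steps * effective_batch        # 24000 samples in total
epochs = samples_seen / funsd_train_size
print(round(epochs, 1))                           # roughly 161 epochs
```

So simply adding more steps is unlikely to help; the model has already seen the training set many times over.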

The results that I obtained in the end and the generated model have the following characteristics:

This model is a fine-tuned version of microsoft/udop-large-512-300k on the FUNSD dataset. It achieves the following results on the evaluation set:

Loss: 1.3328
Precision: 0.8664
Recall: 0.8775
F1: 0.8719
Accuracy: 0.8085

However, the UDOP paper specifies that this model, trained on FUNSD, should reach the following metrics:

[screenshot: FUNSD results table from the UDOP paper]

How can I reach those precision values?

Do you advise me to change any parameters of the model? Should I increase the number of epochs a lot?

Thank you so much

Hi,

Thanks for your interest in UDOP! Note that the metrics reported in the paper are for UdopForConditionalGeneration, not for UdopEncoderModel. In other words, they fine-tune the encoder-decoder (generative) model on FUNSD, which is also covered in my demo notebook.
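For reference, the generative setup casts entity labeling as text-to-text generation rather than per-token classification. A minimal, illustrative sketch of such a formulation (the prompt and target format here are hypothetical, not the notebook's exact scheme):

```python
# Illustrative only: casting an entity-labeling example as a
# source/target text pair for an encoder-decoder model such as
# UdopForConditionalGeneration. The prompt prefix and tag format
# are hypothetical, not the notebook's exact scheme.

def to_seq2seq_example(words, labels):
    """Turn parallel (word, label) lists into an input prompt and a target string."""
    source = "extract entities: " + " ".join(words)
    target = " ".join(f"{w} <{l}>" for w, l in zip(words, labels))
    return source, target

src, tgt = to_seq2seq_example(["Date:", "March", "1"],
                              ["question", "answer", "answer"])
print(src)   # extract entities: Date: March 1
print(tgt)   # Date: <question> March <answer> 1 <answer>
```

The key point is that the decoder generates the labels as text, so evaluation compares generated sequences rather than per-token logits from an encoder-only head.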

Hi @NielsRogge and thank you very much for your work!

I have also tried to fine-tune the microsoft/udop-large-512-300k model following the steps in the demo notebook, using the default configuration. However, the resulting model shows similar metrics, around 0.8 accuracy.

Can you give us any advice to improve these training runs? Is it not possible to achieve the results reported in the paper with this notebook?