baudm/parseq

Finetuning

Ehteshamciitwah opened this issue · 0 comments

Hello, thank you for sharing your work.

I evaluated the pre-trained PARSeq [32, 128] model on my custom dataset (sample images are attached below). Label lengths range from 3 to 20 characters.

However, word accuracy on this dataset with the pre-trained weights is only 56%. After fine-tuning your model with the default parameters, it only improves to 72%.
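For reference, this is roughly how I measured the 56% figure with the released weights (a sketch: the torch.hub call follows the README, while the preprocessing choices, the read_text helper, and the samples.tsv label file are my own):

```python
import torch
from PIL import Image
from torchvision import transforms as T

# Load the released PARSeq weights via torch.hub (as shown in the repo README).
parseq = torch.hub.load('baudm/parseq', 'parseq', pretrained=True).eval()

# Preprocessing assumed to match training: resize to the model's input size,
# then normalize to [-1, 1].
preprocess = T.Compose([
    T.Resize(parseq.hparams.img_size, T.InterpolationMode.BICUBIC),
    T.ToTensor(),
    T.Normalize(0.5, 0.5),
])

def read_text(image_path: str) -> str:
    """Greedy-decode a single image (my own helper, not from the repo)."""
    img = Image.open(image_path).convert('RGB')
    batch = preprocess(img).unsqueeze(0)          # (1, C, H, W)
    with torch.no_grad():
        logits = parseq(batch)                    # (1, L, num_classes)
    probs = logits.softmax(-1)
    labels, _confidences = parseq.tokenizer.decode(probs)
    return labels[0]

# samples.tsv is my own file: "<image_path>\t<ground_truth>" per line.
correct = total = 0
with open('samples.tsv') as f:
    for line in f:
        path, gt = line.rstrip('\n').split('\t')
        correct += int(read_text(path).lower() == gt.lower())
        total += 1
print(f'word accuracy: {100 * correct / total:.1f}%')
```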

What is the best way to fine-tune your model for a custom dataset? In particular, which of the following should I adjust? (My current understanding of how these map to the training configuration is sketched after the list.)

  1. Input image dimensions / patch size
  2. Encoder parameters (layers, heads, MLP ratio)
  3. Decoder parameters (layers, heads, MLP ratio)
  4. Decoding scheme
  5. Permutation count K
  6. Any additional recommendations?
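To make the question concrete, this is how I am currently launching fine-tuning, with the Hydra-style overrides that I believe correspond to items 1-5 (pretrained=parseq follows the README's fine-tuning example; the model.* key names are my reading of configs/model/parseq.yaml and may well be wrong, which is partly what I am asking):

```python
# Sketch of the ./train.py invocation I am using for fine-tuning.
# The values below are the published defaults; the key names are my reading
# of the repo's Hydra configs, so please correct any that are wrong.
overrides = {
    'model.img_size': '[32,128]',     # (1) input image dimensions
    'model.patch_size': '[4,8]',      # (1) ViT patch size
    'model.enc_depth': '12',          # (2) encoder layers
    'model.enc_num_heads': '6',       # (2) encoder heads
    'model.enc_mlp_ratio': '4',       # (2) encoder MLP ratio
    'model.dec_depth': '1',           # (3) decoder layers
    'model.dec_num_heads': '12',      # (3) decoder heads
    'model.dec_mlp_ratio': '4',       # (3) decoder MLP ratio
    'model.decode_ar': 'true',        # (4) autoregressive decoding
    'model.refine_iters': '1',        # (4) iterative-refinement steps
    'model.perm_num': '6',            # (5) number of permutations K
    'data.root_dir': '/path/to/my/lmdb/data',  # my dataset location (placeholder)
}

cmd = ['./train.py', 'pretrained=parseq'] + [f'{k}={v}' for k, v in overrides.items()]
print(' '.join(cmd))  # copy-paste the printed command to launch fine-tuning
```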

Additionally, how can we integrate a dictionary (lexicon) with the PARSeq models? I look forward to your response.
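The only approach I have come up with so far is post-processing: snapping each raw prediction to the most similar dictionary word, roughly as sketched below (the dictionary file words.txt, the 0.8 similarity threshold, and the helper names are my own, not from the repo). Is there a better way to constrain the decoder itself?

```python
from difflib import SequenceMatcher  # stdlib only; no extra dependencies

def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def snap_to_lexicon(prediction: str, lexicon: list[str], min_ratio: float = 0.8) -> str:
    """Replace the raw prediction with the most similar dictionary word,
    but only if it is similar enough; otherwise keep the raw prediction."""
    if not lexicon:
        return prediction
    best = max(lexicon, key=lambda word: similarity(prediction, word))
    return best if similarity(prediction, best) >= min_ratio else prediction

# words.txt is my own dictionary file, one entry per line.
with open('words.txt') as f:
    lexicon = [line.strip() for line in f if line.strip()]

print(snap_to_lexicon('hel1o', lexicon))  # e.g. -> 'hello' if it is in words.txt
```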

Sample images (attachments): imgD-log-0000_1_rotated_0, imgD-log-0000_2_rotated_180, imgD-log-0000_3_rotated_180, imgD-log-0003_1_rotated_0, imgD-log-11-0040_327_rotated_0