byeonghu-na/MATRN

Predict More Characters

Hello there!

  • Great work. I'd like to ask how to train the alignment model with more characters. The current implementation can only recognize 36 characters (0-9, a-z). I want to recognize 90 characters (0-9, a-z, A-Z, and some symbols).
  • I tried modifying some code, and I can now train on 90 characters. However, I'm facing a problem: I cannot load the pre-trained language and vision models, as they were trained on 36 characters. Is there any way to modify the code so that I can load the pre-trained weights?

For how many epochs were you able to train? After I modified the epoch count, training always ends after epoch 9.
[screenshot]

@yiren556
I didn't modify any hyperparameters and the training ended at the 9th epoch.

This is the result of my debugging; I don't understand why it stops.
[debugging screenshot]

Hi, sorry for the late reply, and thank you for your interest.

I think you may change charset_path in template.yaml from 'data/charset_62.txt' to your own charset file, and set case_sensitive to True.
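
For concreteness, a sketch of the relevant template.yaml entries; the filename charset_90.txt is hypothetical, and you should match the exact nesting used in the repo's template.yaml:

    dataset:
      charset_path: data/charset_90.txt  # your own 90-character charset file (hypothetical name)
      case_sensitive: True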

As you know, the pre-trained language and vision models were trained on 36 characters. Therefore, you either need to retrain them on 90 characters, or load them without the final classification layer, whose size depends on the character set.

I think this link is useful for working around the size-mismatch problem: (Lightning-AI/pytorch-lightning#4690 (comment))
Apply this method when loading the pre-trained models, which happens here:

MATRN/modules/model.py

Lines 17 to 19 in 57b6b8e

    def load(self, source, device=None, strict=True):
        state = torch.load(source, map_location=device)
        self.load_state_dict(state['model'], strict=strict)

and at the two call sites that load the checkpoints:

    self.load(config.model_language_checkpoint)

    self.load(config.model_vision_checkpoint)
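
Following the workaround in that Lightning thread, here is a minimal sketch of a shape-tolerant loader, assuming the checkpoint stores weights under the 'model' key as in the load method above; the method name load_flexible is hypothetical, not part of the repo:

    import torch

    def load_flexible(self, source, device=None):
        # Load a checkpoint, skipping any parameter whose shape no longer
        # matches the current model, e.g. the final classification layer
        # after growing the character set from 36 to 90.
        state = torch.load(source, map_location=device)
        model_state = self.state_dict()
        # Keep only checkpoint weights that exist in the current model
        # with identical shapes.
        filtered = {k: v for k, v in state['model'].items()
                    if k in model_state and v.shape == model_state[k].shape}
        model_state.update(filtered)
        self.load_state_dict(model_state)

The skipped weights (here, the classifier over the enlarged character set) keep their fresh initialization and are learned during training on the 90-character data.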