bgshih/crnn

Could you give me some suggestions for the training dataset?

yangxiuwu opened this issue · 1 comments

Hi, I have trained a Chinese OCR model with CRNN, using 3 million (300W) synthetic text images as the training dataset, but the model performs poorly on real-scene images. Could you give me some suggestions for the training dataset?
Does the dataset require a fixed aspect ratio?
Does the dataset need data augmentation, e.g. transforms, blur, different font colors, diverse backgrounds, and so on?
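
For concreteness, the preprocessing and augmentation steps asked about above might look like the following sketch. It is pure Python on a grayscale image stored as a list of rows of 0-255 integers. The function names, the 32x100 target size, and the parameter ranges are illustrative assumptions, not something taken from bgshih/crnn.

```python
import random

def resize_to_height(img, target_h):
    """Nearest-neighbour resize to a fixed height, keeping the aspect ratio."""
    h, w = len(img), len(img[0])
    target_w = max(1, round(w * target_h / h))
    return [[img[y * h // target_h][x * w // target_w] for x in range(target_w)]
            for y in range(target_h)]

def pad_to_width(img, target_w, fill=255):
    """Right-pad each row with `fill`, or crop it, so every image shares one width.
    Cropping is a simplification; a real pipeline might downscale wide images instead."""
    return [(row + [fill] * target_w)[:target_w] for row in img]

def box_blur_horizontal(img, radius=1):
    """Average each pixel with its horizontal neighbours to imitate mild blur."""
    out = []
    for row in img:
        w = len(row)
        new_row = []
        for x in range(w):
            lo, hi = max(0, x - radius), min(w, x + radius + 1)
            new_row.append(sum(row[lo:hi]) // (hi - lo))
        out.append(new_row)
    return out

def jitter_contrast(img, rng, low=0.6, high=1.4):
    """Scale pixel values around mid-grey by a random factor, clamped to 0-255."""
    factor = rng.uniform(low, high)
    return [[min(255, max(0, int(128 + (p - 128) * factor))) for p in row]
            for row in img]

def augment(img, rng, target_h=32, target_w=100):
    """Fixed-height resize, optional blur, contrast jitter, then fixed-width pad/crop."""
    img = resize_to_height(img, target_h)
    if rng.random() < 0.5:
        img = box_blur_horizontal(img, radius=rng.choice([1, 2]))
    img = jitter_contrast(img, rng)
    return pad_to_width(img, target_w)

# Toy 16x80 image: left half black, right half white.
sample = [[0] * 40 + [255] * 40 for _ in range(16)]
augmented = augment(sample, random.Random(0))
```

Here the resize keeps the aspect ratio and only the final padding step fixes the width, which is one way to handle the first question; font color and background variation would have to be applied when the synthetic images are rendered, so they are not shown.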

Hi, is your dataset of 3 million (300W) synthetic text images public? I'm working on receipt OCR now, but I don't have enough data.