Text-Line Detection
Opened this issue · 6 comments
@mayank-git-hub How can I train a text-line detection model?
Can you add documentation for training a text-line detection model?
I believe training a text-line detection model using CRAFT would be a bad idea. The affinity heat-map would need to extend from the end character of one word to the start character of the next, which may be difficult for the model. (Consider the SynthText dataset, in which two words on the same line are sometimes quite far apart; if your dataset has little spacing between words, then CRAFT might be a good choice.) Remember: everything I just said is a hypothesis.
If you really want to try, you would only need very small changes to the code (the dataloader alone) to make the detector work as a text-line detector instead of a text-word detector. You would need to change the affinity generation so that links are drawn not only between two characters of a single word, but between all consecutive characters in the same line. No changes are required to the character heatmap generation.
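A minimal sketch of that dataloader change, assuming per-character line labels are available (the function name and label layout here are hypothetical, not from the repo; the actual CRAFT dataloader then draws an affinity Gaussian for each selected pair):

```python
def affinity_pairs(char_boxes, word_ids, line_ids):
    """Return index pairs of adjacent characters (in reading order)
    between which an affinity Gaussian should be drawn.

    char_boxes: one entry per character, in reading order
    word_ids:   word index for each character
    line_ids:   line index for each character (hypothetical extra label)
    """
    pairs = []
    for i in range(len(char_boxes) - 1):
        # Word-level CRAFT would link only characters of the SAME word:
        #   same = word_ids[i] == word_ids[i + 1]
        # Text-line variant: link consecutive characters of the same LINE,
        # so the affinity also bridges the gap between adjacent words.
        same = line_ids[i] == line_ids[i + 1]
        if same:
            pairs.append((i, i + 1))
    return pairs
```

With two two-character words on one line, the word-level rule yields two separate links, while the line-level rule also links the gap between the words.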
Another way (using a word-detector model), which I have tried and which works: first cluster the paragraphs in the image with DBSCAN, using the angle and the coordinates of each bbox as features. Then, using the average angle of each paragraph, apply a perspective transform to make the paragraph horizontal, and finally cluster (k-means or DBSCAN) all the bbox y-coordinates. You will get all the words which belong to the same line.
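The final y-coordinate clustering step could be sketched as below. This is an assumption of how the grouping might look, not code from the repo: it uses a simple single-linkage pass over sorted vertical centres, which for 1-D data behaves like DBSCAN with `min_samples=1` (in practice you could call `sklearn.cluster.DBSCAN` directly):

```python
import numpy as np

def group_words_into_lines(word_boxes, eps=10.0):
    """Group axis-aligned word boxes (x1, y1, x2, y2) into text lines.

    Assumes the paragraph has already been deskewed (perspective
    transform step), so lines are roughly horizontal.
    """
    boxes = np.asarray(word_boxes, dtype=float)
    y = (boxes[:, 1] + boxes[:, 3]) / 2.0     # vertical centre of each box
    order = np.argsort(y)
    lines, current = [], [order[0]]
    for prev, idx in zip(order, order[1:]):
        if y[idx] - y[prev] <= eps:           # close enough: same line
            current.append(idx)
        else:                                 # vertical gap: new line
            lines.append(current)
            current = [idx]
    lines.append(current)
    # Within each line, sort words left-to-right by x1.
    return [sorted((word_boxes[i] for i in line), key=lambda b: b[0])
            for line in lines]
```

The `eps` threshold plays the role of DBSCAN's neighbourhood radius and would need tuning to the text size in your images.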
Commenting
I am in the process of making the weak-supervision training part of the code. As soon as that part of the code starts working I am going to spend time on making the code readable and document it.
Sorry for the delay!
@mayank-git-hub hmmm....
When testing https://demo.ocr.clova.ai, which is CRAFT-based, I used this image and it detected the lines.
My initial idea was to use something like this example, will it work out-of-the-box?
Note:
I hope that you include the options of:
- Predicting multiple images in a folder (`--input_folder`).
- Extracting the detected boxes.
Also, can you share more information on the training data? The files that you posted in #2 have this structure:
.
├── image.png
├── pred_affinity.png
├── pred_affinity_thresh.png
├── pred_characters.png
├── pred_characters_thresh.png
├── target_affinity.png
└── target_characters.png
My goal is to create a text-line detection model. Can't I use a structure of image & single label?
Or will it be a nightmare to create the ground truth?
This structure is for generating the examples in between training, not the way the dataset is stored.
Creating a text-line detection model using CRAFT would require a bbox corresponding to each sentence and the text written inside it. That should allow you to train a model using weak supervision which detects text lines.
I will also document how to train a model on your own dataset using the above pipeline once I am done with the weak-supervision part of CRAFT.
In the first image, the model is detecting not text lines but text words; the lines below might have been treated as a single word in the dataset. If it were detecting lines, then TSUKI W^Z^ should have been in the same line. You would need four coordinates corresponding to each sentence. I cannot tell you whether it will work out of the box, but I don't see any reason it should not, considering the space between two words in your sample is very small.
Would surely add the predict_multiple option once the weak-supervision part of the code has been done, and would also return word-bbox output instead of character bbox. I am trying to build this as fast as possible.
You can use the predict multiple option now:

`python main.py train_synth --mode=synthesize --model=/path/to/model --folder=/path/to/images`
I have updated the ReadMe.md file as well.
Have commented the code as well.
Will also add the predict-line functionality (instead of word-bbox) once the weak-supervision part has been stabilized.