Making word longer?
dietercastel opened this issue · 3 comments
Now I'm playing around with STEFANN a bit. Your readme was really helpful here! But I guess there are more knobs and dials to it than I expected. I was still wondering: is there, e.g., a way to make the word you're replacing longer? Also, your examples seem way more crisp than my quick shot at it. Got any tips?
About string length
In our demo, the length of the input cannot be changed out of the box. Sorry! However, this limitation is essentially GUI-specific, and it can be lifted by rewriting parts of stefann.py. Unfortunately, it would be quite a bit of work.
About visual quality
Your observation is correct. The released model for FANnet was trained only on uppercase letters. Most of the results shown in the paper involve uppercase letters as well. While it is possible to train a model on lowercase letters easily, we did not include it in the final release. The reason is that every uppercase letter has equal height, whereas individual lowercase letters can span different zones (upper / middle / lower), which further complicates the target letter placement.
The aim of the paper was to propose a pipeline to address a potentially useful problem (scene text editing) and to show the feasibility and applicability of such a pipeline. At present, STEFANN has many limitations. The most common issue is unstable generation due to inadequate domain adaptation. STEFANN follows a supervised learning approach with source and target image pairs. It is extremely difficult to prepare such a dataset from scene text images, so STEFANN was trained using synthetic data (Google Fonts). During inference, the quality of generation is heavily affected by noisy artifacts in the segmentation stage. It takes some manual effort to find the best possible thresholds during segmentation. Even then, not every image produces a visually crisp result.
Here are a few random examples along with their respective tweaks:
Original & Edited Images | Inverse Binarization | Binarization Threshold | Minimum Allowed Contour Area |
---|---|---|---|
(image) | No | 180 | 10 |
(image) | No | 165 | 20 |
(image) | Yes | 110 | 25 |
(image) | No | 110 | 10 |
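To make the three knobs in the table more concrete, here is a minimal pure-Python sketch of what they typically control. It is not taken from stefann.py: the function names `binarize` and `filter_small_regions` are hypothetical, and the area check counts pixels of 4-connected regions, which only approximates a contour-area filter.

```python
from collections import deque


def binarize(gray, threshold, inverse=False):
    """Return a 0/1 mask: 1 where pixel > threshold, flipped if inverse is True."""
    return [[int((p > threshold) != inverse) for p in row] for row in gray]


def filter_small_regions(mask, min_area):
    """Zero out 4-connected foreground regions with fewer than min_area pixels.

    Assumption: pixel count stands in for "minimum allowed contour area".
    """
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first flood fill to collect one connected region.
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) < min_area:
                for ry, rx in region:
                    out[ry][rx] = 0
    return out


# Toy grayscale image: a bright 2x2 "letter" and one bright noise pixel.
gray = [
    [200, 200, 10, 10, 10],
    [200, 200, 10, 10, 10],
    [10, 10, 10, 10, 190],
]
mask = binarize(gray, threshold=180)            # bright text on dark background
clean = filter_small_regions(mask, min_area=2)  # drops the isolated noise pixel
```

Inverse binarization (`inverse=True`) is the setting to try when the text is darker than its background, and raising the minimum area removes more small specks at the risk of deleting thin strokes or dots.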
Check out this video for more examples using the sample images.
About string length
Okay, I'm glad it's not required to retrain the network in full for this. Some GUI coding in Python is certainly doable! If I get around to doing this, I'll definitely contribute upstream. :-)
Could you be more specific about what code I should look at? Because I'm not yet familiar with the source code, some pointers might help. :-)
About visual quality
I see. I picked a large example (in terms of signal/noise), so the details are of course more visually striking. But it's good to know what the limitations of the network are.
As a workaround, maybe a capitalisation NN exists. That would probably help solve this.
For visual quality I also noticed that deskewing helps a lot.
I did the deskewing with this handy deskew tool available on pip/GitHub. It might be helpful to integrate this feature into STEFANN.
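For anyone curious what a deskew step might look like, here is a rough pure-Python sketch that estimates the skew angle of a binary text mask from its second-order image moments. This is a swapped-in technique for illustration, not necessarily what the tool mentioned above does, and the function name `estimate_skew_degrees` is hypothetical. It assumes a single, elongated line of text.

```python
import math


def estimate_skew_degrees(mask):
    """Estimate the orientation of foreground pixels from central moments.

    The angle is measured in image coordinates (y grows downward), so a
    positive value means the text slopes downward to the right.
    """
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if len(pts) < 2:
        return 0.0  # not enough pixels to define an orientation
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    mu20 = sum((x - mx) ** 2 for x, _ in pts) / n
    mu02 = sum((y - my) ** 2 for _, y in pts) / n
    mu11 = sum((x - mx) * (y - my) for x, y in pts) / n
    # Principal-axis angle of the pixel distribution.
    return math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02))


# A perfectly horizontal stroke and a 45-degree diagonal stroke.
horizontal = [[1, 1, 1, 1, 1]]
diagonal = [[1 if x == y else 0 for x in range(4)] for y in range(4)]
angle_h = estimate_skew_degrees(horizontal)
angle_d = estimate_skew_degrees(diagonal)
```

Rotating the image by the negative of the estimated angle (with whatever image library is already in use) should bring the text line back to horizontal before segmentation.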