Exploring Stroke-Level Modifications for Scene Text Editing

Introduction

This is a PyTorch implementation of the paper "Exploring Stroke-Level Modifications for Scene Text Editing" (MOSTEL). MOSTEL edits scene text at the stroke level and can be trained on both labeled synthetic images and unpaired real scene text images.

Installation

Requirements

Python==3.7
PyTorch==1.7.1
CUDA==10.1

git clone https://github.com/qqqyd/MOSTEL.git
cd MOSTEL/

conda create --name MOSTEL python=3.7 -y
conda activate MOSTEL
pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.6.0 -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.7/index.html
pip install -r requirements.txt

Training

Prepare the datasets and put them in datasets/. Our training data consists of synthetic data generated by SRNet-Datagen and real scene text datasets. You can download our datasets here (password: t6bq) or from OneDrive (password: t6bq).

For better performance, the Background Reconstruction Module can be pre-trained on SCUT-EnsText, and the recognizer can be pre-trained on 50k synthetic images generated by SRNet-Datagen. You can also use our models (password: 85b5), which are available on OneDrive as well (password: 85b5).

python train.py --config configs/mostel-train.py

Testing and evaluation

Prepare the models and put them in models/. You can download our models here (password: 85b5) or from OneDrive (password: 85b5).
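
Before running the commands in this section, it can help to confirm that the files they reference are in place. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` and the `EXPECTED` list are hypothetical, and the paths are simply collected from the commands in this README.

```python
from pathlib import Path

# Paths referenced by the prediction and evaluation commands in this README.
# This list is an assumption gathered from those commands, not an official manifest.
EXPECTED = [
    "models/mostel.pth",
    "models/TPS-ResNet-BiLSTM-Attn.pth",
    "datasets/evaluation/Tamper-Syn2k/i_s",
    "datasets/evaluation/Tamper-Syn2k/t_f",
    "datasets/evaluation/Tamper-Scene/i_s",
    "datasets/evaluation/Tamper-Scene/i_t.txt",
]


def missing_paths(root=".", expected=EXPECTED):
    """Return the entries of `expected` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing files or directories:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files and directories are present.")
```

Run it from the MOSTEL/ directory; an empty result means every path used by the commands in this section exists.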

Generate the predicted results using the following commands:

python predict.py --config configs/mostel-train.py --input_dir datasets/evaluation/Tamper-Syn2k/i_s/ --save_dir results-syn2k --checkpoint models/mostel.pth --slm
python predict.py --config configs/mostel-train.py --input_dir datasets/evaluation/Tamper-Scene/i_s/ --save_dir results-scene --checkpoint models/mostel.pth --slm

For synthetic data, the evaluation metrics are MSE, PSNR, SSIM and FID.

python evaluation.py --gt_path datasets/evaluation/Tamper-Syn2k/t_f/ --target_path results-syn2k/
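
For readers unfamiliar with these metrics, the sketch below shows how MSE and PSNR relate to each other. It is an illustration only and is not taken from evaluation.py: the function names are hypothetical, images are represented as flat sequences of pixel values, and the pixel range (`peak=1.0`, i.e. values normalized to [0, 1]) is an assumption that may differ from what the script uses. SSIM and FID are omitted because they need considerably more machinery.

```python
import math


def mse(pred, target):
    """Mean squared error between two equally sized flat pixel sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, peak=1.0):
    """Peak signal-to-noise ratio in dB.

    `peak` is the maximum possible pixel value (assumed 1.0 here;
    use 255 for 8-bit images).
    """
    err = mse(pred, target)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / err)


if __name__ == "__main__":
    predicted = [0.0, 0.5, 1.0, 0.25]
    reference = [0.0, 0.5, 0.5, 0.25]
    print("MSE:", mse(predicted, reference))
    print("PSNR (dB):", psnr(predicted, reference))
```

Lower MSE and higher PSNR both indicate that the edited image is closer to the ground truth in t_f/.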

For real data, the evaluation metric is recognition accuracy.

python eval_real.py --saved_model models/TPS-ResNet-BiLSTM-Attn.pth --gt_file datasets/evaluation/Tamper-Scene/i_t.txt --image_folder results-scene/
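
Recognition accuracy is the fraction of edited images whose recognized text matches the target text. The sketch below illustrates the idea under stated assumptions and is not the logic of eval_real.py: the helper names are hypothetical, the ground-truth file is assumed to hold one "filename text" pair per line, and the comparison is assumed to be case-insensitive by default.

```python
def parse_gt_lines(lines):
    """Parse lines of the assumed form '<image name> <target text>' into a dict."""
    gt = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, _, text = line.partition(" ")
        gt[name] = text
    return gt


def recognition_accuracy(predictions, ground_truth, case_sensitive=False):
    """Fraction of ground-truth entries whose predicted text matches exactly.

    Images with no prediction count as errors.
    """
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    correct = 0
    for name, target in ground_truth.items():
        pred = predictions.get(name, "")
        if not case_sensitive:
            pred, target = pred.lower(), target.lower()
        if pred == target:
            correct += 1
    return correct / len(ground_truth)


if __name__ == "__main__":
    gt = parse_gt_lines(["0.png hello", "1.png World"])
    preds = {"0.png": "Hello", "1.png": "word"}  # made-up recognizer outputs
    print("accuracy:", recognition_accuracy(preds, gt))
```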

Or you can use eval_2k.sh and eval_scene.sh for testing and evaluation.

bash eval_2k.sh configs/mostel-train.py models/mostel.pth
bash eval_scene.sh configs/mostel-train.py models/mostel.pth

In our experiments, we found that SLM improves quantitative performance but leaves some text outline traces, which hurt visual quality. Add --dilate when generating the predicted results for better visualization.

Citing the related works

If you find our method useful for your research, please cite:

@inproceedings{qu2023exploring,
  title={Exploring stroke-level modifications for scene text editing},
  author={Qu, Yadong and Tan, Qingfeng and Xie, Hongtao and Xu, Jianjun and Wang, Yuxin and Zhang, Yongdong},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={37},
  number={2},
  pages={2119--2127},
  year={2023}
}

References

Niwhskal/SRNet

youdao-ai/SRNet-Datagen

clovaai/deep-text-recognition-benchmark

HoAnhKhoaVN/22C15033_MOSTEL