This repository is the implementation of "CTRNet++: Dual-path Learning with Local-global Context Modeling for Scene Text Removal", published in ACM TOMM 2024 (paper: https://doi.org/10.1145/3697837).
The training and inference code is available.
For any questions, please email me at liuchongyu1996@gmail.com. Thank you for your interest.
My environment is as follows (a quick import check is sketched after the list):
- Python 3.8.11
- PyTorch 1.8.0
- Polygon
- shapely
- skimage
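If you want to verify the setup, here is a minimal sketch that just imports the packages above and prints their versions (it assumes the `Polygon` package is the Polygon3 distribution on PyPI):

```python
# Quick sanity check that the dependencies above are importable.
import torch
import Polygon   # assumed to be provided by the Polygon3 package
import shapely
import skimage

print("PyTorch:", torch.__version__)
print("shapely:", shapely.__version__)
print("scikit-image:", skimage.__version__)
print("Polygon:", getattr(Polygon, "__version__", "installed"))
```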
We use SCUT-EnsText and SCUT-Syn.
All images are resized to 512 × 512. The structure images for the LCG block are generated with the official code of the RTV method. You can generate the data yourself, and we will also provide the test data here (data).
After downloading the dataset, you can directly place the folders as follows (a small layout check is sketched after the directory tree):
data/
--SCUT-ENS
----train
------image/*.jpg
------label/*.jpg
------mask/*.jpg
------gt/*.txt
----test
------image/*.jpg
------label/*.jpg
------mask/*.jpg
------gt/*.txt
...
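Before training, you can sanity-check the layout; the sketch below only assumes the folder names shown above:

```python
# Verify that every image has a matching label, mask, and gt file per split.
import glob
import os

def check_split(split_dir):
    images = sorted(glob.glob(os.path.join(split_dir, "image", "*.jpg")))
    missing = []
    for image_path in images:
        stem = os.path.splitext(os.path.basename(image_path))[0]
        for sub, ext in (("label", ".jpg"), ("mask", ".jpg"), ("gt", ".txt")):
            if not os.path.exists(os.path.join(split_dir, sub, stem + ext)):
                missing.append((stem, sub))
    print(f"{split_dir}: {len(images)} images, {len(missing)} missing companion files")

for split in ("train", "test"):
    check_split(os.path.join("data", "SCUT-ENS", split))
```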
The mask can be generated from the gt files with OpenCV (cv2.drawContours); an example is shown here.
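A minimal sketch of this mask generation is given below; the gt parsing (comma-separated polygon coordinates, one text instance per line, optionally followed by a transcription) is an assumption, so adapt it to the actual annotation format:

```python
# Rasterize the polygon annotations in gt/*.txt into binary text masks with
# cv2.drawContours (thickness=-1 fills the polygons).
import glob
import cv2
import numpy as np

def gt_to_mask(gt_path, image_shape=(512, 512)):
    mask = np.zeros(image_shape, dtype=np.uint8)
    with open(gt_path, "r") as f:
        for line in f:
            coords = []
            for token in line.strip().split(","):
                try:
                    coords.append(int(float(token)))
                except ValueError:
                    break  # stop at a trailing transcription field, if any
            if len(coords) < 6 or len(coords) % 2:
                continue  # need at least three (x, y) points
            polygon = np.array(coords, dtype=np.int32).reshape(-1, 2)
            cv2.drawContours(mask, [polygon], -1, color=255, thickness=-1)
    return mask

for gt_file in glob.glob("data/SCUT-ENS/train/gt/*.txt"):
    mask = gt_to_mask(gt_file)
    cv2.imwrite(gt_file.replace("/gt/", "/mask/").replace(".txt", ".jpg"), mask)
```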
Create a new directory (./pretrained/) and place the pretrained weights for the FFC-based inpainting model (LaMa), VGG-16, and our pretrained structure generator in it. All of them are available here. You can also retrain the structure generator yourself.
For training, the command is as follows:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port=8942 --use_env \
main.py \
--train_dataset scutens_train \
--val_dataset scutens_test \
--data_root ./data/ \
--output_dir ./checkpoint/ \
--batch_size 4 \
--lr 0.0005 \
--num_workers 8 \
--code_dir . \
--epochs 300 \
--save_interval 10 \
--warmup_epochs 10 \
--dataset_file erase \
--rotate_max_angle 10 \
--rotate_prob 0.3 \
--crop_min_ratio 0.7 \
--crop_max_ratio 1.0 \
--crop_prob 1.0 \
--pixel_embed_dim 512 \
--train
For generating the text removal results, the command is as follows (a simple scoring sketch is given after the command):
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 --master_port=8941 --use_env \
main.py \
--train_dataset scutens_train \
--val_dataset scutens_test \
--data_root ./data/ \
--output_dir ./checkpoint/ \
--batch_size 1 \
--num_workers 0 \
--code_dir . \
--dataset_file erase \
--eval \
--resume your/checkpoint/file
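After inference, you may want to score the generated images against the ground-truth labels, e.g. with the PSNR/SSIM metrics from skimage (already in the environment list). The sketch below assumes the results are written to a `results/` folder; point `result_dir` at wherever your checkpoint actually saves the erased images:

```python
# Average PSNR/SSIM between generated results and ground-truth labels.
import glob
import os
import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

result_dir = "results"                      # assumed output folder
label_dir = "data/SCUT-ENS/test/label"      # ground-truth text-removed images

psnr_list, ssim_list = [], []
for label_path in glob.glob(os.path.join(label_dir, "*.jpg")):
    result_path = os.path.join(result_dir, os.path.basename(label_path))
    if not os.path.exists(result_path):
        continue
    label = cv2.imread(label_path)
    result = cv2.resize(cv2.imread(result_path), (label.shape[1], label.shape[0]))
    psnr_list.append(peak_signal_noise_ratio(label, result, data_range=255))
    # channel_axis=-1 for recent scikit-image; older releases use multichannel=True
    ssim_list.append(structural_similarity(label, result, channel_axis=-1, data_range=255))

print(f"PSNR: {sum(psnr_list) / len(psnr_list):.2f}")
print(f"SSIM: {sum(ssim_list) / len(ssim_list):.4f}")
```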
This repository benefits a lot from DETR, LaMa, SPL, Restormer, and CTSDG. Thanks a lot for their excellent work.
If you find our method or dataset useful for your research, please cite:
@article{CTRNetpp,
author = {Liu, Chongyu and Peng, Dezhi and Liu, Yuliang and Jin, Lianwen},
title = {CTRNet++: Dual-path Learning with Local-global Context Modeling for Scene Text Removal},
year = {2024},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
issn = {1551-6857},
url = {https://doi.org/10.1145/3697837},
doi = {10.1145/3697837},
note = {Just Accepted},
journal = {ACM Trans. Multimedia Comput. Commun. Appl.},
month = oct,
keywords = {Scene Text Removal, Context Guidance, Dual-path Learning}
}
Suggestions and opinions on our work (both positive and negative) are welcome. Please contact the authors by sending an email to Chongyu Liu (liuchongyu1996@gmail.com). For commercial usage, please contact Prof. Lianwen Jin (eelwjin@scut.edu.cn).