
CTRNet++

This repository is the official implementation of "CTRNet++: Dual-path Learning with Local-global Context Modeling for Scene Text Removal", published in ACM TOMM 2024. paper

The training and inference codes are available.

For any questions, please email me at liuchongyu1996@gmail.com. Thank you for your interest.

Environment

My environment is as follows:

  • Python 3.8.11
  • PyTorch 1.8.0
  • Polygon
  • shapely
  • skimage

Datasets

We use SCUT-EnsText and SCUT-Syn.

All the images are resized to 512 × 512. The structure images for the LCG block are generated by the official code of the RTV method. You can generate the data yourself, and we also provide the test data here: data.

After downloading the dataset, you can directly place the folders as

data/
--SCUT-ENS
----train
------image/*.jpg
------label/*.jpg
------mask/*.jpg
------gt/*.txt

----test
------image/*.jpg
------label/*.jpg
------mask/*.jpg
------gt/*.txt
...
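To catch path mistakes before training, you can sanity-check that each split follows the layout above. This is a hypothetical helper, not part of the codebase:

```python
import os


def check_split(data_root, split):
    """Verify that one dataset split matches the expected folder layout.

    Returns the number of files in the image folder; raises if any of
    the four expected subfolders is missing.
    """
    split_dir = os.path.join(data_root, "SCUT-ENS", split)
    for sub in ("image", "label", "mask", "gt"):
        sub_dir = os.path.join(split_dir, sub)
        if not os.path.isdir(sub_dir):
            raise FileNotFoundError(f"Missing expected folder: {sub_dir}")
    return len(os.listdir(os.path.join(split_dir, "image")))
```

For example, `check_split("./data", "train")` should return the number of training images once the dataset is in place.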

The mask can be generated from the gt files with OpenCV (cv2.drawContours); an example is shown here.

Training

Create a new directory (./pretrained/) and place in it the pretrained weights for the FFC-based inpainting model (LaMa), VGG-16, and our pretrained model for the structure generator. All of them are available here. You can also retrain the structure generator yourself.

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port=8942 --use_env \
    main.py \
    --train_dataset scutens_train \
    --val_dataset scutens_test \
    --data_root ./data/ \
    --output_dir ./checkpoint/ \
    --batch_size 4 \
    --lr 0.0005 \
    --num_workers 8 \
    --code_dir . \
    --epochs 300 \
    --save_interval 10 \
    --warmup_epochs 10 \
    --dataset_file erase \
    --rotate_max_angle 10 \
    --rotate_prob 0.3 \
    --crop_min_ratio 0.7 \
    --crop_max_ratio 1.0 \
    --crop_prob 1.0 \
    --pixel_embed_dim 512 \
    --train

Testing

To generate the text-removal results, the command is as follows:

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 --master_port=8941 --use_env \
    main.py \
    --train_dataset scutens_train \
    --val_dataset scutens_test \
    --data_root ./data/ \
    --output_dir ./checkpoint/ \
    --batch_size 1 \
    --num_workers 0 \
    --code_dir . \
    --dataset_file erase \
    --eval \
    --resume your/checkpoint/file

Acknowledgements

This repository benefits a lot from DETR, LaMa, SPL, Restormer, and CTSDG. Thanks a lot for their excellent work.

Citation

If you find our method or dataset useful for your research, please cite:

@article{CTRNetpp,
        author = {Liu, Chongyu and Peng, Dezhi and Liu, Yuliang and Jin, Lianwen},
        title = {CTRNet++: Dual-path Learning with Local-global Context Modeling for Scene Text Removal},
        year = {2024},
        publisher = {Association for Computing Machinery},
        address = {New York, NY, USA},
        issn = {1551-6857},
        url = {https://doi.org/10.1145/3697837},
        doi = {10.1145/3697837},
        note = {Just Accepted},
        journal = {ACM Trans. Multimedia Comput. Commun. Appl.},
        month = oct,
        keywords = {Scene Text Removal, Context Guidance, Dual-path Learning}
}

Feedback

Suggestions and opinions on our work (both positive and negative) are welcome. Please contact the authors by email: Chongyu Liu (liuchongyu1996@gmail.com). For commercial use, please contact Prof. Lianwen Jin (eelwjin@scut.edu.cn).