
GaRNet with CRAFT

GaRNet takes as input an image together with boxes marking the detected text locations. The original GaRNet repo, however, contains no logic for detecting those boxes, so this fork adds a detection step based on Naver's CRAFT model.
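
A rough sketch of this two-stage flow is shown below. It is an illustration only: `run_craft` and `run_garnet` are hypothetical stand-ins for the real code in CODE/inference.py and the two models.

  # Illustration only: the two-stage flow added in this fork.
  # run_craft / run_garnet are hypothetical stand-ins; the real logic
  # lives in CODE/inference.py and the CRAFT / GaRNet models.
  import cv2
  import numpy as np

  def run_craft(image: np.ndarray) -> list:
      """Stand-in for the CRAFT detector: returns text boxes as 4-point polygons."""
      h, w = image.shape[:2]
      return [np.array([[10, 10], [w // 2, 10], [w // 2, 40], [10, 40]], np.int32)]

  def run_garnet(image: np.ndarray, boxes: list) -> np.ndarray:
      """Stand-in for GaRNet: here the boxes are simply blanked out; the real
      model inpaints the text regions and reconstructs the background."""
      out = image.copy()
      for box in boxes:
          cv2.fillPoly(out, [box], (255, 255, 255))
      return out

  image = cv2.imread("DATA/EXAMPLE/IMG/my_img.jpg")  # any test image
  boxes = run_craft(image)                           # 1) detect text boxes
  result = run_garnet(image, boxes)                  # 2) erase text inside the boxes
  cv2.imwrite("CODE/result/my_img.jpg", result)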

Getting started

Downloading the CRAFT model

https://github.com/clovaai/CRAFT-pytorch

  1. Download the model through the download link provided in the repository above.
  2. Place the model file under CODE/craft/weights/, e.g. CODE/craft/weights/craft_mlt_25k.pth

Downloading the GaRNet model

https://github.com/naver/garnet

  1. Download the model by following the original repository.
  2. Place the model file under WEIGHTS/GaRNet/, e.g. WEIGHTS/GaRNet/saved_model.pth (a quick check for both weight files is sketched below).
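
As a quick sanity check, you can verify from the repository root that both weight files are in place and load as PyTorch checkpoints (paths follow the examples above; the exact checkpoint contents may differ):

  # Sanity check for the downloaded weights; run from the repository root.
  # File names follow the examples above - adjust them if yours differ.
  import os
  import torch

  weights = [
      "CODE/craft/weights/craft_mlt_25k.pth",  # CRAFT detector
      "WEIGHTS/GaRNet/saved_model.pth",        # GaRNet
  ]

  for path in weights:
      assert os.path.isfile(path), f"missing weight file: {path}"
      ckpt = torch.load(path, map_location="cpu")  # load on CPU just to validate
      print(f"{path}: OK ({type(ckpt).__name__})")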

RUN

cd CODE/
python inference.py

Checking the results

In CODE/result/ you will find the input image saved with the detected boxes drawn on it, as well as the image with the text inpainting completed.

Example outputs: grid_img_302.jpg (boxes drawn), img_302.jpg (text removed)
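
To inspect one result pair quickly, the two images can be combined side by side (the file names below are the example ones above; substitute your own image name):

  # Put one result pair from CODE/result/ side by side for comparison.
  import cv2

  boxed = cv2.imread("CODE/result/grid_img_302.jpg")   # input with detected boxes
  erased = cv2.imread("CODE/result/img_302.jpg")       # text-inpainted output

  h = min(boxed.shape[0], erased.shape[0])
  def fit(img):  # rescale to a common height so hconcat works
      return cv2.resize(img, (int(img.shape[1] * h / img.shape[0]), h))

  cv2.imwrite("CODE/result/compare_img_302.jpg", cv2.hconcat([fit(boxed), fit(erased)]))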

Additional images

You can test other images by adding them, in jpg format, to DATA/EXAMPLE/IMG/.

e.g. DATA/EXAMPLE/IMG/my_img.jpg


The Surprisingly Straightforward Scene Text Removal Method With Gated Attention and Region of Interest Generation: A Comprehensive Prominent Model Analysis

Paper and Supplementary
Hyeonsu Lee, Chankyu Choi
Naver Corp.
In ECCV 2022.

This repository is the official PyTorch implementation of GaRNet.

Sample results

The sample image was taken from SCUT-EnsText.

Getting started

Requirements

  • Python 3.6.9
  • pytorch 1.2.0
  • numpy 1.16.5
  • opencv 4.5.1.48
  • torchvision==0.4.0
  • ptflops==0.6.4 (for calculating inference time (GPU time) and GMACs)
  • scikit-image==0.17.2 (for CRAFT)

The requirements can be installed by running

  pip install -r requirements.txt

Dataset

To compare the performance of previously proposed methods on the same evaluation dataset, we combined the Oxford synthetic data and the SCUT-EnsText data. We provide subset information for the combined dataset.

Pre-trained models

We provide the weights of EnsNet, MTRNet, MTRNet++, EraseNet, and our model, trained on our combined dataset.

  • You can find the pre-trained models below.

    Method      Pre-trained model
    EnsNet      saved_model.pth
    MTRNet      saved_model.pth
    MTRNet++    saved_model.pth
    EraseNet    saved_model.pth
    GaRNet      link

Note that all models are pre-trained with Synthetic datasets and SCUT-EnsText, and the SCUT-EnsText dataset can only be used for non-commercial research purposes.

Instructions for comparison with related works (Tables 2 and 3 in the paper)

ImageEval

  • All

          bash eval_ImageEval_All.sh [REAL or SYN]
    
  • Individual

          cd ./RELATED
          python ./eval.py --model_type=[model type (ex. EnsNet, GaRNet...)] default: EnsNet \
                --model_path=[pretrained model path] default: ../WEIGHTS/EnsNet/saved_model.pth \
                --test_path=[the path of Test json file] default: ../DATA/JSON/REAL/test.json \
                --input_size=[size of input image] default: 512 \
                --batch=[batch size] default: 10 \
                --gpu (use gpu for evaluation) default: False
    

For MTRNet++ and EraseNet, you need to obtain their official model (or network) code.

DetectionEval (GPU is required to run CRAFT)

  bash eval_DetectionEval.sh [REAL or SYN] [Model type (ex. EnsNet, GaRNet ...)]

For DetectionEval, you need to obtain CRAFT and the DetEVAL script.

Instructions for proposed method

For Eval

  cd ./CODE
  python ./eval.py --model_path=[pretrained model path] default: ../WEIGHTS/GaRNet/saved_model.pth \
        --test_path=[the path of Test json file] default: ../DATA/JSON/REAL/test.json \
        --batch=[batch size] default: 10 \
        --gpu (use gpu for evaluation) default: False

For Inference

  cd ./CODE
  python ./inference.py --result_path=[the path to save output images] default: ./result \
        --image_path=[the path of input image] default: ../DATA/IMG \
        --box_path=[the path of text files which have box information] default: ../DATA/TXT \
        --input_size=[inference size] default: 512 \
        --model_path=[the path of trained model] default: ../WEIGHTS/GaRNet/saved_model.pth \
        --attention_vis (visualize attention) default: False \
        --gpu (use gpu for inference) default: False
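
The text files under --box_path describe the detected boxes for each image. As a rough illustration (the exact format is defined by the repository's data loader; the one-box-per-line, 8-value x1,y1,...,x4,y4 layout below is an assumption based on CRAFT-style output), such a file could be generated like this:

  # Hypothetical writer for a box file consumed via --box_path.
  # ASSUMPTION: one quadrilateral per line as "x1,y1,x2,y2,x3,y3,x4,y4";
  # check the repository's data loader for the authoritative format.
  boxes = [
      [52, 30, 210, 30, 210, 68, 52, 68],      # an axis-aligned word box
      [240, 90, 400, 96, 398, 130, 238, 124],  # a slightly rotated word box
  ]

  with open("../DATA/TXT/my_img.txt", "w") as f:
      for box in boxes:
          f.write(",".join(str(int(v)) for v in box) + "\n")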

Citation

Please cite our paper if this work is useful for your research.

@inproceedings{lee2022surprisingly,
  title={The Surprisingly Straightforward Scene Text Removal Method with Gated Attention and Region of Interest Generation: A Comprehensive Prominent Model Analysis},
  author={Lee, Hyeonsu and Choi, Chankyu},
  booktitle={European Conference on Computer Vision},
  pages={457--472},
  year={2022},
  organization={Springer}
}

License

GaRNet is licensed under Apache-2.0. See LICENSE for the full license text.

Copyright 2022-present NAVER Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Acknowledgments

Our model code starts from EnsNet.
Our research benefited a lot from MTRNet++ and EraseNet; we thank them for providing their model code and the real dataset.
We acknowledge the official code and pre-trained weights of CRAFT.
We use the SSIM calculation code from piq.