HRCenterNet (Official PyTorch Implementation)

Chinese Character Detection in Historical Documents

HRCenterNet: An Anchorless Approach to Chinese Character Segmentation in Historical Documents
Chia-Wei Tang, Chao-Lin Liu, Po-Sen Chiu
IEEE Big Data 2020 Workshops, Computational Archival Science: digital records in the age of big data
IEEE Xplore (10.1109/BigData50022.2020.9378051)
arXiv technical report (arXiv 2012.05739)

Contact: 106703054@g.nccu.edu.tw. Questions and discussions are welcome!

Installation

git clone https://github.com/Tverous/HRCenterNet.git
cd HRCenterNet/
pip install -r requirements.txt

Download the dataset and the pretrained weights

Google Drive

OneDrive

How to Use

Training:

python train.py --train_csv_path data/train.csv --train_data_dir data/images \
                --val_csv_path data/val.csv --val_data_dir data/images/ --val \
                --batch_size 8 --epoch 80

Evaluation:

python evaluate.py --csv_path data/val.csv --data_dir data/images/ --log_dir weights/HRCenterNet.pth.tar

Test with unseen images:

python test.py --data_dir /path/to/images --log_dir /path/to/pretrained --output_dir /path/to/save/outputs

Training on Your Own Dataset

Prepare your CSV files in the following format, with one row per image and five fields per annotated character:

image_id              labels
file_name_1           obj_id_1 topleft_x topleft_y width height obj_id_2 topleft_x topleft_y width height ...
file_name_2           obj_id_1 topleft_x topleft_y width height obj_id_2 topleft_x topleft_y width height ...
    .                 .
    .                 .
    .                 .
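The sketch below shows one way to read and write annotation files in this layout, which can help when converting your own labels. It assumes the file is comma-separated with an image_id column and a labels column, that each box in labels is five space-separated tokens (object id, top-left x, top-left y, width, height), and that coordinates are integer pixel values; these are assumptions, not taken from the repository's own data loader. The helper names parse_labels, format_labels, write_csv and read_csv, the file name page_001.jpg and the object ids used in the demo are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

# Assumed box layout: (obj_id, topleft_x, topleft_y, width, height),
# with integer pixel coordinates. Not taken from the repository's loader.
Box = Tuple[str, int, int, int, int]


def parse_labels(labels: str) -> List[Box]:
    """Split a space-separated labels string into 5-token boxes."""
    tokens = labels.split()
    if len(tokens) % 5 != 0:
        raise ValueError(f"expected groups of 5 tokens, got {len(tokens)} tokens")
    boxes: List[Box] = []
    for i in range(0, len(tokens), 5):
        obj_id, x, y, w, h = tokens[i:i + 5]
        boxes.append((obj_id, int(x), int(y), int(w), int(h)))
    return boxes


def format_labels(boxes: List[Box]) -> str:
    """Inverse of parse_labels: flatten boxes into one space-separated string."""
    return " ".join(" ".join(str(field) for field in box) for box in boxes)


def write_csv(annotations: Dict[str, List[Box]], out: TextIO) -> None:
    """Write an image_id,labels header, then one row per image (comma separator assumed)."""
    writer = csv.writer(out)
    writer.writerow(["image_id", "labels"])
    for image_id, boxes in annotations.items():
        writer.writerow([image_id, format_labels(boxes)])


def read_csv(src: TextIO) -> Dict[str, List[Box]]:
    """Read the file back into a {image_id: [boxes]} mapping."""
    return {row["image_id"]: parse_labels(row["labels"]) for row in csv.DictReader(src)}


if __name__ == "__main__":
    # Hypothetical image containing two annotated characters.
    annotations = {"page_001.jpg": [("U+4E00", 10, 20, 30, 40), ("U+4E8C", 50, 20, 28, 41)]}
    buf = io.StringIO()
    write_csv(annotations, buf)
    buf.seek(0)
    assert read_csv(buf) == annotations  # the round trip preserves every box
    print(buf.getvalue())
```

An image with no characters gets an empty labels field, which parse_labels turns into an empty list. Before training, compare the output against the downloaded data/train.csv to confirm the separator and coordinate types match.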

Results

Citation

Use this BibTeX entry to cite this repository:

@INPROCEEDINGS{9378051,
  author={C. -W. {Tang} and C. -L. {Liu} and P. -S. {Chiu}},
  booktitle={2020 IEEE International Conference on Big Data (Big Data)},
  title={HRCenterNet: An Anchorless Approach to Chinese Character Segmentation in Historical Documents},
  year={2020},
  volume={},
  number={},
  pages={1924-1930},
  doi={10.1109/BigData50022.2020.9378051}
}