The code of this repository was used for the following publication. If you find this code useful, please cite our paper:
```bibtex
@article{Gallego2019,
  title = "A selectional auto-encoder approach for document image binarization",
  author = "Jorge Calvo-Zaragoza and Antonio-Javier Gallego",
  journal = "Pattern Recognition",
  volume = "86",
  pages = "37 - 47",
  year = "2019",
  issn = "0031-3203",
  doi = "https://doi.org/10.1016/j.patcog.2018.08.011"
}
```
Below we include instructions to binarize images and to reproduce the experiments.
The `binarize.py` script performs the binarization of an input image using a trained model. The parameters of this script are the following:
Parameter | Default | Description |
---|---|---|
`-imgpath` | | Path to the image to process |
`-modelpath` | (*) | Path to the model to load |
`-w` | 256 | Input window size |
`-s` | -1 | Step size. -1 to use window size |
`-f` | 64 | Number of filters |
`-k` | 5 | Kernel size |
`-drop` | 0 | Dropout percentage |
`-stride` | 2 | Convolution stride size |
`-every` | 1 | Residual connections every x layers |
`-th` | 0.5 | Selectional threshold |
`-save` | | Output image filename |
`--demo` | | Activate demo mode |
(*) By default, the model trained with all datasets will be used.
The only mandatory parameter is `-imgpath`; the rest are optional. You must also choose whether to view a demo (`--demo`) or to save the binarized image (`-save`).
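The `-w`, `-s`, and `-th` parameters determine how the input image is tiled into windows, processed by the network, and thresholded into a binary output. The following sketch illustrates this logic with a hypothetical `predict_window` callable standing in for the trained model; the tiling convention (clamping the last window to the image border and averaging overlapping activations) is an assumption for illustration, not necessarily the exact strategy used by `binarize.py`:

```python
import numpy as np

def window_positions(length, w, s):
    """Top-left coordinates of windows of size w with step s along one axis.

    The last window is clamped so it ends exactly at the image border
    (an assumed convention, not necessarily the repository's tiling).
    """
    if s <= 0:  # -s -1 means "use the window size as the step"
        s = w
    pos = list(range(0, max(length - w, 0) + 1, s))
    if pos[-1] + w < length:
        pos.append(length - w)
    return pos

def binarize(img, predict_window, w=256, s=-1, th=0.5):
    """Tile img, average overlapping per-pixel activations, threshold at th."""
    h, wd = img.shape
    acc = np.zeros((h, wd), dtype=np.float64)  # summed activations
    cnt = np.zeros((h, wd), dtype=np.float64)  # how many windows cover each pixel
    for y in window_positions(h, w, s):
        for x in window_positions(wd, w, s):
            acc[y:y + w, x:x + w] += predict_window(img[y:y + w, x:x + w])
            cnt[y:y + w, x:x + w] += 1
    prob = acc / cnt
    return (prob > th).astype(np.uint8)  # 1 = foreground (ink)
```

With `-s -1` (the default), the step equals the window size, so windows do not overlap; a smaller step such as `-s 96` produces overlapping predictions that are averaged before thresholding.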
For example, to binarize the image `img01.png` you can run the following command:

```
$ python binarize.py -imgpath img01.png -save out.png
```
If you want to binarize this image using the model trained for DIBCO 2016 and the parameters specified in the paper, you have to run the following command:

```
$ python binarize.py -imgpath img01.png -modelpath MODELS/model_weights_dibco_6_256x256_s96_aug_m205_f64_k5_s2_se3_e200_b32_esp.h5 -w 256 -s 96 -f 64 -k 5 -stride 2 -th 0.5 --demo
```
The `MODELS` folder includes the following trained models for the datasets evaluated:

```
MODELS/model_weights_all_None_256x256_s96_aug_m205_f64_k5_s2_se3_e200_b32_esp.h5
MODELS/model_weights_dibco_5_256x256_s96_aug_m205_f64_k5_s2_se2_e200_b32_esp.h5
MODELS/model_weights_dibco_6_256x256_s96_aug_m205_f64_k5_s2_se3_e200_b32_esp.h5
MODELS/model_weights_ein_-1_256x256_s96_aug_m205_f64_k5_s2_se2_e200_b32_esp.h5
MODELS/model_weights_palm_0_256x256_s96_aug_m205_f64_k5_s2_se2_e200_b32_esp.h5
MODELS/model_weights_palm_1_256x256_s96_aug_m205_f64_k5_s2_se2_e200_b32_esp.h5
MODELS/model_weights_phi_-1_256x256_s96_aug_m205_f64_k5_s2_se2_e200_b32_esp.h5
MODELS/model_weights_sal_-1_256x256_s96_aug_m205_f64_k5_s2_se2_e200_b32_esp.h5
```
The `train.py` script trains the proposed network from a dataset of images and a series of input parameters that configure both the training process and the network topology. In addition to training, this script performs the validation and saves the resulting images. The parameters of this script are the following:
Parameter | Default | Description |
---|---|---|
`-path` | | Base path to datasets |
`-db` | | Database name ['dibco','palm','phi','ein','sal','voy','bdi','all'] |
`-dbp` | | Database-dependent parameters [dibco fold, palm gt] |
`--aug` | | Load augmentation folders |
`-w` | 256 | Input window size |
`-s` | -1 | Step size. -1 to use window size |
`-f` | 64 | Number of filters |
`-k` | 5 | Kernel size |
`-drop` | 0 | Dropout percentage |
`-stride` | 2 | Convolution stride size |
`-every` | 1 | Residual connections every x layers |
`-th` | 0.5 | Selectional threshold. -1 to evaluate from 0 to 1 |
`-page` | -1 | Page size to divide the training set. -1 to load all |
`-start_from` | 0 | Start from this page |
`-super` | 1 | Number of super epochs |
`-e` | 200 | Number of epochs |
`-b` | 10 | Mini-batch size |
`-esmode` | p | Early stopping mode: g='global', p='per page' |
`-espat` | 10 | Early stopping patience |
`-verbose` | 1 | 1=show batch increment, other=mute |
`--test` | | Only run test (deactivate training) |
`-loadmodel` | | Weights filename to load for test or for initialization |
The only mandatory parameters are `-path` and `-db`. To run the training, you must first indicate the path where the datasets are located (with the `-path` parameter) and the name of the dataset to evaluate (with the `-db` parameter). Depending on the name of the dataset, the system will create the partitions for training and validation.
The folders of each dataset must have a specific name (see the table in the "Datasets" section or consult the source code). Within each folder there must be two subfolders: one with the suffix `_GR`, containing the input images in grayscale, and another with the suffix `_GT`, containing the ground truth.

The `--aug` option activates data augmentation; in this case, the system will also load the images stored in the folders with the prefix `aug_`.
For example, to train a network for the DIBCO 2016 dataset, using data augmentation and the parameters specified in the paper, you have to run the following command:

```
$ python -u train.py -path datasets -db dibco -dbp 6 --aug -w 256 -s 128 -f 64 -k 5 -e 200 -b 10 -th -1 -stride 2 -page 64
```
Once the script finishes, it will save the learned weights and the binarized images resulting from the validation partition in a folder with the prefix `_PR-` followed by the name of the model. It will also show the mean squared reconstruction error and the obtained F-measure (F-m). We have also used the evaluation tool provided by DIBCO (http://vc.ee.duth.gr/h-dibco2016/benchmark/), which allowed us to validate the obtained F-m and also to calculate the pseudo F-m.
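The pixel-wise F-measure reported by the script can be reproduced from a predicted binary image and its ground truth. A minimal sketch, assuming the convention that 1 denotes foreground (ink) and 0 denotes background in both arrays (DIBCO ground-truth images may need to be inverted to match this convention):

```python
import numpy as np

def f_measure(pred, gt):
    """Pixel-wise F-measure between a binary prediction and the ground truth.

    Both arrays hold 1 for foreground (ink) and 0 for background.
    """
    tp = np.sum((pred == 1) & (gt == 1))  # true positives
    fp = np.sum((pred == 1) & (gt == 0))  # false positives
    fn = np.sum((pred == 0) & (gt == 1))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```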
Below is a summary table of the datasets used in this work along with a link from which they can be downloaded:
Dataset | Acronym | URL |
---|---|---|
DIBCO 2009 | DB09 | http://users.iit.demokritos.gr/~bgat/DIBCO2009/benchmark/ |
DIBCO 2010 | DB10 | http://users.iit.demokritos.gr/~bgat/H-DIBCO2010/benchmark/ |
DIBCO 2011 | DB11 | http://utopia.duth.gr/~ipratika/DIBCO2011/benchmark/ |
DIBCO 2012 | DB12 | http://utopia.duth.gr/~ipratika/HDIBCO2012/benchmark/ |
DIBCO 2013 | DB13 | http://utopia.duth.gr/~ipratika/DIBCO2013/benchmark/ |
DIBCO 2014 | DB14 | http://users.iit.demokritos.gr/~bgat/HDIBCO2014/benchmark/ |
DIBCO 2016 | DB16 | http://vc.ee.duth.gr/h-dibco2016/benchmark/ |
Palm Leaf | PL-I / PL-II | http://amadi.univ-lr.fr/ICFHR2016_Contest/ |
Persian docs | PHI | http://www.iapr-tc11.org/mediawiki/index.php/Persian_Heritage_Image_Binarization_Dataset_(PHIBD_2012) |
Einsiedeln | ES | http://www.e-codices.unifr.ch/en/sbe/0611/ |
Salzinnes | SAM | https://cantus.simssa.ca/manuscript/133/ |
The ground truth of Einsiedeln and Salzinnes is available at the following link: