This repository contains the split model for table structure extraction. The model predicts row/column separators for an input image. It has five executable scripts:
- prepare_data.py
- train.py
- infer.py
- merge.py
- eval.py
A pretrained model is provided with this repository at model_out/split_model.pth. It was trained on an augmented dataset created from the originally provided labelled dataset.
prepare_data.py
takes as input the original labelled dataset (images, XML files, and OCR files) and prepares the data for use by the split model. Specifically, it creates crops of tables out of the original dataset and generates the corresponding split model labels and OCR.
Note: If the provided OCR directory does not contain the corresponding OCR files, the program will generate the OCR data and write it to that folder.
**XMLs are data annotations in PascalVOC format.**
usage: prepare_data.py [-h] -img IMAGE_DIR -xml XML_DIR -ocr OCR_DIR -o OUT_DIR
optional arguments:
-h, --help show this help message and exit
-img IMAGE_DIR, --image_dir IMAGE_DIR
Directory containing images
-xml XML_DIR, --xml_dir XML_DIR
Directory containing ground truth XMLs in PascalVOC format
-ocr OCR_DIR, --ocr_dir OCR_DIR
Directory containing ocr files. (If an
OCR file is not found, it will be generated and saved
in this directory for future use)
-o OUT_DIR, --out_dir OUT_DIR
Path of output directory for generated data
Sample Command: python prepare_data.py -img data/images/ -xml data/xmls/ -ocr data/ocr/ -o data/prepared/
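As an illustration of the crop-extraction step, here is a minimal sketch of reading table bounding boxes from a PascalVOC annotation. The tag names and the `table` object name are assumptions for illustration; prepare_data.py's actual parsing may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal PascalVOC annotation with one table object.
SAMPLE_XML = """<annotation>
  <object>
    <name>table</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>"""

def table_boxes(xml_text):
    """Return (xmin, ymin, xmax, ymax) tuples for every <object> named 'table'."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        if obj.findtext("name") != "table":
            continue
        bb = obj.find("bndbox")
        boxes.append(tuple(int(bb.findtext(t)) for t in ("xmin", "ymin", "xmax", "ymax")))
    return boxes

print(table_boxes(SAMPLE_XML))  # [(10, 20, 110, 220)]
```

Each returned box can then be used to crop the table region out of the document image.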
train.py
takes as input the data generated by the prepare_data.py
script and starts training of the split model.
The script has three required arguments, namely images_dir, labels_dir, and output_weight_path. The rest of the arguments are optional and have default values.
usage: train.py [-h] -img TRAIN_IMAGES_DIR -l TRAIN_LABELS_DIR -o
OUTPUT_WEIGHT_PATH [-e NUM_EPOCHS] [-s SAVE_EVERY]
[--log_every LOG_EVERY] [--val_every VAL_EVERY]
[--lr LEARNING_RATE] [--dr DECAY_RATE] [--vs VALIDATION_SPLIT]
optional arguments:
-h, --help show this help message and exit
-img TRAIN_IMAGES_DIR, --images_dir TRAIN_IMAGES_DIR
Path to training table images (generated by
prepare_data.py).
-l TRAIN_LABELS_DIR, --labels_dir TRAIN_LABELS_DIR
Path to labels for split model (generated by
prepare_data.py).
-o OUTPUT_WEIGHT_PATH, --output_weight_path OUTPUT_WEIGHT_PATH
Output folder path for model checkpoints and summary.
-e NUM_EPOCHS, --num_epochs NUM_EPOCHS
Number of epochs.
-s SAVE_EVERY, --save_every SAVE_EVERY
Save checkpoints after given epochs
--log_every LOG_EVERY
Print logs after every given steps
--val_every VAL_EVERY
Perform validation after given steps
--lr LEARNING_RATE, --learning_rate LEARNING_RATE
Learning rate
--dr DECAY_RATE, --decay_rate DECAY_RATE
Weight decay rate
--vs VALIDATION_SPLIT, --validation_split VALIDATION_SPLIT
Validation split in data
Sample Command: python train.py -img data/prepared/table_images/ -l data/prepared/table_split_labels/ -o model_out/
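The --vs option holds out a fraction of the data for validation. A minimal sketch of such a split (illustrative only; the script's actual splitting logic may differ):

```python
import random

def split_dataset(samples, validation_split=0.2, seed=0):
    """Shuffle and split samples into (train, val); mirrors the --vs option conceptually."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)  # fixed seed for a reproducible split
    n_val = int(len(samples) * validation_split)
    return samples[n_val:], samples[:n_val]

train, val = split_dataset(range(100), validation_split=0.2)
print(len(train), len(val))  # 80 20
```

With validation_split=0.2, 20% of the prepared table images would be reserved for the periodic validation runs controlled by --val_every.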
infer.py
can be used to take a trained model and infer split results against a folder of cropped table images. Inside {OUTPUT_PATH} it generates two folders: one contains predictions as XML files (in the same format as the ground-truth XMLs), and the other contains visualizations of the split results.
usage: infer.py [-h] -img TEST_IMAGES_DIR -m MODEL_WEIGHTS -o OUTPUT_PATH
optional arguments:
-h, --help show this help message and exit
-img TEST_IMAGES_DIR, --test_images_dir TEST_IMAGES_DIR
Path to testing data table images (generated by
prepare_data.py).
-m MODEL_WEIGHTS, --model_weights MODEL_WEIGHTS
Path to model weights.
-o OUTPUT_PATH, --output_path OUTPUT_PATH
Path to the output directory
Sample Command: python infer.py -img data/prepared/table_images/ -m model_out/split_model.pth -o out_infer
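Conceptually, the predicted row and column separators define a grid of cells over the table crop. A hedged sketch of turning separator coordinates into cell boxes (the coordinate conventions here are assumptions, not taken from infer.py):

```python
def grid_cells(row_seps, col_seps, width, height):
    """Turn predicted separator coordinates into cell bounding boxes.

    row_seps: y-coordinates of horizontal separators; col_seps: x-coordinates
    of vertical separators. Both are assumed sorted and inside the table crop.
    """
    ys = [0] + list(row_seps) + [height]
    xs = [0] + list(col_seps) + [width]
    # One cell per (row band, column band) pair, as (x1, y1, x2, y2).
    return [
        (xs[c], ys[r], xs[c + 1], ys[r + 1])
        for r in range(len(ys) - 1)
        for c in range(len(xs) - 1)
    ]

cells = grid_cells(row_seps=[50], col_seps=[40, 80], width=120, height=100)
print(len(cells))  # 2 rows x 3 columns = 6 cells
```

One horizontal and two vertical separators yield a 2x3 grid, which is the structure the predicted XMLs describe.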
merge.py
can be used to apply merge heuristics to the XML files predicted by the split model through infer.py. It can optionally be given the table-level images if visualization of the merges is required; if they are not provided, visualization is skipped and only XMLs are written to {OUTPUT_DIR}.
usage: merge.py [-h] -i INPUT_XML_DIR -o OUTPUT_DIR -ocr OCR_DIR [-img IMAGES_DIR]
optional arguments:
-h, --help show this help message and exit
-i INPUT_XML_DIR, --input_xml_dir INPUT_XML_DIR
Path to folder containing XML files predicted by
infer.py
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
Path to folder for writing output XML files and
visualization (optional) of merge heuristics.
-ocr OCR_DIR, --ocr_dir OCR_DIR
Path to folder containing table-level OCR files
generated by prepare_data.py
-img IMAGES_DIR, --images_dir IMAGES_DIR
Path to table-level images generated by
prepare_data.py (Optional. If not provided merge
visualization will not be written).
Sample Command: python merge.py -i out_infer/predicted_xmls/ -o merge_output/ -ocr data/prepared/table_ocr -img data/prepared/table_images/
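One common merge heuristic is to join two neighbouring cells when a single OCR word straddles the separator between them. The sketch below shows that idea; it is an assumed example, not necessarily the heuristic merge.py implements.

```python
def overlaps(word_box, cell_box):
    """Axis-aligned overlap test between an OCR word box and a cell box."""
    wx1, wy1, wx2, wy2 = word_box
    cx1, cy1, cx2, cy2 = cell_box
    return wx1 < cx2 and cx1 < wx2 and wy1 < cy2 and cy1 < wy2

def should_merge(cell_a, cell_b, word_boxes):
    """Merge two cells if any single OCR word overlaps both of them,
    i.e. the word straddles the separator between the cells."""
    return any(overlaps(w, cell_a) and overlaps(w, cell_b) for w in word_boxes)

# A word spanning x=30..60 crosses the separator at x=40 between two cells.
words = [(30, 5, 60, 15)]
print(should_merge((0, 0, 40, 20), (40, 0, 80, 20), words))  # True
```

This is why merge.py requires the table-level OCR files: the word boxes provide the evidence for undoing over-segmentation by the split model.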
eval.py
Once results have been generated in XML format by infer.py or merge.py, they can be evaluated using the eval.py script. It has five inputs. The first three are the original document-level images, ground-truth XMLs, and OCR files; note that these are the original data, not the files generated by the prepare_data.py script. The fourth input is the path to the prediction XMLs generated by infer.py. The fifth is the output directory where the evaluation results are written.
usage: eval.py [-h] -i IMAGES_DIR -xml XML_DIR -o OCR_DIR -p PRED_DIR -e
EVAL_OUT
optional arguments:
-h, --help show this help message and exit
-i IMAGES_DIR, --images_dir IMAGES_DIR
path to directory containing document-level images.
-xml XML_DIR, --xml_dir XML_DIR
path to directory containing document-level ground-
truth XML files.
-o OCR_DIR, --ocr_dir OCR_DIR
path to directory containing document-level ocr.
-p PRED_DIR, --pred_dir PRED_DIR
path to directory containing table-level prediction
XML files.
-e EVAL_OUT, --eval_out EVAL_OUT
path of directory in which to write the evaluation
results.
Sample Command: python eval.py -i data/images/ -xml data/xmls/ -o data/ocr -p out_infer/predicted_xmls/ -e evaluation/
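The exact metric eval.py computes is not specified here; as an illustration only, a simple IoU-based match rate between predicted and ground-truth cell boxes might look like this (all names and thresholds are assumptions):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def match_rate(pred, gt, thresh=0.5):
    """Fraction of ground-truth boxes matched by some prediction at IoU >= thresh."""
    return sum(any(iou(p, g) >= thresh for p in pred) for g in gt) / len(gt)

# One of the two ground-truth cells is recovered exactly, the other is missed.
print(match_rate([(0, 0, 10, 10)], [(0, 0, 10, 10), (20, 20, 30, 30)]))  # 0.5
```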
If this work is useful for your research or if you use this implementation in your academic projects, please cite the following paper:
@InProceedings{ICDAR2019,
author    = {Christopher Tensmeyer and Vlad Morariu and Brian Price and Scott Cohen and Tony Martinez},
title     = {Deep Splitting and Merging for Table Structure Decomposition},
booktitle = {The 15th IAPR International Conference on Document Analysis and Recognition (ICDAR)},
month     = {September},
year      = {2019}
}