/splerge-tab-aug

Code for: U. Khan, S. Zahid, M.A. Ali, A. Ul-Hasan and F. Shafait, TabAug: Data Driven Augmentation for Enhanced Table Structure Recognition (2021)

About

This repository contains the split model for table structure extraction. The model predicts row and column separators for an input image. The repository has five executable scripts:

- prepare_data.py
- train.py
- infer.py
- merge.py
- eval.py

A trained model is provided with this repository at model_out/split_model.pth. It was trained on an augmented dataset created from the originally provided labelled dataset.
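
The separator prediction described above can be pictured with a small post-processing sketch. It assumes the model yields one score per pixel row (or column) and that each contiguous run of high scores is collapsed to its midpoint, in the spirit of the split model of Tensmeyer et al. The function name `separator_positions` and the 0.5 threshold are hypothetical and are not taken from this repository's code.

```python
from typing import List, Sequence


def separator_positions(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return the midpoint index of every contiguous run of scores >= threshold.

    Hypothetical helper: the actual post-processing in infer.py may differ.
    """
    positions: List[int] = []
    run_start = None  # index where the current run of high scores began
    for i, p in enumerate(probs):
        if p >= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # The run covered run_start .. i-1; keep its (floored) midpoint.
            positions.append((run_start + i - 1) // 2)
            run_start = None
    if run_start is not None:
        # A run that reaches the end of the signal.
        positions.append((run_start + len(probs) - 1) // 2)
    return positions


# Illustrative scores for nine pixel rows (made-up values).
row_probs = [0.1, 0.9, 0.8, 0.95, 0.2, 0.1, 0.7, 0.6, 0.0]
print(separator_positions(row_probs))  # → [2, 6]
```

Collapsing each run to a single index turns a thick predicted band into one separator line, which is what a row or column boundary in the output XML needs.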

Usage

1. Prepare Data

prepare_data.py takes as input the original labelled dataset (images, XML files and OCR files) and prepares the data for use by the split model. Specifically, it crops the tables out of the original dataset and generates the corresponding split model labels and OCR.

Note: If the provided OCR directory does not contain the corresponding OCR files, the program will generate the OCR data and write it to that directory. The XML files are data annotations in Pascal VOC format.

usage: prepare_data.py [-h] -img IMAGE_DIR -xml XML_DIR -ocr OCR_DIR -o OUT_DIR

optional arguments:
  -h, --help            show this help message and exit
  -img IMAGE_DIR, --image_dir IMAGE_DIR
                        Directory containing images
  -xml XML_DIR, --xml_dir XML_DIR
                        Directory containing ground truth xmls in PascalVoc Format
  -ocr OCR_DIR, --ocr_dir OCR_DIR
                        Directory containing ocr files. (If an
                        OCR file is not found, it will be generated and saved
                        in this directory for future use)
  -o OUT_DIR, --out_dir OUT_DIR
                        Path of output directory for generated data

Sample Command: python prepare_data.py -img data/images/ -xml data/xmls/ -ocr data/ocr/ -o data/prepared/
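
As a rough illustration of the annotation input, the sketch below reads bounding boxes from a Pascal VOC style XML string. It assumes the standard VOC layout (`object` / `name` / `bndbox` with `xmin`, `ymin`, `xmax`, `ymax`); the exact tags and class names used by this dataset are not shown in this README and may differ. `read_voc_boxes`, the `table` label and the sample file name are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Box = Tuple[str, int, int, int, int]  # (label, xmin, ymin, xmax, ymax)


def read_voc_boxes(xml_text: str) -> List[Box]:
    """Collect (label, xmin, ymin, xmax, ymax) for every <object> with a <bndbox>."""
    root = ET.fromstring(xml_text)
    boxes: List[Box] = []
    for obj in root.iter("object"):
        label = obj.findtext("name", default="")
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue  # skip objects without a bounding box
        # Coordinates are sometimes stored as floats, so parse via float first.
        xmin, ymin, xmax, ymax = (
            int(float(bndbox.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, xmin, ymin, xmax, ymax))
    return boxes


# Made-up annotation in the assumed layout.
SAMPLE_XML = """
<annotation>
  <filename>page_001.png</filename>
  <object>
    <name>table</name>
    <bndbox><xmin>48</xmin><ymin>120</ymin><xmax>560</xmax><ymax>410</ymax></bndbox>
  </object>
</annotation>
"""

print(read_voc_boxes(SAMPLE_XML))  # → [('table', 48, 120, 560, 410)]
```

A box such as the one above is the kind of region a table crop could be taken from; the cropping and label generation themselves are done by prepare_data.py.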

2. Train Split Model

train.py takes as input the data generated by prepare_data.py and trains the split model. The script has three required arguments: images_dir, labels_dir and output_weight_path. The remaining arguments are optional and have default values.

usage: train.py [-h] -img TRAIN_IMAGES_DIR -l TRAIN_LABELS_DIR -o
                OUTPUT_WEIGHT_PATH [-e NUM_EPOCHS] [-s SAVE_EVERY]
                [--log_every LOG_EVERY] [--val_every VAL_EVERY]
                [--lr LEARNING_RATE] [--dr DECAY_RATE] [--vs VALIDATION_SPLIT]

optional arguments:
  -h, --help            show this help message and exit
  -img TRAIN_IMAGES_DIR, --images_dir TRAIN_IMAGES_DIR
                        Path to training table images (generated by
                        prepare_data.py).
  -l TRAIN_LABELS_DIR, --labels_dir TRAIN_LABELS_DIR
                        Path to labels for split model (generated by
                        prepare_data.py).
  -o OUTPUT_WEIGHT_PATH, --output_weight_path OUTPUT_WEIGHT_PATH
                        Output folder path for model checkpoints and summary.
  -e NUM_EPOCHS, --num_epochs NUM_EPOCHS
                        Number of epochs.
  -s SAVE_EVERY, --save_every SAVE_EVERY
                        Save checkpoints after given epochs
  --log_every LOG_EVERY
                        Print logs after every given steps
  --val_every VAL_EVERY
                        perform validation after given steps
  --lr LEARNING_RATE, --learning_rate LEARNING_RATE
                        learning rate
  --dr DECAY_RATE, --decay_rate DECAY_RATE
                        weight decay rate
  --vs VALIDATION_SPLIT, --validation_split VALIDATION_SPLIT
                        validation split in data

Sample Command: python train.py -img data/prepared/table_images/ -l data/prepared/table_split_labels/ -o model_out/
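
To make the `--vs` / `--validation_split` option concrete, here is a minimal sketch of holding out a fraction of the prepared table images for validation. It is an assumption about what such a split typically looks like, not the code in train.py; the function name `split_train_val`, the `seed` parameter and the file names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_train_val(
    names: Sequence[str], validation_split: float, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle file names reproducibly and hold out a fraction for validation."""
    if not 0.0 <= validation_split < 1.0:
        raise ValueError("validation_split must be in [0, 1)")
    shuffled = sorted(names)               # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)  # local RNG, leaves the global random state untouched
    n_val = int(len(shuffled) * validation_split)
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical file names standing in for data/prepared/table_images/.
image_names = [f"table_{i}.png" for i in range(10)]
train_names, val_names = split_train_val(image_names, validation_split=0.2)
print(len(train_names), len(val_names))  # → 8 2
```

Fixing the seed keeps the validation set identical between runs, so validation numbers stay comparable across training runs.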

3. Inference of Split Model

infer.py takes a trained model and runs split inference on a folder of cropped table images. Inside {OUTPUT_PATH} it generates two folders: one contains the predictions as XML files (same format as the ground-truth XMLs), and the other contains visualizations of the split results.

usage: infer.py [-h] -img TEST_IMAGES_DIR -m MODEL_WEIGHTS -o OUTPUT_PATH

optional arguments:
  -h, --help            show this help message and exit
  -img TEST_IMAGES_DIR, --test_images_dir TEST_IMAGES_DIR
                        Path to testing data table images (generated by
                        prepare_data.py).
  -m MODEL_WEIGHTS, --model_weights MODEL_WEIGHTS
                        path to model weights.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        path to the output directory

Sample Command: python infer.py -img data/prepared/table_images/ -m model_out/split_model.pth -o out_infer

4. Apply Merge Heuristics

merge.py applies merge heuristics to the XML files predicted by the split model through infer.py. It can optionally be given the table-level images if visualization of the merges is required. If they are not provided, visualization is skipped and only the XMLs are written to {OUTPUT_DIR}.

usage: merge.py [-h] -i INPUT_XML_DIR -o OUTPUT_DIR -ocr OCR_DIR [-img IMAGES_DIR]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_XML_DIR, --input_xml_dir INPUT_XML_DIR
                        Path to folder containing XML files predicted by
                        infer.py
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Path to folder for writing output XML files and
                        visualization (optional) of merge heuristics.
  -ocr OCR_DIR, --ocr_dir OCR_DIR
                        Path to folder containing table-level OCR files
                        generated by prepare_data.py
  -img IMAGES_DIR, --images_dir IMAGES_DIR
                        Path to table-level images generated by
                        prepare_data.py (Optional. If not provided merge
                        visualization will not be written).

Sample Command: python merge.py -i out_infer/predicted_xmls/ -o merge_output/ -ocr data/prepared/table_ocr -img data/prepared/table_images/

5. Evaluation

Once results have been generated by infer.py or merge.py in XML format, they can be evaluated using the eval.py script. It has five inputs. The first three are the original document-level images, ground-truth XMLs and OCR files. Note that these are the original data, not the files generated by prepare_data.py. The fourth input is the path to the prediction XMLs generated by infer.py or merge.py. The fifth is the output directory where the evaluation results are written.

usage: eval.py [-h] -i IMAGES_DIR -xml XML_DIR -o OCR_DIR -p PRED_DIR -e
               EVAL_OUT

optional arguments:
  -h, --help            show this help message and exit
  -i IMAGES_DIR, --images_dir IMAGES_DIR
                        path to directory containing document-level images.
  -xml XML_DIR, --xml_dir XML_DIR
                        path to directory containing document-level ground-
                        truth XML files.
  -o OCR_DIR, --ocr_dir OCR_DIR
                        path to directory containing document-level ocr.
  -p PRED_DIR, --pred_dir PRED_DIR
                        path to directory containing table-level prediction
                        XML files.
  -e EVAL_OUT, --eval_out EVAL_OUT
                        path of directory in which to write the evaluation
                        results.

Sample Command: python eval.py -i data/images/ -xml data/xmls/ -o data/ocr -p out_infer/predicted_xmls/ -e evaluation/
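
The five steps above can be chained from one small driver script. The sketch below only assembles the sample commands from this README (with infer.py called with `-img`, as in its usage text) and prints them; the function names `build_pipeline` and `run_pipeline` are hypothetical, and the paths are the sample paths, so adjust them to your data layout.

```python
import subprocess
import sys
from typing import List


def build_pipeline() -> List[List[str]]:
    """Return the README's sample commands, in order, as argument lists."""
    py = sys.executable  # run the scripts with the current interpreter
    return [
        [py, "prepare_data.py", "-img", "data/images/", "-xml", "data/xmls/",
         "-ocr", "data/ocr/", "-o", "data/prepared/"],
        [py, "train.py", "-img", "data/prepared/table_images/",
         "-l", "data/prepared/table_split_labels/", "-o", "model_out/"],
        [py, "infer.py", "-img", "data/prepared/table_images/",
         "-m", "model_out/split_model.pth", "-o", "out_infer"],
        [py, "merge.py", "-i", "out_infer/predicted_xmls/", "-o", "merge_output/",
         "-ocr", "data/prepared/table_ocr", "-img", "data/prepared/table_images/"],
        [py, "eval.py", "-i", "data/images/", "-xml", "data/xmls/", "-o", "data/ocr",
         "-p", "out_infer/predicted_xmls/", "-e", "evaluation/"],
    ]


def run_pipeline(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it as well when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


commands = build_pipeline()
run_pipeline(commands, dry_run=True)  # only prints; pass dry_run=False to execute
```

Passing argument lists instead of a single shell string avoids quoting problems with paths that contain spaces.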

Citation

If this work is useful for your research or if you use this implementation in your academic projects, please cite the following papers:

@InProceedings{ICDAR2019,
author = {Christopher Tensmeyer and Vlad Morariu and Brian Price and Scott Cohen and Tony Martinez},
title = {Deep Splitting and Merging for Table Structure Decomposition},
booktitle = {The 15th IAPR International Conference on Document Analysis and Recognition (ICDAR)},
month = {September},
year = {2019}
}