
Inference, training and evaluation code for our models from the paper "Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping" (ICDAR) 2023.


🚀 Good news! We have created a demo showcasing the capabilities of the model GeoTrTemplateLarge within the full document refinement pipeline. Check it out here!

Inv3D - Models

This repository contains the models and the inference, training and evaluation code for our paper, which has been accepted at the International Conference on Document Analysis and Recognition (ICDAR) 2023.

For more details, see our project page.


VS-Code Devcontainer

We highly recommend using the provided Devcontainer to make usage as easy as possible:

  • Install Docker and VS Code
  • Install VS Code Devcontainer extension ms-vscode-remote.remote-containers
  • Clone the repository
    git clone https://github.com/FelixHertlein/inv3d-model.git
  • Press F1 (or CTRL + SHIFT + P) and select Dev Containers: Rebuild and Reopen Container
  • Go to Run and Debug (CTRL + SHIFT + D) and press the run button, alternatively press F5


Start the inference

python3 inference.py --model geotr_template@inv3d --dataset inv3d_real --gpu 0


The models will be downloaded automatically before the inference starts.

Available models are:

  • geotr@doc3d
  • geotr@inv3d
  • geotr_template@inv3d
  • geotr_template_large@inv3d
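To compare the listed models on the same dataset, the inference command above can be generated once per model. The following is a minimal sketch, not part of the repository: `inference_command` is a hypothetical helper, and it only assumes the flags shown in the command above (`--model`, `--dataset`, `--gpu`).

```python
import shlex

# Model identifiers copied from the list above.
MODELS = [
    "geotr@doc3d",
    "geotr@inv3d",
    "geotr_template@inv3d",
    "geotr_template_large@inv3d",
]


def inference_command(model: str, dataset: str = "inv3d_real", gpu: int = 0) -> list[str]:
    """Build the argument list for one inference.py call (hypothetical helper)."""
    return [
        "python3", "inference.py",
        "--model", model,
        "--dataset", dataset,
        "--gpu", str(gpu),
    ]


if __name__ == "__main__":
    for model in MODELS:
        # Print the command instead of executing it.
        print(shlex.join(inference_command(model)))
```

To actually run a command, the returned list could be passed to `subprocess.run(..., check=True)` from inside the container.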



Inv3DReal is part of this repository and can be used by passing inv3d_real as the dataset argument.

Custom dataset

To unwarp your own data, you can mount it inside the container using the .devcontainer/devcontainer.json config.

Mount your data folder to /workspaces/inv3d-model/input/YOUR_DATA_DIRECTORY. Make sure all images start with the prefix image_ and the corresponding templates (only needed for template-based models) start with the prefix template_.
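Before mounting a folder, it can help to check that every file follows the prefix convention. The sketch below is an illustration only: `check_prefixes` is a hypothetical helper, and it does not assume how the repository pairs an image with its template.

```python
from pathlib import Path

IMAGE_PREFIX = "image_"
TEMPLATE_PREFIX = "template_"


def check_prefixes(data_dir: str) -> dict[str, list[str]]:
    """Group the file names in data_dir by the prefix convention described above."""
    groups: dict[str, list[str]] = {"images": [], "templates": [], "other": []}
    for path in sorted(Path(data_dir).iterdir()):
        if not path.is_file():
            continue  # sub-directories are ignored in this sketch
        if path.name.startswith(IMAGE_PREFIX):
            groups["images"].append(path.name)
        elif path.name.startswith(TEMPLATE_PREFIX):
            groups["templates"].append(path.name)
        else:
            groups["other"].append(path.name)
    return groups
```

Any names reported under `"other"` would need to be renamed before they match the convention.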

Output: Unwarped images

All unwarped images are placed in the output folder.


Training datasets


Download Inv3D here, combine all downloads and mount it using the devcontainer.json, such that the file tree looks as follows:

|-- data
|   |-- test
|   |-- train
|   |-- val
|   `-- wc_min_max.json
|-- log.txt
`-- settings.json


Download Doc3D here, combine all downloads and mount it using the devcontainer.json, such that the file tree looks as follows:

|-- alb
|-- augtexnames.txt
|-- bm
|-- dmap
|-- img
|-- norm
|-- real
|-- recon
|-- test.txt
|-- train.txt
|-- uv
|-- val.txt
`-- wc
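A quick way to confirm that a mounted dataset matches the file trees above is to test for each expected entry. This is a minimal sketch: `missing_entries` is a hypothetical helper, and the dataset root passed to it depends on your own mount configuration.

```python
from pathlib import Path

# Entries copied from the file trees above, relative to each dataset root.
INV3D_ENTRIES = [
    "data/test", "data/train", "data/val", "data/wc_min_max.json",
    "log.txt", "settings.json",
]
DOC3D_ENTRIES = [
    "alb", "augtexnames.txt", "bm", "dmap", "img", "norm", "real",
    "recon", "test.txt", "train.txt", "uv", "val.txt", "wc",
]


def missing_entries(root: str, expected: list[str]) -> list[str]:
    """Return the expected entries that do not exist below root."""
    base = Path(root)
    return [entry for entry in expected if not (base / entry).exists()]
```

An empty result means every listed file and folder was found; it does not verify the contents of those folders.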

Start a new training

python3 train.py --model geotr_template --dataset inv3d --version v1 --gpu 0 --num_workers 32

Resume a training

python3 train.py --model geotr_template --dataset inv3d --version v1 --gpu 0 --num_workers 32 --resume

Training output

|-- checkpoints
|   |-- checkpoint-epoch=00-val_mse_loss=0.0015.ckpt
|   `-- last.ckpt
|-- logs
|   |-- events.out.tfevents.1698250741.d6258ba74799.433.0
|   |-- ...
|   `-- hparams.yaml
`-- model.py
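The checkpoint file names in the tree above encode the epoch and the validation MSE loss. The sketch below parses that naming pattern to pick the checkpoint with the lowest loss; it assumes the pattern shown in the example file name holds for all checkpoints, and the function names are hypothetical.

```python
import re
from typing import Optional

# Pattern taken from the example name "checkpoint-epoch=00-val_mse_loss=0.0015.ckpt".
CHECKPOINT_PATTERN = re.compile(
    r"checkpoint-epoch=(\d+)-val_mse_loss=(\d+(?:\.\d+)?)\.ckpt"
)


def parse_checkpoint(name: str) -> Optional[tuple[int, float]]:
    """Return (epoch, val_mse_loss), or None if the name does not match."""
    match = CHECKPOINT_PATTERN.fullmatch(name)
    if match is None:
        return None  # e.g. "last.ckpt"
    return int(match.group(1)), float(match.group(2))


def best_checkpoint(names: list[str]) -> Optional[str]:
    """Return the file name with the lowest validation loss, if any."""
    scored = [(parse_checkpoint(name), name) for name in names]
    scored = [(parsed[1], name) for parsed, name in scored if parsed is not None]
    return min(scored)[1] if scored else None
```

Files such as `last.ckpt` carry no metric in their name and are skipped by `best_checkpoint`.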


train.py [-h]
--model MODEL
--dataset DATASET
[--version VERSION]
--num_workers NUM_WORKERS
[--model_kwargs MODEL_KWARGS]

Training script

  -h, --help            show this help message and exit
  --model {dewarpnet_bm,dewarpnet_joint,dewarpnet_wc,geotr,geotr_template,geotr_template_large,identity}
                        Select the model for training.
  --dataset {doc3d,doc3d_real,doc3d_test,empty,inv3d,inv3d_real,inv3d_real_tplrandom,inv3d_real_tplstruct,inv3d_real_tpltext,inv3d_real_tplwhite,inv3d_test,inv3d_test_tplrandom,inv3d_test_tplstruct,inv3d_test_tpltext,inv3d_test_tplwhite,inv3d_tplstruct,inv3d_tpltext,inv3d_tplwhite}
                        Select the dataset to train on.
  --version VERSION     Specify a version id for given training. Optional.
  --gpu GPU
                        The index of the GPU to use as an integer.
  --num_workers NUM_WORKERS
                        The number of workers as an integer.
  --fast_dev_run        Enable fast development run (default is False).
  --model_kwargs MODEL_KWARGS
                        Optional model keyword arguments as a JSON string.
  --resume              Resume from a previous run (default is False).
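Because `--model_kwargs` expects a JSON string, quoting it by hand on the shell is error-prone. The following sketch builds the argument list in Python and serializes the keyword arguments with `json.dumps`. `train_command` is a hypothetical helper, and the key `example_option` is a placeholder only: the keyword arguments each model accepts are not documented here.

```python
import json
from typing import Optional


def train_command(
    model: str,
    dataset: str,
    gpu: int,
    num_workers: int,
    version: Optional[str] = None,
    model_kwargs: Optional[dict] = None,
    resume: bool = False,
) -> list[str]:
    """Build the argument list for train.py from the options listed above."""
    cmd = [
        "python3", "train.py",
        "--model", model,
        "--dataset", dataset,
        "--gpu", str(gpu),
        "--num_workers", str(num_workers),
    ]
    if version is not None:
        cmd += ["--version", version]
    if model_kwargs:
        # Serialize to a single JSON string, as the help text requires.
        cmd += ["--model_kwargs", json.dumps(model_kwargs)]
    if resume:
        cmd.append("--resume")
    return cmd


# "example_option" is a placeholder, not a real model argument.
example = train_command("geotr_template", "inv3d", gpu=0, num_workers=32,
                        version="v1", model_kwargs={"example_option": 1})
```

Passing the list to `subprocess.run` avoids shell quoting entirely, since each element is handed to the script as one argument.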


The evaluation reports MS-SSIM, LPIPS, CER and ED. LD is not included in this repository because it requires the proprietary software MATLAB.
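As background on the two text metrics: ED is commonly computed as the Levenshtein edit distance between the text recognized on the unwarped image and a reference text, and CER as that distance divided by the reference length. The sketch below illustrates these definitions only. It is not the repository's implementation, and the OCR step and any text normalization used by the evaluation are not covered.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `edit_distance("kitten", "sitting")` evaluates to 3 (two substitutions and one insertion).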

Start an evaluation

python3 eval.py --trained_model geotr_template@inv3d@v1 --dataset inv3d_real --gpu 0 --num_workers 16
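The value passed to `--trained_model` appears to follow the pattern `MODEL@DATASET@VERSION`, matching the `--model`, `--dataset` and `--version` arguments of the training command, while the pretrained models listed for inference use `MODEL@DATASET`. Assuming that pattern, a small parser could look like this (`ModelId` and `parse_model_id` are hypothetical names):

```python
from typing import NamedTuple, Optional


class ModelId(NamedTuple):
    model: str
    dataset: str
    version: Optional[str]  # None for identifiers without a version part


def parse_model_id(identifier: str) -> ModelId:
    """Split 'MODEL@DATASET' or 'MODEL@DATASET@VERSION' into its parts."""
    parts = identifier.split("@")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"unexpected model identifier: {identifier!r}")
    version = parts[2] if len(parts) == 3 else None
    return ModelId(parts[0], parts[1], version)
```

This makes it easy to check which training run an evaluation refers to before starting it.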

Evaluation output:

|-- eval
|   `-- DATASET
|       |-- examples
|       |   |-- 0
|       |   |   |-- norm_image.png
|       |   |   |-- orig_image.png
|       |   |   |-- out_bm.npz
|       |   |   `-- true_image.png
|       |   |-- ...
|       `-- results.csv
|-- logs
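The file results.csv can be summarized with a few lines of Python. The sketch below assumes that the file has a header row and one row per sample; the column names are not documented here, so the code simply averages every column whose values all parse as numbers. `summarize_results` is a hypothetical helper.

```python
import csv
from statistics import mean


def summarize_results(csv_path: str) -> dict[str, float]:
    """Return the mean of every fully numeric column in a CSV file with a header row."""
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    summary: dict[str, float] = {}
    if not rows:
        return summary
    for column in rows[0]:
        try:
            values = [float(row[column]) for row in rows]
        except (TypeError, ValueError):
            continue  # skip columns with missing or non-numeric entries
        summary[column] = mean(values)
    return summary
```

Columns that hold identifiers or file names are skipped automatically, as long as at least one of their values is not a number.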


eval.py [-h]
--trained_model MODEL
--dataset DATASET
--gpu GPU
--num_workers NUM_WORKERS

Evaluation script

  -h, --help            show this help message and exit
  --trained_model {the model in the models directory}
                        Select the model for evaluation.
  --dataset {doc3d,doc3d_real,doc3d_test,empty,inv3d,inv3d_real,inv3d_real_tplrandom,inv3d_real_tplstruct,inv3d_real_tpltext,inv3d_real_tplwhite,inv3d_test,inv3d_test_tplrandom,inv3d_test_tplstruct,inv3d_test_tpltext,inv3d_test_tplwhite,inv3d_tplstruct,inv3d_tpltext,inv3d_tplwhite}
                        Select the dataset to evaluate on.
  --gpu GPU             The index of the GPU to use for evaluation.
  --num_workers NUM_WORKERS
                        The number of workers as an integer.


If you use the code of our paper for scientific research, please consider citing

@article{hertlein2023inv3d,
	title        = {Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping},
	author       = {Hertlein, Felix and Naumann, Alexander and Philipp, Patrick},
	year         = 2023,
	journal      = {International Journal on Document Analysis and Recognition (IJDAR)},
	publisher    = {Springer},
	pages        = {1--12}
}


The model GeoTr is part of DocTr. GeoTrTemplate is based on GeoTr.




This project is licensed under MIT.