/DIVA-DAF

Repository for the deep-learning framework DIVA-DAF which is build with historical document image analysis in mind.

Primary LanguagePythonMIT LicenseMIT

DIVA-DAF

PyTorch Lightning Config: Hydra Template
Coverage tests Maintainability Rating

Description

A deep learning framework for historical document image analysis.

How to run

Install dependencies

# clone project
git clone https://github.com/DIVA-DIA/unsupervised_learning.git
cd unsupervised_learing

# create conda environment (IMPORTANT: needs Python 3.8+)
conda env create -f conda_env_gpu.yaml

# activate the environment using .autoenv
source .autoenv

# install requirements
pip install -r requirements.txt

Train model with default configuration. Care: you need to change the value of data_dir in config/datamodule/cb55_10_cropped_datamodule.yaml.

# default run based on config/config.yaml
python run.py

# train on CPU
python run.py trainer.gpus=0

# train on GPU
python run.py trainer.gpus=1

Train using GPU

# [default] train on all available GPUs
python run.py trainer.gpus=-1

# train on one GPU
python run.py trainer.gpus=1

# train on two GPUs
python run.py trainer.gpus=2

# train on CPU
python run.py trainer.accelerator=ddp_cpu

Train using CPU for debugging

# train on CPU
python run.py trainer.accelerator=ddp_cpu trainer.precision=32

Train model with chosen experiment configuration from configs/experiment/

python run.py experiment=experiment_name

You can override any parameter from command line like this

python run.py trainer.max_epochs=20 datamodule.batch_size=64

Setup PyCharm

  1. Fork this repo
  2. Clone the repo to your local filesystem (git clone CLONELINK)
  3. Clone the repo onto your remote machine
  4. Move into the folder on your remote machine and create the conda environment (conda env create -f conda_env_gpu.yaml)
  5. Run source .autoenv in the root folder on your remote machine (activates the environment)
  6. Open the folder in PyCharm (File -> open)
  7. Add the interpreter (Preferences -> Project -> Python interpreter -> top left gear icon -> add... -> SSH Interpreter) follow the instructions (set the correct mapping to enable deployment)
  8. Upload the files (deployment)
  9. Create a wandb account (wandb.ai)
  10. Log via ssh onto your remote machine
  11. Go to the root folder of the framework and activate the environment (source .autoenv OR conda activate unsupervised_learning)
  12. Log into wandb. Execute wandb login and follow the instructions
  13. Now you should be able to run the basic experiment from PyCharm

Loading models

You can load the different model parts backbone or header as well as the whole task. To load the backbone or the header you need to add to your experiment config the field path_to_weights. e.g.

model:
    header:
        path_to_weights: /my/path/to/the/pth/file

To load the whole task you need to provide the path to the whole task to the trainer. This is with the field resume_from_checkpoint. e.g.

trainer:
    resume_from_checkpoint: /path/to/.ckpt/file

Freezing model parts

You can freeze both parts of the model (backbone or header) with the freeze flag in the config. E.g. you want to freeze the backbone: In the command line:

python run.py +model.backbone.freeze=True

In the config (e.g. model/backbone/baby_unet.yaml):

...
freeze: True
...

CARE: You can not train a model when you do not have trainable parameters (e.g. freezing backbone and header).

Selection in datasets

If you use the selection key you can either use an int, which takes the first n files, or a list of strings to filter the different datasets. In the case you are using a full-page dataset be aware that the selection list is a list of file names without the extension.

Cite us

@inproceedings{vogtlin2023DIVADAFDeepLearning,
  author       = {Lars V{\"{o}}gtlin and
                  Anna Scius{-}Bertrand and
                  Paul Maergner and
                  Andreas Fischer and
                  Rolf Ingold},
  title        = {{DIVA-DAF:} {A} Deep Learning Framework for Historical Document Image
                  Analysis},
  booktitle    = {Proceedings of the 7th International Workshop on Historical Document
                  Imaging and Processing, HIP@ICDAR 2023, San Jose, CA, USA, August
                  25-26, 2023},
  pages        = {61--66},
  publisher    = {{ACM}},
  year         = {2023},
  url          = {https://doi.org/10.1145/3604951.3605511},
  doi          = {10.1145/3604951.3605511}
}