Starter code for using the hdBPMN dataset for diagram recognition research.
The dump_coco.py script can be used to convert the images and BPMN XMLs into a COCO dataset. COCO is a common format used in computer vision research to annotate the objects and keypoints in images.
python scripts/dump_coco.py path/to/hdBPMN path/to/target/coco/directory/hdbpmn
Moreover, the demo.ipynb Jupyter notebook can be used to visualize (1) the extracted bounding boxes, keypoints, and relations, and (2) the annotated BPMN diagram overlayed over the hand-drawn image. Note that the latter requires the bpmn-to-image tool, which in turn requires a nodejs installation.
pip install pybpmn-parser
In order to set up the necessary environment:
- create an environment
pybpmn-parser
with the help of conda:conda env create -f environment.yml
- activate the new environment with:
conda activate pybpmn-parser
NOTE: The conda environment will have pybpmn-parser installed in editable mode. Some changes, e.g. in
setup.cfg
, might require you to runpip install -e .
again.
Optional and needed only once after git clone
:
-
install JupyterLab kernel
python -m ipykernel install --user --name "${CONDA_DEFAULT_ENV}" --display-name "$(python -V) (${CONDA_DEFAULT_ENV})"
-
install several pre-commit git hooks with:
pre-commit install # You might also want to run `pre-commit autoupdate`
and checkout the configuration under
.pre-commit-config.yaml
. The-n, --no-verify
flag ofgit commit
can be used to deactivate pre-commit hooks temporarily.
├── LICENSE.txt <- License as chosen on the command-line.
├── README.md <- The top-level README for developers.
├── data
│ ├── external <- Data from third party sources.
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
├── docs <- Directory for Sphinx documentation in rst or md.
├── environment.yml <- The conda environment file for reproducibility.
├── notebooks <- Jupyter notebooks. Naming convention is a number (for
│ ordering), the creator's initials and a description,
│ e.g. `1.0-fw-initial-data-exploration`.
├── pyproject.toml <- Build system configuration. Do not change!
├── scripts <- Analysis and production scripts which import the
│ actual Python package, e.g. train_model.py.
├── setup.cfg <- Declarative configuration of your project.
├── setup.py <- Use `pip install -e .` to install for development or
│ or create a distribution with `tox -e build`.
├── src
│ └── pybpmn <- Actual Python package where the main functionality goes.
├── tests <- Unit tests which can be run with `py.test`.
├── .coveragerc <- Configuration for coverage reports of unit tests.
├── .isort.cfg <- Configuration for git hook that sorts imports.
└── .pre-commit-config.yaml <- Configuration of pre-commit git hooks.
This project has been set up using PyScaffold 4.0.1 and the dsproject extension 0.6.1.