Cluster Tools

Workflows for distributed Bio Image Analysis and Segmentation. Supports Slurm, LSF and local execution, easy to extend to more scheduling systems.

Workflows

Hierarchical Multicut / Hierarchical lifted Multicut
- Distance Transform Watersheds
- Region Adjacency Graph
- Edge Feature Extraction from Boundary-or-Affinity Maps
- Agglomeration via (lifted) Multicut
Sparse lifted Multicut from biological priors
Mutex Watershed
Connected Components
Downscaling and Pyramids
Ilastik Prediction
Skeletonization
Distributed Neural Network Prediction (originally implemented here)
Validation with Rand Index and Variation of Information

Installation

You can install the package via conda:

conda install -c conda-forge cluster_tools

To set-up a develoment environment with all necessary dependencies, you can use the environment.yml file:

conda env create -f environment.yml

and then install the package in development mode via

pip install -e . --no-deps

Citation

If you use this software in a publication, please cite

Pape, Constantin, et al. "Solving large multicut problems for connectomics via domain decomposition." Proceedings of the IEEE International Conference on Computer Vision. 2017.

For the lifted multicut workflows, please cite

Pape, Constantin, et al. "Leveraging Domain Knowledge to improve EM image segmentation with Lifted Multicuts." arXiv preprint. 2019.

You can find code for the experiments in publications/lifted_domain_knowledge.

If you are using another algorithom not part of these two publications, please also cite the appropriate publication (see the links here).

Getting Started

This repository uses luigi for workflow management. We support different cluster schedulers, so far

slurm
lsf
local (local execution based on ProcessPool)

The scheduler can be selected by the keyword target. Inter-process communication is achieved through files which are stored in a temporary folder and most workflows use n5 storage. You can use z5 to convert files to it with python.

Simplified, running a workflow from this repository looks like this:

import json
import luigi
from cluster_tools import SimpleWorkflow  # this is just a mock class, not actually part of this repository

# folder for temporary scripts and files
tmp_folder = 'tmp_wf'

# directory for configurations for workflow sub-tasks stored as json
config_dir = 'configs'

# get the default configurations for all sub-tasks
default_configs = SimpleWorkflow.get_config()

# global configuration for shebang to proper python interpreter with all dependencies,
# group name and block-shape
global_config = default_configs['global']
shebang = '#! /path/to/bin/python'
global_config.update({'shebang': shebang, 'groupname': 'mygroup'})
with open('configs/global.config', 'w') as f:
  json.dump(global_config, f)
  
# run the example workflow with `max_jobs` number of jobs
max_jobs = 100
task = SimpleWorkflow(tmp_folder=tmp_folder, config_dir=config_dir,
                      target='slurm', max_jobs=max_jobs,
                      input_path='/path/to/input.n5', input_key='data',
                      output_path='/path/to/output.n5', output_key='data')
luigi.build([task])

For a list of the available segmentation worklfows, have a look at this. Unfortunately, there is no proper documentation yet. For more details, have a look at the examples, in particular this example. You can donwload the example data (also used for the tests) here.

constantinpape/cluster_tools

Cluster Tools

Workflows

Installation

Citation

Getting Started