Cell-DETR: Attention-Based Transformers for Instance Segmentation of Cells in Microstructures
Tim Prangemeier, Christoph Reich & Heinz Koeppl
This repository includes the official and maintained implementation of the paper Attention-Based Transformers for Instance Segmentation of Cells in Microstructures (BIBM 2020).
Abstract
Detecting and segmenting object instances is a common task in biomedical applications. Examples range from detecting lesions on functional magnetic resonance images, to the detection of tumours in histopathological images and extracting quantitative single-cell information from microscopy imagery, where cell segmentation is a major bottleneck. Attention-based transformers are state-of-the-art in a range of deep learning fields. They have recently been proposed for segmentation tasks where they are beginning to outperform other methods. We present a novel attention-based cell detection transformer (Cell-DETR) for direct end-to-end instance segmentation. While the segmentation performance is on par with a state-of-the-art instance segmentation method, Cell-DETR is simpler and faster. We showcase the method's contribution in the typical use case of segmenting yeast in microstructured environments, commonly employed in systems or synthetic biology. For the specific use case, the proposed method surpasses the state-of-the-art tools for semantic segmentation and additionally predicts the individual object instances. The fast and accurate instance segmentation performance increases the experimental information yield for a posteriori data processing and makes online monitoring of experiments and closed-loop optimal experimental design feasible.
Architecture
Architecture of the end-to-end instance segmentation network, with brightfield specimen image input and an instance segmentation prediction as output.
The backbone CNN encoder extracts image features that are fed both into the transformer encoder-decoder for class and bounding box prediction and into the CNN decoder for segmentation. The transformer-encoded and transformer-decoded features are fed into a multi-head attention module, whose output is fed, together with the image features from the CNN backbone, into the CNN decoder for segmentation. Skip connections additionally bridge the backbone CNN encoder and the CNN decoder. Input and output resolution is 128 × 128 pixels.
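To make the multi-head attention step more concrete, the following minimal sketch computes attention maps in pure Python, in the spirit of the segmentation head of DETR [1]: each transformer-decoded object query attends over every spatial position of the transformer-encoded feature map, and a softmax over those positions yields one H × W map per query. It simplifies under stated assumptions (a single head, no learned query/key projections, toy feature values) and is not the repository's module; the function name `attention_maps` is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def attention_maps(queries: List[Vector], keys: List[Vector],
                   height: int, width: int) -> List[List[List[float]]]:
    """Return one height x width attention map per object query.

    queries: transformer-decoded features, one vector per object query.
    keys: transformer-encoded features, one vector per spatial position,
          in row-major order (len(keys) == height * width).
    Assumption: single head, no learned projections.
    """
    assert len(keys) == height * width, "one key per spatial position"
    scale = 1.0 / math.sqrt(len(queries[0]))
    maps = []
    for q in queries:
        # Scaled dot-product score between this query and every position.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        # Numerically stable softmax over all spatial positions.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Reshape the flat weights back into a height x width map.
        maps.append([weights[r * width:(r + 1) * width] for r in range(height)])
    return maps


if __name__ == "__main__":
    # Toy example: a 2 x 2 encoded feature map with 2-dim features
    # and two object queries.
    encoded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    decoded = [[4.0, 0.0], [0.0, 4.0]]
    for m in attention_maps(decoded, encoded, height=2, width=2):
        print([[round(w, 3) for w in row] for row in m])
```

Each map sums to one over the image, so it highlights where a given object query "looks"; in the architecture described above, such maps are combined with the backbone features before the CNN decoder produces the masks.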
Dependencies
The Cell-DETR implementation uses multiple existing implementations. First, deformable convolutions v2 [2] are used based on the implementation of Dazhi Cheng. Second, Padé activation units [4] are utilized based on the official implementation by the authors. Third, the pixel adaptive convolutions [3] implementation by Nvidia is used. The Padé activation unit and pixel adaptive convolution implementations are slightly adapted and included in this repository. All dependencies can be installed by executing the following commands:
git clone https://github.com/LSnyd/Cell-DETR.git
cd Cell-DETR
pip install -r requirements.txt
git submodule add https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch
cd Deformable-Convolution-V2-PyTorch
git checkout pytorch_1.0.0
python setup.py build install
cd ../pade_activation_unit/cuda
Before building and installing the pade_activation_unit, it may be necessary to comment out the extra_compile_args in the setup.py file (lines 312-314) in the cuda folder:
# extra_compile_args={'cxx': [],
# 'nvcc': ['-gencode=arch=compute_60,code="sm_60,compute_60"', '-lineinfo',
# "-ccbin=gcc-6.3.0"]}
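If you prefer to script this step, the following minimal sketch comments out a given line range of a file. The function `comment_out_lines` and its optional `expect` check are hypothetical helpers, not part of the repository. Because the line numbers 312-314 may drift between versions of setup.py, check them (or pass `expect`) before running it.

```python
from pathlib import Path
from typing import Optional


def comment_out_lines(path: str, first: int, last: int,
                      expect: Optional[str] = None) -> None:
    """Prefix lines first..last (1-indexed, inclusive) of a file with '# '.

    If `expect` is given, the first line must contain it; this guards
    against line numbers that no longer point at the intended code.
    """
    file = Path(path)
    lines = file.read_text().splitlines(keepends=True)
    if not 1 <= first <= last <= len(lines):
        raise ValueError(f"lines {first}-{last} outside file of {len(lines)} lines")
    if expect is not None and expect not in lines[first - 1]:
        raise ValueError(f"line {first} does not contain {expect!r}")
    for i in range(first - 1, last):
        # Leave already commented lines untouched so reruns are harmless.
        if not lines[i].lstrip().startswith("#"):
            lines[i] = "# " + lines[i]
    file.write_text("".join(lines))
```

From inside pade_activation_unit/cuda, `comment_out_lines("setup.py", 312, 314, expect="extra_compile_args")` would reproduce the manual edit shown above.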
Afterwards run:
python setup.py build install
The transformer [5] implementation is based on the official implementation of DETR [1].
Usage
Cell-DETR can be trained, validated and tested using the main.py script. The following command-line arguments define which actions are performed.
python main.py {+ args}
Argument | Default value | Info |
---|---|---|
--train | False | Binary flag. If set, training will be performed. |
--val | False | Binary flag. If set, validation will be performed. |
--test | False | Binary flag. If set, testing will be performed. |
--cuda_devices | "0" | String of CUDA device indices to be used, separated by commas. |
--data_parallel | False | Binary flag. If set, multi-GPU training is utilized. |
--cpu | False | Binary flag. If set, all operations are performed on the CPU. |
--epochs | 200 | Number of epochs to perform while training. |
--lr_schedule | False | Binary flag. If set, the learning rate will be reduced after epochs 50 and 100. |
--ohem | False | Binary flag. If set, online hard example mining (OHEM) is utilized. |
--ohem_fraction | 0.75 | OHEM fraction to be applied when performing OHEM. |
--batch_size | 4 | Batch size to be utilized while training. |
--path_to_data | "trapped_yeast_cell_dataset" | Path to the dataset. |
--augmentation_p | 0.6 | Probability that data augmentation is applied to a training data sample. |
--lr_main | 1e-04 | Learning rate of the DETR model (excluding the backbone). |
--lr_backbone | 1e-05 | Learning rate of the backbone network. |
--no_pac | False | Binary flag. If set, no pixel adaptive convolutions will be utilized in the segmentation head. |
--load_model | "" | Path to the model to be loaded. |
--dropout | 0.0 | Dropout factor to be used in the model. |
--three_classes | False | Binary flag. If set, three classes (trap, cell of interest and additional cells) will be utilized. |
--softmax | False | Binary flag. If set, a softmax instead of a sigmoid will be applied to the segmentation prediction. |
--only_train_segmentation_head_after_epoch | 200 | Epoch after which only the segmentation head is trained. |
--lr_segmentation_head | 1e-05 | Learning rate of the segmentation head, only applied when the segmentation head is trained exclusively. |
--no_deform_conv | False | Binary flag. If set, no deformable convolutions will be utilized. |
--no_pau | False | Binary flag. If set, leaky ReLUs are utilized instead of Padé activation units. |
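Online hard example mining (OHEM) generally means back-propagating only the hardest part of the loss. The table does not spell out how --ohem_fraction is applied, so the following minimal sketch assumes it is the share of the largest per-element losses that is kept and averaged; the repository may apply it differently (for example per pixel or per sample). The function name `ohem_loss` is hypothetical.

```python
import math
from typing import Sequence


def ohem_loss(losses: Sequence[float], fraction: float = 0.75) -> float:
    """Mean of the hardest `fraction` share of per-element losses.

    Assumption: `fraction` is the share of elements that is kept.
    """
    if not losses:
        raise ValueError("losses must not be empty")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    # Keep at least one element, even for very small fractions.
    k = max(1, math.ceil(fraction * len(losses)))
    hardest = sorted(losses, reverse=True)[:k]
    return sum(hardest) / k


if __name__ == "__main__":
    # With the default fraction 0.75, the smallest of four losses is dropped.
    print(ohem_loss([0.1, 0.9, 0.4, 0.2]))
```

In a PyTorch training loop the same selection would typically be applied to an unreduced per-pixel loss tensor, for example with `torch.topk`, so that easy background pixels contribute less to the gradient.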
To train, validate and test the Cell-DETR B architecture, run
python main.py --train --val --test --path_to_data "trapped_yeast_cell_dataset" --lr_schedule --batch_size 10 --data_parallel --cuda_devices "0, 1" --softmax
To train, validate and test the Cell-DETR A architecture, run
python main.py --train --val --test --path_to_data "trapped_yeast_cell_dataset" --lr_schedule --batch_size 10 --data_parallel --cuda_devices "0, 1" --softmax --no_pac --no_deform_conv --no_pau
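The two commands differ only in the flags --no_pac, --no_deform_conv and --no_pau, so Cell-DETR A is the variant without pixel adaptive convolutions, deformable convolutions and Padé activation units. To illustrate how such command lines are interpreted, the following sketch declares a subset of the table's flags with Python's standard argparse module. It is a hypothetical reconstruction from the table, not the repository's main.py; `build_parser` and `parse_cuda_devices` are illustrative names.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering a subset of the documented flags."""
    parser = argparse.ArgumentParser(description="Cell-DETR (illustrative subset)")
    # Binary flags default to False and become True when present.
    for flag in ("--train", "--val", "--test", "--data_parallel", "--lr_schedule",
                 "--softmax", "--no_pac", "--no_deform_conv", "--no_pau"):
        parser.add_argument(flag, action="store_true")
    parser.add_argument("--cuda_devices", type=str, default="0")
    parser.add_argument("--batch_size", type=int, default=4)
    parser.add_argument("--path_to_data", type=str, default="trapped_yeast_cell_dataset")
    parser.add_argument("--load_model", type=str, default="")
    return parser


def parse_cuda_devices(devices: str) -> List[int]:
    """Turn a string such as "0, 1" into [0, 1], tolerating spaces."""
    return [int(index) for index in devices.split(",") if index.strip()]


if __name__ == "__main__":
    # Arguments of the Cell-DETR B command above (the shell removes the quotes).
    b_args = ["--train", "--val", "--test", "--path_to_data", "trapped_yeast_cell_dataset",
              "--lr_schedule", "--batch_size", "10", "--data_parallel",
              "--cuda_devices", "0, 1", "--softmax"]
    a_args = b_args + ["--no_pac", "--no_deform_conv", "--no_pau"]
    for name, argv in (("B", b_args), ("A", a_args)):
        args = build_parser().parse_args(argv)
        print(name, parse_cuda_devices(args.cuda_devices),
              args.no_pac, args.no_deform_conv, args.no_pau)
```

Using `store_true` matches the table's "binary flag" semantics: omitting a flag leaves it at its default of False.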
Trained Models
Our trained models (Cell-DETR A & B) are included in the folder trained_models.
To load and test the trained Cell-DETR A model, run
python main.py --test --path_to_data "trapped_yeast_cell_dataset" --cuda_devices "0" --softmax --no_pac --no_deform_conv --no_pau --load_model "trained_models/Cell_DETR_A"
To load and test the trained Cell-DETR B model, run
python main.py --test --path_to_data "trapped_yeast_cell_dataset" --cuda_devices "0" --softmax --load_model "trained_models/Cell_DETR_B"
Data
A few toy/test examples of the trapped yeast cell instance segmentation dataset are included in the folder trapped_yeast_cell_dataset.
The full dataset can be requested from the authors.
Results
Qualitative results
Example segmentations of our Cell-DETR B model.
Segmentation results
Model | Dice | Accuracy | mIoU (mean over instances) | Cell IoU |
---|---|---|---|---|
Cell-DETR A | 0.92 | 0.96 | 0.84 | 0.83 |
Cell-DETR B | 0.92 | 0.96 | 0.85 | 0.84 |
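For reference, Dice and IoU are standard overlap measures between a predicted mask P and a ground-truth mask G: Dice = 2|P ∩ G| / (|P| + |G|) and IoU = |P ∩ G| / |P ∪ G|. The following sketch computes both for binary masks given as nested lists, plus a mean IoU over instances. The exact thresholding and averaging behind the reported numbers are not specified here, so treat it as an illustration; the function names are hypothetical, and `mean_instance_iou` assumes predicted instances are already matched to ground-truth instances.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask: 1 = foreground, 0 = background


def overlap_counts(pred: Mask, target: Mask) -> Tuple[int, int, int]:
    """Return (|P ∩ G|, |P|, |G|) for two equally sized binary masks."""
    inter = pred_sum = target_sum = 0
    for pred_row, target_row in zip(pred, target):
        for p, g in zip(pred_row, target_row):
            inter += int(bool(p) and bool(g))
            pred_sum += int(bool(p))
            target_sum += int(bool(g))
    return inter, pred_sum, target_sum


def dice(pred: Mask, target: Mask) -> float:
    inter, pred_sum, target_sum = overlap_counts(pred, target)
    # Convention (assumption): two empty masks agree perfectly.
    if pred_sum + target_sum == 0:
        return 1.0
    return 2 * inter / (pred_sum + target_sum)


def iou(pred: Mask, target: Mask) -> float:
    inter, pred_sum, target_sum = overlap_counts(pred, target)
    union = pred_sum + target_sum - inter
    if union == 0:
        return 1.0
    return inter / union


def mean_instance_iou(preds: List[Mask], targets: List[Mask]) -> float:
    """Mean IoU over instances, assuming preds[i] is matched to targets[i]."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("need equally many, non-empty matched instance masks")
    return sum(iou(p, t) for p, t in zip(preds, targets)) / len(preds)


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    target = [[1, 0, 0], [0, 1, 1]]
    print(dice(pred, target), iou(pred, target))  # Dice = 4/6, IoU = 2/4
```

Because Dice = 2 IoU / (1 + IoU) for a single mask pair, Dice is always at least as large as IoU, which is consistent with the Dice column exceeding the IoU columns in the table.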
Bounding box results
Model | MSE | L1 | IoU | gIoU |
---|---|---|---|---|
Cell-DETR A | 0.0006 | 0.016 | 0.81 | 0.80 |
Cell-DETR B | 0.0005 | 0.016 | 0.81 | 0.81 |
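The box IoU and generalized IoU (gIoU) can be sketched as follows. With C the smallest box enclosing boxes A and B, gIoU = IoU − (|C| − |A ∪ B|) / |C|, which, unlike IoU, still distinguishes near and far boxes when they do not overlap; DETR [1] combines an L1 and a gIoU term in its box loss. The (x_min, y_min, x_max, y_max) box format and the function names are assumptions for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed format


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; inverted (empty) boxes have area 0."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou_and_giou(a: Box, b: Box) -> Tuple[float, float]:
    """Return (IoU, gIoU) for two non-degenerate boxes."""
    # Intersection rectangle; box_area yields 0 if the boxes do not overlap.
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    iou = inter / union
    # Smallest axis-aligned box C enclosing both a and b.
    hull = box_area((min(a[0], b[0]), min(a[1], b[1]),
                     max(a[2], b[2]), max(a[3], b[3])))
    # gIoU subtracts the fraction of C not covered by the union.
    giou = iou - (hull - union) / hull
    return iou, giou


if __name__ == "__main__":
    print(iou_and_giou((0, 0, 2, 2), (1, 1, 3, 3)))  # IoU = 1/7, gIoU = 1/7 - 2/9
    print(iou_and_giou((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint: IoU = 0, gIoU = -1/3
```

gIoU lies in (−1, 1] and never exceeds IoU.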
Classification results
Model | Accuracy |
---|---|
Cell-DETR A | 1.0 |
Cell-DETR B | 1.0 |
Citation
If you find this research useful in your work, please acknowledge it appropriately and cite the paper:
@inproceedings{prangemeier2020c,
title={Attention-Based Transformers for Instance Segmentation of Cells in Microstructures},
author={Prangemeier, Tim and Reich, Christoph and Koeppl, Heinz},
booktitle={2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)},
year={2020}
}
References
[1] @article{carion2020end,
title={End-to-End Object Detection with Transformers},
author={Carion, Nicolas and Massa, Francisco and Synnaeve, Gabriel and Usunier, Nicolas and Kirillov, Alexander and Zagoruyko, Sergey},
journal={arXiv preprint arXiv:2005.12872},
year={2020}
}
[2] @inproceedings{zhu2019deformable,
title={Deformable convnets v2: More deformable, better results},
author={Zhu, Xizhou and Hu, Han and Lin, Stephen and Dai, Jifeng},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={9308--9316},
year={2019}
}
[3] @inproceedings{su2019pixel,
title={Pixel-adaptive convolutional neural networks},
author={Su, Hang and Jampani, Varun and Sun, Deqing and Gallo, Orazio and Learned-Miller, Erik and Kautz, Jan},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={11166--11175},
year={2019}
}
[4] @article{molina2019pad,
title={Pad{\'e} Activation Units: End-to-end Learning of Flexible Activation Functions in Deep Networks},
author={Molina, Alejandro and Schramowski, Patrick and Kersting, Kristian},
journal={arXiv preprint arXiv:1907.06732},
year={2019}
}
[5] @inproceedings{vaswani2017attention,
title={Attention is all you need},
author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia},
booktitle={Advances in neural information processing systems},
pages={5998--6008},
year={2017}
}