
A versatile, fully open-source pipeline to extract phenotypic measurements from plant images


ARADEEPOPSIS


Introduction

ARADEEPOPSIS is a software tool that enables plant researchers to non-invasively score plant growth, biomass accumulation and senescence from image data in a highly parallelized, high-throughput, yet easy-to-use manner.

It is built upon the published convolutional neural network (CNN) DeepLabv3+[1], which performs semantic image segmentation. A pretrained checkpoint of this model was trained further on manually annotated top-view images of Arabidopsis thaliana plants of different ages. The code that was used for training can be found here. The different models are available here DOI

The training and validation datasets including ground truth annotations are available at DOI

How it works


The pipeline is implemented using open source software such as Nextflow[2], TensorFlow[3], ImageMagick, scikit-image[4] and shiny[5].

The pipeline uses either a conda environment or a Docker container to resolve dependencies, ensuring a high level of reproducibility and portability. It is largely platform-independent and scales from personal computers to High Performance Computing (HPC) infrastructure, allowing for time-efficient analysis of hundreds of thousands of images within a day.

Note: To ensure reproducibility, container-based profiles are recommended over conda.

Once the pipeline is fed with images of single plants, it converts the images into chunks of arbitrary size by saving the image data into an IO-optimized binary file format.

These file records are then, in parallel, served to a deep learning model, allowing for pixel-by-pixel classification of the image data.
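The chunking step described above can be pictured with a minimal sketch. It only illustrates how a list of image paths could be split into groups whose size corresponds to the --chunksize parameter (default 10); the function name chunk_paths and the file names are hypothetical, and the pipeline's actual implementation, including the conversion to the binary record format, is not shown here.

```python
from typing import Iterator, List, Sequence


def chunk_paths(paths: Sequence[str], chunksize: int = 10) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `chunksize` image paths."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    for start in range(0, len(paths), chunksize):
        yield list(paths[start:start + chunksize])


# Hypothetical file names, for illustration only.
images = [f"plant_{i:03d}.png" for i in range(25)]
chunks = list(chunk_paths(images, chunksize=10))

# Each chunk could then be handed to an independent, parallel task.
print([len(chunk) for chunk in chunks])  # prints [10, 10, 5]
```

A larger chunk size means fewer, longer-running tasks, while a smaller one gives more, shorter tasks that can be spread across more workers.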

Three distinct models are available and should be chosen according to the research interest.


Depending on the model of choice, the pipeline extracts relevant phenotypic information such as:

  • plant area (model A/B/C)
  • area of senescent/necrotic tissue (model B/C)
  • area of anthocyanin-rich tissue (model C)
  • color composition and color indices as described by Del Valle et al. 2018[6] for each class supported by the respective model
  • a variety of morphometric traits for each class supported by the respective model
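To make the area traits listed above concrete, the following sketch counts pixels per class in a small segmentation mask. It assumes a hypothetical label encoding (0 = background, 1 = healthy, 2 = senescent, 3 = anthocyanin-rich) and hypothetical helper names; the encoding and trait names used by the pipeline itself may differ.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical label encoding; the pipeline's real encoding may differ.
CLASS_NAMES = {0: "background", 1: "healthy", 2: "senescent", 3: "anthocyanin"}


def class_areas(mask: List[List[int]]) -> Dict[str, int]:
    """Return the number of pixels assigned to each class."""
    counts = Counter(label for row in mask for label in row)
    return {name: counts.get(label, 0) for label, name in CLASS_NAMES.items()}


def plant_area(areas: Dict[str, int]) -> int:
    """Total plant area: every class except the background."""
    return sum(n for name, n in areas.items() if name != "background")


# A tiny 3 x 4 mask, for illustration only.
mask = [
    [0, 0, 1, 1],
    [0, 1, 1, 2],
    [0, 1, 3, 2],
]
areas = class_areas(mask)
print(areas)              # per-class pixel counts
print(plant_area(areas))  # prints 8
```

With a model that distinguishes fewer classes, the corresponding entries would simply stay at zero, which mirrors how the available traits depend on the chosen model.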


Extensibility

The pipeline is modular and also allows the use of custom models obtained from the Deep Plant Phenomics[7] framework for the semantic segmentation step.

In addition, it is possible to feed the pipeline with segmentation masks obtained by other methods, while still making use of the automated trait extraction and visualization capabilities of the pipeline.

Usage

Hardware requirements

Running the pipeline with the provided models requires at least 6 GB of memory for single-scale inference, and at least 12 GB for --multiscale.

Setting up the pipeline

Note: Windows users will have to set up the Windows Subsystem for Linux (WSL) first.

  1. Install Nextflow

  2. Install either conda, Docker, podman, Charliecloud or Singularity.

Note: To run the pipeline on macOS with Docker, it is necessary to increase the allowed memory usage from the default setting of 2 GB to 6 GB (see the Docker documentation for instructions).

Running the pipeline

To run the pipeline you have to provide single-pot plant images:

nextflow run Gregor-Mendel-Institute/aradeepopsis --images 'path/to/images/*{png|jpg}' -profile {conda|docker|podman|singularity|charliecloud}

Example to run on the CBE cluster using Singularity

module load nextflow/20.01.0

nextflow run Gregor-Mendel-Institute/aradeepopsis --images 'path/to/images/*{png|jpg}' -profile cbe,singularity

Default Parameters

Parameter               Default value   Type
--model                 C               <Character>
--images                None            <Path>
--masks                 false           <Path>
--multiscale            false           <Boolean>
--chunksize             10              <Integer>
--ignore_senescence     true            <Boolean>
--outdir                ./results       <Path>
--save_overlay          true            <Boolean>
--save_mask             true            <Boolean>
--save_hull             true            <Boolean>
--summary_diagnostics   false           <Boolean>
--shiny                 true            <Boolean>
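Any of the defaults above can be overridden on the command line. As a rough sketch, the helper below assembles such a command as an argument list; build_command is a hypothetical name, the image path is a placeholder, and passing booleans as the strings true and false is an assumption about how the parameters are parsed. The sketch only builds and prints the command, it does not run Nextflow.

```python
import shlex
from typing import Dict, List, Optional, Union

ParamValue = Union[str, int, bool]


def build_command(
    images: str,
    profile: str,
    overrides: Optional[Dict[str, ParamValue]] = None,
) -> List[str]:
    """Assemble a `nextflow run` argument list with parameter overrides."""
    cmd = [
        "nextflow", "run", "Gregor-Mendel-Institute/aradeepopsis",
        "--images", images,
        "-profile", profile,
    ]
    for name, value in (overrides or {}).items():
        if isinstance(value, bool):
            # Assumption: booleans are passed as lowercase strings.
            value = "true" if value else "false"
        cmd += [f"--{name}", str(value)]
    return cmd


# Placeholder image path and example overrides, for illustration only.
cmd = build_command(
    "path/to/images/*png",
    "docker",
    {"model": "B", "multiscale": True, "chunksize": 20},
)
print(shlex.join(cmd))  # shell-quoted form of the command
```

Building the command as a list rather than a single string keeps the glob pattern quoted when it is later rendered for a shell, so the pattern reaches the pipeline unexpanded.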

Pipeline Output

The pipeline computes a total of 78 morphometric and color-related traits from the analysed images. A description of all traits is available here.

References

[1] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.
Chen, L.-C. et al., 2018. arXiv [cs.CV]. Available at: http://arxiv.org/abs/1802.02611.

[2] Nextflow enables reproducible computational workflows.
Di Tommaso, P. et al., 2017. Nature biotechnology, 35(4), pp.316–319.

[3] TensorFlow: Large-scale machine learning on heterogeneous systems.
Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Rafal Jozefowicz, Yangqing Jia, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Mike Schuster, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng, 2015.

[4] scikit-image: Image processing in Python.
Stéfan van der Walt, Johannes L. Schönberger, Juan Nunez-Iglesias, François Boulogne, Joshua D. Warner, Neil Yager, Emmanuelle Gouillart, Tony Yu and the scikit-image contributors. PeerJ 2:e453 (2014)

[5] shiny: Easy web applications in R
RStudio Inc. (2014)

[6] Digital photography provides a fast, reliable, and noninvasive method to estimate anthocyanin pigment concentration in reproductive and vegetative plant tissues
Del Valle JC, Gallardo-López A, Buide ML, Whittall JB, Narbona E, 2018. Ecol Evol. 8(6):3064–76.

[7] Deep Plant Phenomics: A Deep Learning Platform for Complex Plant Phenotyping Tasks
Ubbens JR, Stavness I. Front Plant Sci. 2017 Jul 7;8:1190.