- Installation
- General workflow guide
- GUI interface guide
- Command line interface guide
- Pipeline setup guide
- Neural net training
Greenotyper is an image analysis tool for large-scale plant phenotyping experiments.
It uses Google's TensorFlow Object Detection API (available on GitHub) to find the plants, and thresholding to measure the size of the plants.
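To make the thresholding step concrete, here is a minimal sketch that measures plant size as a count of green pixels. It assumes an "excess green" index (2G - R - B) and a hypothetical cutoff of 20. Greenotyper's own thresholds are set in the pipeline settings and may use a different colour space, so treat this only as an illustration of the idea.

```python
import numpy as np

def plant_area(rgb: np.ndarray, cutoff: float = 20.0) -> int:
    """Count pixels classed as plant in an RGB image of shape (H, W, 3).

    Uses the excess-green index 2G - R - B; `cutoff` is a hypothetical
    threshold, not a Greenotyper default.
    """
    rgb = rgb.astype(np.float64)  # avoid uint8 overflow in the arithmetic
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    exg = 2 * g - r - b
    mask = exg > cutoff           # boolean plant mask
    return int(mask.sum())        # plant size in pixels

# Usage: a 4x4 brown "soil" image with a 2x2 green patch
img = np.full((4, 4, 3), (120, 90, 60), dtype=np.uint8)
img[1:3, 1:3] = (40, 180, 40)
print(plant_area(img))  # 4
```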
- python version 3.6 or 3.7
- tensorflow v2.0.0 or higher
- PyQt5 v5.9.2 or higher
- numpy v1.15.2 or higher
- pillow v5.2.0 or higher
- scikit-image v0.14.0 or higher
- Keras v2 or higher
It is recommended to install the tool in a virtualenv or a conda environment. For example:
conda create -n greenotyper_env python=3.7
conda activate greenotyper_env
pip install greenotyper
Install the latest version of greenotyper through pip:
pip install greenotyper
If there are problems with pip, you can try pip3 instead:
pip3 install greenotyper
Install greenotyper through conda:
not available yet
Starting a new workflow requires setting up and testing the pipeline, which begins in the pipeline planner. Either open the Greenotyper app, or open the GUI through the command line interface:
greenotyper GUI
To open the pipeline planner, click the Pipeline planner button.
The plant area detection, the network and the pipeline settings are all tested through the pipeline planner. For information on how to use the interface, see the next section; for general information on pipeline setups, see the Pipeline setup guide.
The pipeline can be run either through the command line or through the GUI. The command line is more efficient and easier to deploy on computing clusters.
The pipeline can be run on individual images or on directories of images. The results are written to a single "database" file, which uses file locking. If your file system does not allow file locking, there is no guarantee that the results will be written correctly when running with multiple processes.
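The sketch below illustrates the kind of locking a shared results file relies on: each process takes an exclusive lock before appending a row, so concurrent writes do not interleave. It is not Greenotyper's code; the row contents are hypothetical, and fcntl.flock is POSIX-only and only protects the file where the file system honours the lock.

```python
import csv
import fcntl  # POSIX only; not available on Windows

def append_row(path, row):
    """Append one result row to a shared CSV file under an exclusive lock."""
    with open(path, "a", newline="") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # block until no other process holds the lock
        try:
            csv.writer(handle).writerow(row)
            handle.flush()                  # push the row to the file before unlocking
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)

# Usage (hypothetical row): append_row("database.csv", ["tray_01.jpg", 1532])
```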
To organize the results into a table, use the organize-output command:
greenotyper organize-output input_file.csv output_file.csv
Open the app, or run the GUI from the terminal with greenotyper GUI (project page: https://github.com/MarniTausen/Greenotyper).
First open the pipeline planner from the initial window.
Open your image.
Open a trained network.
Once both an image and a network have been opened, you can run the Find plants feature. Clicking Find plants draws bounding boxes around the detected plants.
To test the detection of the plant area, use the Apply mask function.
The command line interface is divided into subcommands, each with its own options. The standard help message listing the subcommands is shown here:
=========== GREENOTYPER (v0.7.0) ===========
usage: greenotyper <command> [<args>]
The available commands are as follows:
run Runs the greenotyper pipeline on set of images
organize-output Cleans and organizes the output
GUI Opens the Greenotyper GUI interface
train-unet Commandline options for creating and training the U-net
test-unet Test a trained u-net and output segmentation accuracies
run-unet Pipeline settings for running the unet version of the pipeline
Please see the options within each of the commands.
The greenotyper run command runs the Greenotyper tool using object detection and thresholding.
=========== GREENOTYPER (v0.7.0) ===========
usage: greenotyper run <pipeline file> <input image> [<args>]
Run the Greenotyper pipeline based on the pipeline settings provided.
positional arguments:
pipeline Required pipeline file.
input Image filename or directory with images
optional arguments:
-h, --help show this help message and exit
-t THREADS, --threads THREADS
Number of threads available. Only used to run on
multiple images at a time. Default: 1. Settings less
than 0 use all available cores.
-s SIZE_OUTPUT, --size_output SIZE_OUTPUT
Output directory for the size measurements. Default is
no output.
-g GREENNESS_OUTPUT, --greenness_output GREENNESS_OUTPUT
Output directory for the greenness measurements.
Default is no output.
-m MASK_OUTPUT, --mask_output MASK_OUTPUT
Output directory for the produced masks. Default is no
output.
-c CROP_OUTPUT, --crop_output CROP_OUTPUT
Output directory for the cropped images. Default is no
output.
--by_day Subdividing the outputs based on per day. Recommended
to use this option or --by_individual to avoid file
system overflow.
--by_sample Subdividing the outputs based on per individual.
Recommended to use this option or --by_day avoid file
system overflow.
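On a cluster it is often convenient to launch runs from a script. This minimal sketch only assembles a greenotyper run command from the options listed above; the pipeline file and directory names are hypothetical placeholders, and the subprocess call is left commented out so nothing is executed.

```python
def build_run_command(pipeline, images, threads=1, size_output=None,
                      mask_output=None, by_day=False):
    """Assemble the argument list for `greenotyper run`."""
    cmd = ["greenotyper", "run", pipeline, images, "--threads", str(threads)]
    if size_output:
        cmd += ["--size_output", size_output]
    if mask_output:
        cmd += ["--mask_output", mask_output]
    if by_day:
        cmd.append("--by_day")  # subdivide outputs per day to avoid huge directories
    return cmd

# Hypothetical paths; a negative thread count means "use all cores" per the help text.
cmd = build_run_command("my_pipeline_file", "images/", threads=-1,
                        size_output="sizes/", by_day=True)
print(" ".join(cmd))
# To actually launch the run:
#   import subprocess
#   subprocess.run(cmd, check=True)
```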
Organize-output
=========== GREENOTYPER (v0.7.0) ===========
usage: greenotyper organize-output <input> <output> [<args>]
Cleans and organizes the output
positional arguments:
input Input database.*.csv file
output Output .csv file in an organized format
optional arguments:
-h, --help show this help message and exit
To open the user interface, simply run:
greenotyper GUI
To run the pipeline using U-net, use the run-unet command. run-unet has three subcommands that divide the pipeline into separate steps. This makes the pipeline easy to parallelize: pre-processing and post-processing can be run separately with as many processes as possible, while the U-net step can be run on a GPU.
=========== GREENOTYPER (v0.7.0) ===========
usage: greenotyper run-unet <command> [<args>]
Running U-net is divided into 3 steps:
preprocess Runs the object detection and
saves the crops ready to be run
through the U-net
process Runs U-net on the images.
This can be run on a GUI for
large speed ups.
postprocess Output the results based on
the predicted masks from the U-net
Commands for running the U-net
positional arguments:
command Which run U-net command should be called
optional arguments:
-h, --help show this help message and exit
Preprocessing for U-net
=========== GREENOTYPER (v0.7.0) ===========
usage: greenotyper run-unet preprocess <pipeline file> <input images> <output directory> [<args>]
Runs the object detection and prepares crops to be run through U-net
positional arguments:
pipeline Required pipeline file.
input Directory with images
outputdir Output directory where the preprocessed data is saved.
optional arguments:
-h, --help show this help message and exit
-t THREADS, --threads THREADS
Number of threads available to be used. Default: 1.
Settings less than 0 use all available cores.
-b BATCH_SIZE, --batch-size BATCH_SIZE
Batch size of images run simultaneously. Default is
set to 10. Memory usage can be lower if the batch size
is smaller.
--add_subdir ADD_SUBDIR
Provide a directory for a subdirectory which is added
to the output directory
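To illustrate what --batch-size controls: images are handled in consecutive groups, so a smaller batch keeps fewer images in memory at once. This generic chunking sketch is not Greenotyper's implementation, and the filenames are hypothetical.

```python
def batches(items, batch_size=10):
    """Yield consecutive slices of at most `batch_size` items (10 mirrors the default)."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

images = ["img_%03d.jpg" % i for i in range(25)]  # hypothetical filenames
print([len(batch) for batch in batches(images)])   # [10, 10, 5]
```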
=========== GREENOTYPER (v0.7.0) ===========
usage: greenotyper run-unet process <input dir> <unet>
Process the cropped data and produced predicted masks using U-net.
positional arguments:
inputdir Input directory where batch results from the preprocessing are
located.
unet The trained Unet hdf5 file
optional arguments:
-h, --help show this help message and exit
=========== GREENOTYPER (v0.7.0) ===========
usage: greenotyper run-unet postprocess <pipeline file> <inputdir> [<output args>]
Postprocessing of the U-net masks. Outputs the desired information.
positional arguments:
pipeline Pipeline settings file
inputdir Input directory containing processes data
optional arguments:
-h, --help show this help message and exit
-t THREADS, --threads THREADS
Number of threads available. Only used to run on
multiple images at a time. Default: 1. Settings less
than 0 use all available cores.
-s SIZE_OUTPUT, --size_output SIZE_OUTPUT
Output directory for the size measurements. Default is
no output.
-g GREENNESS_OUTPUT, --greenness_output GREENNESS_OUTPUT
Output directory for the greenness measurements.
Default is no output.
-m MASK_OUTPUT, --mask_output MASK_OUTPUT
Output directory for the produced masks. Default is no
output.
-c CROP_OUTPUT, --crop_output CROP_OUTPUT
Output directory for the cropped images. Default is no
output.
--by_day Subdividing the outputs based on per day. Recommended
to use this option or --by_individual to avoid file
system overflow.
--by_sample Subdividing the outputs based on per individual.
Recommended to use this option or --by_day avoid file
system overflow.
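The three run-unet steps can be chained from a script. This sketch assumes that process and postprocess read from the same directory that preprocess wrote to (no --add_subdir), and all paths are hypothetical. In practice the process step would often be submitted separately to a GPU node; here the commands are only printed unless dry_run is set to False.

```python
import subprocess

def run_unet_steps(pipeline, images, workdir, unet, size_output,
                   threads=-1, dry_run=True):
    """Build (and optionally run) the preprocess -> process -> postprocess chain."""
    steps = [
        # CPU-heavy: object detection and cropping, parallelised with --threads
        ["greenotyper", "run-unet", "preprocess", pipeline, images, workdir,
         "--threads", str(threads)],
        # GPU-friendly: U-net prediction on the preprocessed crops
        ["greenotyper", "run-unet", "process", workdir, unet],
        # CPU-heavy: measurements from the predicted masks
        ["greenotyper", "run-unet", "postprocess", pipeline, workdir,
         "--threads", str(threads), "--size_output", size_output],
    ]
    for cmd in steps:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return steps

run_unet_steps("my_pipeline_file", "images/", "unet_work/", "unet.hdf5", "sizes/")
```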
Commandline options for training a U-net
=========== GREENOTYPER (v0.7.0) ===========
usage: greenotyper train-unet <training directory> <unet output> [<args>]
Commandline options for creating and training the U-net
positional arguments:
training_directory Directory with training data
unet_output Filename of the trained unet
optional arguments:
-h, --help show this help message and exit
--validation_directory VALIDATION_DIRECTORY
Directory with validation data
--validation_split VALIDATION_SPLIT
Fraction of the training data used for validation, if
no validation data is provided. Default is 0.2.
--epochs EPOCHS The number of training epochs to be used. Default 20
epochs
--augment_data By default all augmentations will be performed on the
training and validation data
--no_flips Do not perform flips while augmenting the data.
--no_rotations Do not perform rotations while augmenting the data.
--no_crops Do not perform corner crops while augmenting the data.
--crop_size CROP_SIZE
The dimension of the crops, the default is 460x460,
input as 460, which is then rescaled to 512x512
Commandline options for testing the segmentation accuracy of a trained U-net
=========== GREENOTYPER (v0.7.0) ===========
usage: greenotyper test-unet <testing directory> <trained unet>
Test a trained U-net and get segmentation accuracy of the model
positional arguments:
testing_directory Directory with images and labelled ground truth images
trained_unet Filename of trained u-net model (.hdf5 format)
optional arguments:
-h, --help show this help message and exit
--output_masks OUTPUT_MASKS
Output predicted masks to the provided directory
--ap_iou_threshold AP_IOU_THRESHOLD
Set the IoU threshold used for the PASCAL VOC AP.
Default is 0.5
Object detection is done with the TensorFlow Object Detection API, which is available on GitHub.
This guide has been tested up to commit 8518d05. Future versions might change, and the following guide might then no longer apply. To use the version known to work, open that commit on GitHub, click Browse files, and download the whole models repository at that commit.
The Object Detection API only works with TensorFlow 1.x versions, so training should be done in an environment with the latest TensorFlow 1.x version installed. It does not work with TensorFlow 2+.
The full installation guide is part of the Object Detection API documentation. If a GPU is available, install tensorflow-gpu instead of tensorflow. Using a GPU requires the CUDA Toolkit, and the CUDA version needed depends on the installed TensorFlow version. Of the supported TensorFlow versions, 1.12.0 depends on CUDA 9, while 1.13.0 and 1.14.0 depend on CUDA 10.
Here is an installation that worked on a Mac OS X system:
conda create -n ObjectDetection python=3.6
conda activate ObjectDetection
pip install tensorflow==1.14
TensorFlow versions 1.13 and 1.12 should also work. Install tensorflow-gpu instead if you intend to train on a GPU:
pip install tensorflow-gpu==1.14
Installing tensorflow with pip pulls in nearly all of the dependencies listed in the guide. The remaining dependencies were installed like this:
conda install protobuf
pip install --user pycocotools
Next, retrieve the Object Detection API by downloading the whole models repository, since the API depends on other research packages in the repository. Either clone the latest version, or download the tested commit (8518d05):
git clone https://github.com/tensorflow/models.git
Next, "compile" the API's protobuf definitions with the following commands:
cd models/research/
protoc object_detection/protos/*.proto --python_out=.
Then make the API importable by adding these directories to the Python path:
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
Now you can test whether the API works by running the following command:
python object_detection/builders/model_builder_tf1_test.py
All tests should report OK at the end. With TensorFlow 1.14 you will see many warnings, because that version prepares users for the upgrade to version 2; these can be ignored.
The training and testing data were created with the labelImg tool. The bounding boxes are drawn manually in labelImg, which outputs .xml files describing the drawn bounding boxes and their class names.
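labelImg saves annotations in the PASCAL VOC XML layout. As an illustration of what a converter has to read from each file, this sketch extracts the filename and boxes with the standard library. It is not the create_tf_input.py script, and the class name plant and the file tray_01.jpg are hypothetical.

```python
import xml.etree.ElementTree as ET

def read_labelimg_xml(xml_text):
    """Return (image filename, [(class, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        # Coordinates are pixel positions; float() tolerates values like "10.0"
        coords = [int(float(bndbox.findtext(tag)))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return root.findtext("filename"), boxes

sample = """<annotation>
  <filename>tray_01.jpg</filename>
  <object>
    <name>plant</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>140</ymax></bndbox>
  </object>
</annotation>"""
print(read_labelimg_xml(sample))  # ('tray_01.jpg', [('plant', 10, 20, 110, 140)])
```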
The data then has to be converted into a format that the Object Detection API can read.
For this we created a simple script, create_tf_input.py, which converts the images and .xml files into the .record files used by the Object Detection API. The script can be found here. Usage of the script is as follows:
python create_tf_input.py inputdirectory -r output.record -l label_map.pbtxt
To produce the training data, store the images and their .xml files in one directory and run:
python create_tf_input.py traindirectory -r train.record -l label_map.pbtxt
The same for the testing data:
python create_tf_input.py testdirectory -r test.record -l label_map.pbtxt
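If you need to write label_map.pbtxt yourself, the Object Detection API expects one item per class, with integer ids starting at 1 (0 is reserved for the background). This minimal sketch generates the file contents for hypothetical class names; write the returned string to label_map.pbtxt.

```python
def make_label_map(class_names):
    """Return label_map.pbtxt text with ids 1..N in the given order."""
    items = []
    for class_id, name in enumerate(class_names, start=1):  # id 0 = background
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (class_id, name))
    return "\n".join(items)

print(make_label_map(["plant"]))
```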
Finally, the pipeline.config file must be updated. Set the number of classes to match what is being trained, and set the number of training steps. The full paths to the training and testing (evaluation) data must also be updated.
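The config edits described above can also be scripted. This sketch rewrites num_classes, num_steps, the two input_path entries and label_map_path with regular expressions. It assumes the layout of the sample configs shipped with the API, where train_input_reader comes before eval_input_reader; the paths and values are hypothetical, so inspect the result before training.

```python
import re

def update_pipeline_config(text, num_classes, num_steps,
                           train_record, eval_record, label_map):
    """Return a copy of a pipeline.config string with the key fields replaced."""
    # Lambdas keep backslashes in paths from being read as regex escapes.
    text = re.sub(r"num_classes:\s*\d+",
                  lambda m: "num_classes: %d" % num_classes, text)
    text = re.sub(r"num_steps:\s*\d+",
                  lambda m: "num_steps: %d" % num_steps, text)
    # Assumed order: first input_path is training data, second is evaluation data.
    records = iter([train_record, eval_record])
    text = re.sub(r'input_path:\s*"[^"]*"',
                  lambda m: 'input_path: "%s"' % next(records), text, count=2)
    # Both readers point to the same label map.
    text = re.sub(r'label_map_path:\s*"[^"]*"',
                  lambda m: 'label_map_path: "%s"' % label_map, text)
    return text

sample = '''model { ssd { num_classes: 90 } }
train_config { num_steps: 200000 }
train_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
  tf_record_input_reader { input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record" }
}
eval_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
  tf_record_input_reader { input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record" }
}'''
print(update_pipeline_config(sample, 1, 20000, "/data/train.record",
                             "/data/test.record", "/data/label_map.pbtxt"))
```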
Training can now be run by following the Object Detection API's training guide. Training and evaluation (testing) are run with the same command.
Evaluation results can be viewed in TensorBoard, which is installed together with TensorFlow.
To export the network, use the export_inference_graph.py script from the object_detection directory:
python export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path path/to/filename.config \
--trained_checkpoint_prefix path/to/model.ckpt \
--output_directory path/to/exported_model_directory
This script outputs frozen_inference_graph.pb. Placing this file together with label_map.pbtxt in a network directory creates the network input used by Greenotyper.
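Assembling the network directory can be scripted as below. The sketch assumes the export directory contains frozen_inference_graph.pb as written by export_inference_graph.py; the directory names are hypothetical, so open the result in the pipeline planner to confirm Greenotyper accepts it.

```python
import os
import shutil

def make_network_dir(exported_dir, label_map, network_dir):
    """Copy the frozen graph and the label map into one directory."""
    os.makedirs(network_dir, exist_ok=True)
    shutil.copy(os.path.join(exported_dir, "frozen_inference_graph.pb"), network_dir)
    shutil.copy(label_map, network_dir)
    return sorted(os.listdir(network_dir))

# Hypothetical paths:
# make_network_dir("path/to/exported_model_directory", "label_map.pbtxt", "plant_network")
```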
To train a U-net, a ground truth dataset must be created. The training data should consist of cropped images paired with ground truth masks, as in this example:
[Figure: Cropped image | Ground truth mask]
The cropped image should be square, preferably with a power-of-two resolution such as 512x512 or 1024x1024; 512x512 was used in our case. The ground truth mask should be a black-and-white JPEG, with a white background and the mask in black.
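Because JPEG compression introduces grey pixels along mask edges, a ground truth mask is usually thresholded back to two values before use. This minimal sketch maps black (plant) to True and white (background) to False; the midpoint threshold of 128 is an assumption, not a documented Greenotyper setting, and the mask filename in the comment is hypothetical.

```python
import numpy as np

def mask_to_binary(gray, threshold=128):
    """Black plant pixels -> True, white background -> False."""
    return np.asarray(gray) < threshold

# Loading a mask with Pillow (a Greenotyper dependency):
#   from PIL import Image
#   gray = np.asarray(Image.open("label/plant_001.jpg").convert("L"))
noisy = np.array([[0, 30, 250], [12, 200, 255]], dtype=np.uint8)  # JPEG-like values
print(mask_to_binary(noisy).astype(int))
```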
The segmentation quality the U-net can achieve depends on the quality of the ground truth masks.
The ground truth dataset should be divided into three parts: training, validation and testing. A separate validation set is optional, since you can instead set a validation split, which uses a fraction of the training data for validation during training. In the study, 50 ground truth images were produced: 10 were used for testing and 40 for training. 20% of the training data was used for validation, so 8 images were held out, leaving 32 images in the training set.
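The numbers from the study work out as 50 - 10 = 40 training images, of which 0.2 x 40 = 8 are held out for validation, leaving 32. The sketch below reproduces that split with a seeded shuffle. It is an illustration with hypothetical filenames, not the code train-unet uses for --validation_split.

```python
import random

def split_dataset(names, n_test, validation_split, seed=0):
    """Shuffle once, hold out a test set, then a validation fraction of the rest."""
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_val = round(len(rest) * validation_split)
    return rest[n_val:], rest[:n_val], test  # train, validation, test

names = ["plant_%02d.jpg" % i for i in range(50)]  # hypothetical filenames
train, val, test = split_dataset(names, n_test=10, validation_split=0.2)
print(len(train), len(val), len(test))  # 32 8 10
```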
The cropped images and ground truth masks should be stored in the image and label directories, respectively, and the directories containing these are used as the inputs to the commands. Each cropped image and its mask must have exactly the same filename. Please use the following directory structure:
- train
- image
- label
- validation
- image
- label
- test
- image
- label
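Since every image must have a label with exactly the same filename, it is worth checking each split before training. A minimal sketch using only the standard library; the split directory names follow the structure above.

```python
import os

def unmatched_files(split_dir):
    """Return (images without a label, labels without an image) for one split."""
    images = set(os.listdir(os.path.join(split_dir, "image")))
    labels = set(os.listdir(os.path.join(split_dir, "label")))
    return sorted(images - labels), sorted(labels - images)

for split in ("train", "validation", "test"):
    if os.path.isdir(split):
        missing_labels, missing_images = unmatched_files(split)
        print(split, "no label:", missing_labels, "no image:", missing_images)
```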
The options for the train-unet command:
=========== GREENOTYPER (v0.7.0) ===========
usage: greenotyper train-unet <training directory> <unet output> [<args>]
Commandline options for creating and training the U-net
positional arguments:
training_directory Directory with training data
unet_output Filename of the trained unet
optional arguments:
-h, --help show this help message and exit
--validation_directory VALIDATION_DIRECTORY
Directory with validation data
--validation_split VALIDATION_SPLIT
Fraction of the training data used for validation, if
no validation data is provided. Default is 0.2.
--epochs EPOCHS The number of training epochs to be used. Default 20
epochs
--augment_data By default all augmentations will be performed on the
training and validation data
--no_flips Do not perform flips while augmenting the data.
--no_rotations Do not perform rotations while augmenting the data.
--no_crops Do not perform corner crops while augmenting the data.
--crop_size CROP_SIZE
The dimension of the crops, the default is 460x460,
input as 460, which is then rescaled to 512x512
The command expects the training directory, which contains the image and label directories, and the filename for the output U-net. The best network found so far during training is saved to this filename.
Validation data can be provided with the --validation_directory option. If it is not provided, --validation_split 0.2 is used automatically, and 20% of the training data is randomly taken for validation.
The number of training epochs is set with the --epochs option; one epoch corresponds to one full pass through all of the training data. The default is 20 epochs, after which the network starts to overfit the data. With more data available, more epochs can be used before overfitting starts. Overfitting means that the training accuracy keeps increasing while the validation accuracy decreases.
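Saving the best network amounts to tracking a validation score after every epoch and keeping the epoch where it peaked. This sketch shows that selection, and the overfitting pattern described above, on made-up accuracies. In Keras the same behaviour is commonly obtained with the ModelCheckpoint callback and save_best_only=True, although this document does not say how Greenotyper implements it.

```python
def best_epoch(val_accuracy):
    """Return the 1-based epoch with the highest validation accuracy."""
    best_index = max(range(len(val_accuracy)), key=val_accuracy.__getitem__)
    return best_index + 1

def overfitting_epochs(train_accuracy, val_accuracy):
    """Epochs (1-based) where training accuracy rose but validation accuracy fell."""
    return [epoch + 1 for epoch in range(1, len(val_accuracy))
            if train_accuracy[epoch] > train_accuracy[epoch - 1]
            and val_accuracy[epoch] < val_accuracy[epoch - 1]]

# Hypothetical accuracies for six epochs:
train_acc = [0.80, 0.86, 0.90, 0.93, 0.95, 0.97]
val_acc = [0.78, 0.84, 0.87, 0.88, 0.86, 0.85]
print(best_epoch(val_acc), overfitting_epochs(train_acc, val_acc))  # 4 [5, 6]
```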
Finally, augmentation options are available. With --augment_data, the training and validation data are augmented to artificially increase the dataset sizes. By default all available augmentations are applied: flips, rotations and crops. Using all of them increases the datasets 40-fold. To disable an augmentation, use the corresponding --no_ option (--no_flips, --no_rotations or --no_crops). The crop size is set with --crop_size and defaults to 460, which means a 460x460 crop is taken from each of the 4 corners and rescaled back to 512x512. Avoid crop sizes that differ too much from the image size, to prevent rescaling artifacts.
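The sketch below illustrates the three kinds of augmentation on an image and its mask: flips, 90-degree rotations, and 4 corner crops rescaled to the original size. The exact transforms Greenotyper combines to reach a 40-fold increase are not described here, so this produces only 10 variants. Nearest-neighbour rescaling is chosen so the masks stay binary; an image pipeline might prefer smoother interpolation.

```python
import numpy as np

def resize_nearest(a, size):
    """Nearest-neighbour rescale of a square array to size x size."""
    idx = np.arange(size) * a.shape[0] // size
    return a[idx][:, idx]

def corner_crop(a, crop, corner):
    """Take a crop x crop patch from corner 0..3 (top-left, top-right, bottom-left, bottom-right)."""
    h, w = a.shape[:2]
    top = 0 if corner in (0, 1) else h - crop
    left = 0 if corner in (0, 2) else w - crop
    return a[top:top + crop, left:left + crop]

def augment_pair(image, mask, crop=460):
    """Apply identical transforms to an image and its mask so they stay aligned."""
    size = image.shape[0]
    transforms = [lambda a: a, np.fliplr, np.flipud]
    transforms += [lambda a, k=k: np.rot90(a, k) for k in (1, 2, 3)]
    transforms += [lambda a, c=c: resize_nearest(corner_crop(a, crop, c), size)
                   for c in range(4)]
    return [(t(image), t(mask)) for t in transforms]

image = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)  # stand-in crop
mask = np.zeros((512, 512), dtype=bool)
print(len(augment_pair(image, mask)))  # 10
```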
Options for testing the trained U-net
=========== GREENOTYPER (v0.7.0) ===========
usage: greenotyper test-unet <testing directory> <trained unet>
Test a trained U-net and get segmentation accuracy of the model
positional arguments:
testing_directory Directory with images and labelled ground truth images
trained_unet Filename of trained u-net model (.hdf5 format)
optional arguments:
-h, --help show this help message and exit
--output_masks OUTPUT_MASKS
Output predicted masks to the provided directory
--ap_iou_threshold AP_IOU_THRESHOLD
Set the IoU threshold used for the PASCAL VOC AP.
Default is 0.5
The test-unet subcommand takes two main arguments: the directory with the testing data and the trained U-net. It calculates all of the implemented segmentation accuracy measures and prints a report. The implemented measures are the Jaccard index (Intersection over Union), Dice coefficient, recall, precision, pixel accuracy, F1 score and the PASCAL VOC AP (average precision over recall).
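The pixel-level measures can all be computed from the counts of true and false positives and negatives. This is a minimal sketch of the standard definitions, not test-unet's code: how Greenotyper aggregates over images and handles empty masks is not described here, and the PASCAL VOC AP is omitted. For binary masks the Dice coefficient and the F1 score coincide.

```python
import numpy as np

def _ratio(num, den):
    return float(num) / den if den else 0.0  # convention for empty masks varies

def segmentation_scores(pred, truth):
    """Pixel-level scores for one pair of binary masks (True = plant)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)    # plant predicted as plant
    fp = np.sum(pred & ~truth)   # background predicted as plant
    fn = np.sum(~pred & truth)   # plant that was missed
    tn = np.sum(~pred & ~truth)  # background correctly left out
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "jaccard": _ratio(tp, tp + fp + fn),
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
        "precision": precision,
        "recall": recall,
        "pixel_accuracy": _ratio(tp + tn, pred.size),
        "f1": _ratio(2 * precision * recall, precision + recall),
    }

pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
print(segmentation_scores(pred, truth))
```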
The --ap_iou_threshold option changes the IoU (Intersection over Union) threshold used by the PASCAL VOC AP measure. The default is 0.5, which means that a mask with an IoU below 0.5 is removed.
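The thresholding step can be illustrated as follows: compute each predicted mask's IoU with its ground truth and keep only those at or above the threshold. This sketch leaves out the confidence ranking and precision/recall integration that the full PASCAL VOC AP performs, and the toy masks are made up.

```python
import numpy as np

def mask_iou(pred, truth):
    """Intersection over union of two binary masks."""
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return float(inter) / union if union else 0.0

def kept_masks(pairs, threshold=0.5):
    """Indices of (pred, truth) pairs whose IoU reaches the threshold."""
    return [i for i, (pred, truth) in enumerate(pairs)
            if mask_iou(pred, truth) >= threshold]

pairs = [
    (np.array([1, 1, 0, 0]), np.array([1, 1, 0, 0])),  # IoU 1.0
    (np.array([1, 1, 0, 0]), np.array([1, 0, 0, 0])),  # IoU 0.5
    (np.array([1, 1, 1, 0]), np.array([1, 0, 0, 0])),  # IoU 0.33
]
print(kept_masks(pairs), kept_masks(pairs, threshold=0.6))  # [0, 1] [0]
```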
The --output_masks option writes the predicted masks used to compute the segmentation accuracies to the given directory.
If the U-net achieves high segmentation accuracies, it is ready to be used in the analysis.