
Motile objects detection with RetinaNet

Implementation of the multi-frames detection method for motile objects (the sperm detection phase of the paper: https://arxiv.org/abs/2002.04034)

The tracking phase of the paper is available at https://github.com/mr7495/Sperm_detection_and_tracking

This repository is based on https://github.com/fizyr/keras-retinanet and has been modified for motile objects detection.

Working Environment:

TensorFlow: 1.15
Keras: 2.1

For detecting the sperms, we applied our newly introduced method to the RetinaNet object detector to improve the detection accuracy of motile objects. The method lets the detector learn the objects' mobility cues in addition to their other features. To implement this idea, several successive frames are concatenated into a single input array, which is then used for training or testing the model. During training, when several frames are combined into one input, the model is given the ground truth of the middle frame. For example, to detect objects in a given frame, you concatenate it with some previous and some following frames, feed the result to the network, and supervise it with the annotations of that same (middle) frame. For more details, read the paper.
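As a minimal illustration of this idea (not part of the repository; the OpenCV/NumPy calls and file paths below are placeholders), stacking three consecutive grayscale frames into a single three-channel input could look like this:

```python
import cv2
import numpy as np

def concat_frames(prev_path, mid_path, next_path):
    """Stack three consecutive grayscale frames into one 3-channel array.

    The middle frame is the one whose ground-truth boxes are given to the
    network; the previous and next frames only provide motion context.
    """
    prev_f = cv2.imread(prev_path, cv2.IMREAD_GRAYSCALE)
    mid_f = cv2.imread(mid_path, cv2.IMREAD_GRAYSCALE)
    next_f = cv2.imread(next_path, cv2.IMREAD_GRAYSCALE)
    # Shape: (height, width, 3) -- one channel per consecutive frame
    return np.stack([prev_f, mid_f, next_f], axis=-1)

# Example: detect objects in frame 12 of a 25-frame video.
stacked = concat_frames("frames/11.png", "frames/12.png", "frames/13.png")
# `stacked` is fed to the detector, while the annotations of frame 12
# (the middle frame) are used as the training target.
```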

(Figure not available) General schematic of the detection phase

(Figure not available) An example of concatenating five consecutive frames

(Figure not available) An example of a detected image

The following figures show the evaluation metrics for concatenating different numbers of frames, together with a comparison between plain RetinaNet training and our training method:

Concatenating consecutive frames clearly yields much better training results.

The code has been tested on video samples with 25 frames, and a sample of the annotations used is provided in the annotation sample.csv file. If your videos have more than 25 frames, or you want to use a different type of annotation file, change the load_image function in keras_retinanet/preprocessing/csv_generator.py accordingly.
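For reference, the fizyr keras-retinanet CSV generator expects one bounding box per row in the form path,x1,y1,x2,y2,class_name. The exact layout of annotation sample.csv in this repository may differ, so treat the rows below as an illustrative guess with placeholder paths and class name:

```
frames/video1/frame_012.png,135,210,160,242,sperm
frames/video1/frame_012.png,301,88,327,120,sperm
```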

In this code, three consecutive frames are concatenated and used as the input of RetinaNet.

If you want to use more than three consecutive frames, you have to apply some changes in the following files (a rough sketch is given after the list):

1-resnet_retinanet function in keras_retinanet/models/resnet.py (change the shape of the first layer)

2-load_image function in keras_retinanet/preprocessing/csv_generator.py (change the code to load more than three consecutive frames)

3-keras_retinanet/utils/eval.py (in the line image1 = generator.load_image(i)[:,:,1].astype(np.uint8), index 1 refers to the middle frame when three consecutive frames are used)

4-preprocess_image function in keras_retinanet/utils/image.py
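As a rough sketch of what these changes amount to (the helper below is hypothetical and not the repository's actual code; NUM_FRAMES and the file layout are assumptions), concatenating five frames means stacking five grayscale channels and taking index NUM_FRAMES // 2 as the middle frame:

```python
import cv2
import numpy as np

NUM_FRAMES = 5  # number of consecutive frames to concatenate (odd number)

def load_stacked_frames(frame_paths):
    """Hypothetical helper: stack NUM_FRAMES consecutive grayscale frames.

    `frame_paths` is the ordered list of frame files centred on the frame
    whose annotations will be used as the ground truth.
    """
    frames = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in frame_paths]
    return np.stack(frames, axis=-1)  # shape: (H, W, NUM_FRAMES)

middle = NUM_FRAMES // 2  # e.g. index 2 for five frames, the counterpart of
                          # the index 1 used in eval.py for three frames

# The backbone input would then be declared with NUM_FRAMES channels, e.g.
# keras.layers.Input(shape=(None, None, NUM_FRAMES)) in resnet_retinanet,
# and preprocess_image must normalise all NUM_FRAMES channels consistently.
```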

The code for training and testing based on three consecutive frames, along with a sample result, has also been shared in this repository.

Note: The current version of RetinaNet from fizyr does not support TensorFlow versions higher than 1.15.

Our trained neural network based on 3 concatenated frames has been shared at: https://drive.google.com/file/d/14ufFO8GKbE5Qlrm3wloHKQcsnudwHeSR/view?usp=sharing

The inference version of our trained neural network based on 3 concatenated frames has also been shared at: https://drive.google.com/open?id=1pN3A-tWJOphRdTZ7cPhJTnTIhoiGrcWv

You can visit https://github.com/fizyr/keras-retinanet to learn about the differences between the inference and the training models.
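For example, assuming the standard fizyr keras-retinanet API bundled with this repository (the checkpoint file names below are placeholders), a training model can be converted to an inference model like this:

```python
from keras_retinanet import models

# Load the training model (e.g. the 3-frame checkpoint shared above) ...
training_model = models.load_model('retinanet_3frames_training.h5',
                                   backbone_name='resnet50')

# ... and convert it to an inference model, which appends the layers that
# decode box regressions and apply non-maximum suppression.
inference_model = models.convert_model(training_model)
inference_model.save('retinanet_3frames_inference.h5')
```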

The detection results based on the concatenation of three consecutive frames are listed in the following table:

| Average Precision | Recall | Accuracy | Precision | F1 |
|---|---|---|---|---|
| 99.1 | 98.7 | 96.3 | 97.4 | 98.1 |

If you find our work useful, please cite it as:

@article{rahimzadeh2020sperm,
  title={Sperm detection and tracking in phase-contrast microscopy image sequences using deep learning and modified CSR-DCF},
  author={Rahimzadeh, Mohammad and Attar, Abolfazl and others},
  journal={arXiv preprint arXiv:2002.04034},
  year={2020}
}

You can contact the developer at: mr7495@yahoo.com