biigle/maia

Framework change to mmdet


We could change the detection framework to mmdet, which would have multiple benefits.

  • likely to be faster and more memory efficient
  • possibility of being more flexible with algorithm choice
  • things like different augmentations, losses, ...
mzur commented

We've come to the point where we can't update TensorFlow any more without breaking the Mask R-CNN implementation of MAIA. While the code still runs, the detection output is wrong somehow. The current TF version has several security vulnerabilities.

I think we should make the move to mmdet and maybe just use a simple Faster R-CNN as a starting point. We should compare detection performance with the original MAIA implementation but it should be similar. We can then experiment with the advanced features of mmdet at some later point.
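
For reference, running a stock Faster R-CNN with mmdet is only a few lines (a minimal sketch assuming the current mmdet API; the config and checkpoint paths are placeholders, not files in this repository):

```python
# Minimal mmdet inference sketch; paths are placeholders only.
from mmdet.apis import init_detector, inference_detector

config = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
checkpoint = 'checkpoints/faster_rcnn_r50_fpn_1x_coco.pth'

model = init_detector(config, checkpoint, device='cuda:0')
# For a pure detector, the result is one array of
# [x1, y1, x2, y2, score] rows per class.
result = inference_detector(model, 'test_image.jpg')
```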

The most difficult part would be the migration of the custom implementation of the autoencoder/novelty detection method in TensorFlow.

The move to mmdet and PyTorch has the additional advantage that it's compatible with DINO, which we plan to use to sort image thumbnails.
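
To illustrate the DINO part: computing one embedding per thumbnail could look roughly like this (a sketch based on the torch.hub entry point from the DINO repository; the preprocessing values are the standard ImageNet ones, nothing MAIA-specific):

```python
# Sketch: one DINO embedding per thumbnail, to sort by similarity later.
import torch
import torchvision.transforms as T
from PIL import Image

model = torch.hub.load('facebookresearch/dino:main', 'dino_vits16')
model.eval()

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

with torch.no_grad():
    x = preprocess(Image.open('thumb.jpg').convert('RGB')).unsqueeze(0)
    embedding = model(x)  # 384-d vector for ViT-S/16
```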

What about dropping the autoencoder and using DINO right away, or do you want to keep it for compatibility?
It might be possible to convert it to ONNX (https://github.com/onnx/tensorflow-onnx) and back to PyTorch (https://github.com/ToriML/onnx2pytorch), but you might also run into some problems.
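
The round trip would look roughly like this (an untested sketch; the tiny Sequential model only stands in for the real autoencoder, and the conversion can fail on unsupported ops):

```python
# TF -> ONNX -> PyTorch round trip; the Sequential model is only a
# stand-in for the real autoencoder.
import tensorflow as tf
import tf2onnx
import onnx
from onnx2pytorch import ConvertModel

keras_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(64),
])

onnx_proto, _ = tf2onnx.convert.from_keras(keras_model, output_path='model.onnx')
pytorch_model = ConvertModel(onnx.load('model.onnx'))
```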

If you don't want to use Mask R-CNN instance segmentation, I agree that Faster R-CNN or YOLO might be a better solution because they use fewer resources.

mzur commented

What about dropping the autoencoder and using DINO right away, or do you want to keep it for compatibility?

I would like to keep the current feature set in BIIGLE.

It might be possible to convert it to ONNX [...]

That only works for models/weights, right? You still have to implement the code in the new framework. For MAIA novelty detection, the autoencoder is trained from scratch each time, so no model is reused at all.
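
In other words, the part that would have to be rewritten is the training loop itself, roughly this shape in PyTorch (an illustrative toy, not the actual MAIA architecture or hyperparameters):

```python
# Toy stand-in for the MAIA autoencoder: the whole training loop runs
# from scratch on every job, so it is this code that must be ported.
import torch
import torch.nn as nn

autoencoder = nn.Sequential(
    nn.Linear(64, 16), nn.ReLU(),  # encoder
    nn.Linear(16, 64),             # decoder
)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(100):
    batch = torch.rand(32, 64)  # stand-in for image patches
    loss = loss_fn(autoencoder(batch), batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```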

To my knowledge, it is not only the weights but also the operations (the model architecture) that are transformed, so it should work.

mzur commented

I managed to get MAIA running on an older version of TensorFlow again, so it is functional right now. We should still make the change to a more maintainable implementation. Maybe as part of the hackathon.

mzur commented

I pushed a Dockerfile with a complete PyTorch/mmdetection setup to the mmdet branch.

I guess it might be a good idea if you could gather some plug-in points/interfaces where code has to be injected, such as: exchange noveltydetection.py; it receives parameters 1, 2, 3, 4 via the command line and outputs JSON to stdout.
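
Something like this, just as a hypothetical skeleton (the argument names are made up, not the actual MAIA parameters):

```python
# Hypothetical skeleton for an exchangeable noveltydetection.py:
# parameters in via the command line, results out as JSON on stdout.
import argparse
import json
import sys

parser = argparse.ArgumentParser()
parser.add_argument('images', nargs='+', help='input image paths')
parser.add_argument('--clusters', type=int, default=5)
parser.add_argument('--patch-size', type=int, default=39)
args = parser.parse_args()

# Real novelty detection would fill these lists with detected regions.
detections = {path: [] for path in args.images}
json.dump(detections, sys.stdout)
```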

mzur commented

Most updates are required for the Python scripts in src/resources/scripts. That is, most of them probably have to be replaced entirely. There is one directory for novelty detection and one for instance segmentation (which is just object detection, really).

Novelty detection is handled by the NoveltyDetectionRequest job, which creates a JSON file that is passed on to the DetectionRunner. This script runs the novelty detection, which outputs one JSON file with detection locations for each input image to the temporary directory. These JSON files are then parsed by the job.
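
Consuming the per-image files looks roughly like this (a sketch only; the actual parsing happens in the PHP job, and the detection format shown is an assumption):

```python
# Sketch: read one detection JSON per input image from the temp dir.
# The per-file format is an assumption for illustration.
import json
from pathlib import Path

tmp_dir = Path('/tmp/maia-job')  # hypothetical temporary directory

for json_file in sorted(tmp_dir.glob('*.json')):
    with json_file.open() as f:
        detections = json.load(f)
    print(f'{json_file.stem}: {len(detections)} detections')
```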

Instance segmentation is handled by the InstanceSegmentationRequest job. It runs multiple steps (a rough sketch of the whole chain follows the list):

  1. generateDataset: A JSON file is created, containing information on the original training image files and training proposals, and passed on to the DatasetGenerator. This script extracts image crops of each annotation and generates the mask files for each crop that are required to train Mask R-CNN. Information about these files is returned in an output JSON file.

  2. performTraining: A JSON file is created, containing information on the training scheme etc., and is passed on (together with the output JSON of the previous step) to the TrainingRunner. This script runs the training of Mask R-CNN and returns another output JSON containing information on the saved file of the trained model.

  3. performInference: A JSON file is created, containing information on the image files on which inference should be performed, and passed on (together with the output JSONs of the previous steps) to the InferenceRunner. This script runs the inference and returns one JSON file with detection locations for each image (in the same format as the novelty detection) to the temporary directory. These JSON files are parsed with the same function as those of the novelty detection.
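
Put together, the chain behaves like this driver sketch (script and file names are illustrative; the real jobs are PHP and pass the JSON paths explicitly):

```python
# Illustrative driver for the three-step chain above; names are made up.
import subprocess

def run_step(script, *json_files):
    # Each runner reads its input JSON file(s) and writes an output
    # JSON that the next step consumes.
    subprocess.run(['python', script, *json_files], check=True)

run_step('DatasetGenerator.py', 'dataset_input.json')    # step 1
run_step('TrainingRunner.py', 'training_input.json',     # step 2
         'dataset_output.json')
run_step('InferenceRunner.py', 'inference_input.json',   # step 3
         'dataset_output.json', 'training_output.json')
```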