This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bounding boxes and segmentation masks for each instance of an object in the image. It's based on Feature Pyramid Network (FPN) and a ResNet101 backbone.
If you wanna read more about the algorithms you can take a look here.
Visualizes every step of the first stage Region Proposal Network and displays positive and negative anchors along with anchor box refinement.
This is an example of final detection boxes (dotted lines) and the refinement applied to them (solid lines) in the second stage.
Examples of generated masks. These then get scaled and placed on the image in the right location.
Often it's useful to inspect the activations at different layers to look for signs of trouble (all zeros or random noise).
Another useful debugging tool is to inspect the weight histograms. These are included in the inspect_weights.ipynb notebook.
TensorBoard is another great debugging and visualization tool. The model is configured to log losses and save weights at the end of every epoch.
We're providing pre-trained weights for MS COCO to make it easier to start. You can
use those weights as a starting point to train your own variation on the network.
Training and evaluation code is in config/coco/coco.py
. You can import this
module in Jupyter notebook (see the provided notebooks for examples) or you
can run it directly from the command line as such:
# Train a new model starting from pre-trained COCO weights
python3 config/coco/coco.py train --dataset=/path/to/coco/ --model=coco
# Train a new model starting from ImageNet weights
python3 config/coco/coco.py train --dataset=/path/to/coco/ --model=imagenet
# Continue training a model that you had trained earlier
python3 config/coco/coco.py train --dataset=/path/to/coco/ --model=/path/to/weights.h5
# Continue training the last model you trained. This will find
# the last trained weights in the model directory.
python3 config/coco/coco.py train --dataset=/path/to/coco/ --model=last
You can also run the COCO evaluation code with:
# Run COCO evaluation on the last trained model
python3 config/coco/coco.py evaluate --dataset=/path/to/coco/ --model=last
The training schedule, learning rate, and other parameters should be set in config/coco/coco.py
.
This implementation follows the Mask RCNN paper for the most part, but there are a few cases where we deviated in favor of code simplicity and generalization. These are some of the differences we're aware of. If you encounter other differences, please do let us know.
-
Image Resizing: To support training multiple images per batch we resize all images to the same size. For example, 1024x1024px on MS COCO. We preserve the aspect ratio, so if an image is not square we pad it with zeros. In the paper the resizing is done such that the smallest side is 800px and the largest is trimmed at 1000px.
-
Bounding Boxes: Some datasets provide bounding boxes and some provide masks only. To support training on multiple datasets we opted to ignore the bounding boxes that come with the dataset and generate them on the fly instead. We pick the smallest box that encapsulates all the pixels of the mask as the bounding box. This simplifies the implementation and also makes it easy to apply image augmentations that would otherwise be harder to apply to bounding boxes, such as image rotation.
To validate this approach, we compared our computed bounding boxes to those provided by the COCO dataset. We found that ~2% of bounding boxes differed by 1px or more, ~0.05% differed by 5px or more, and only 0.01% differed by 10px or more.
-
Learning Rate: The paper uses a learning rate of 0.02, but we found that to be too high, and often causes the weights to explode, especially when using a small batch size. It might be related to differences between how Caffe and TensorFlow compute gradients (sum vs mean across batches and GPUs). Or, maybe the official model uses gradient clipping to avoid this issue. We do use gradient clipping, but don't set it too aggressively. We found that smaller learning rates converge faster anyway so we go with that.
Python 3.4, TensorFlow 1.3, Keras 2.0.8 and other common packages listed in requirements.txt
.
-
Clone this repository.
-
Install dependencies
pip3 install -r requirements.txt
-
Install Pycocotools.
git clone https://github.com/philferriere/cocoapi.git coco
&pip3 install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
-
Download pre-trained COCO weight (mask_rcnn_coco.h5) from the this link and place it in the root directory.
-
For one time image objects detection place your image in
images
folder and runpython3 onetime.py --image=tuba.jpg
for example. The result will be saved in the root directory.
- For camera realtime objects detection run
python3 camera.py