Bounding Box Cropping Layer: Feature Request
Michael-Blackwell opened this issue · 1 comments
Short Description
Hello!
I would like to propose a vectorized bounding box cropping layer that accepts an image (or batch of images) and a tensor of boxes, and returns a ragged tensor of crops from the original image with shape [batch, n_boxes, None, None, 3] (ragged because each crop has a different height and width).
KerasCV already has cropping functions for preprocessing/augmentation, but no layer that efficiently crops multiple bounding boxes from an image. This functionality is required to build 2-stage detectors for high-resolution images, where the graph looks something like this:
- 1 Input: full-resolution image (e.g. a 4K image) [1, 2160, 3840, 3]
- 2 Preprocessing: resize/rescale image for object detector
- 3 Object Detection: YOLO, Faster R-CNN, etc.
- 4 Cropping Layer: THIS IS WHERE THE MAGIC HAPPENS. Using the bounding boxes from the object detector, crop objects from the original, full-resolution image in step 1 and resize/pad them for the classifier.
- 5 Classifier: ResNet, EfficientNet, etc.
- 6 Post Processing: Format tensors for output
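To make step 4 concrete, here is a minimal NumPy sketch of the cropping semantics I have in mind. It is only a reference loop, not the vectorized layer being requested. The function name `crop_boxes` and the normalized [y_min, x_min, y_max, x_max] box format are my assumptions for illustration; the actual layer could use any box format.

```python
import numpy as np


def crop_boxes(image, boxes):
    """Crop normalized [y_min, x_min, y_max, x_max] boxes from one HWC image.

    Hypothetical helper: a plain Python loop that shows the intended result,
    not the vectorized implementation this issue asks for.
    """
    height, width = image.shape[:2]
    crops = []
    for y_min, x_min, y_max, x_max in boxes:
        # Scale normalized coordinates to pixels of the full-resolution image.
        top = int(round(y_min * height))
        left = int(round(x_min * width))
        bottom = int(round(y_max * height))
        right = int(round(x_max * width))
        # Clamp to the image bounds and keep at least one pixel per side.
        top = min(max(top, 0), height - 1)
        left = min(max(left, 0), width - 1)
        bottom = min(max(bottom, top + 1), height)
        right = min(max(right, left + 1), width)
        crops.append(image[top:bottom, left:right])
    return crops


# A 4K frame as in step 1, with two boxes as a detector might return them.
image = np.zeros((2160, 3840, 3), dtype=np.uint8)
boxes = np.array([[0.10, 0.20, 0.50, 0.40],
                  [0.25, 0.50, 0.75, 1.00]])
crops = crop_boxes(image, boxes)
crop_shapes = [crop.shape for crop in crops]
```

Because the boxes are normalized, the detector can run on the resized image from step 2 while the crops are taken from the original pixels. The two crops come out with different heights and widths, which is why the output has to be ragged (or a list) rather than a dense tensor.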
Step 4 from the outline above depends on a robust bounding box cropping layer. The closest implementation I have found is TensorFlow's tf.image.crop_and_resize. Its only drawback is that the resizing step does not preserve the aspect ratio: every crop is stretched to the same fixed crop_size. However, keras_cv.layers.Resizing seems to have robust resizing options and accepts ragged tensors.
In PyTorch, I have to use a for loop to crop the bounding boxes, and in TensorFlow, tf.image.crop_and_resize offers limited resizing options. This is an opportunity for Keras to offer functionality that other frameworks lack but that is needed to build this class of models.
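For reference, the arithmetic behind an aspect-preserving "resize then pad" is small. The sketch below is only an illustration of what I mean by preserving the aspect ratio. The helper name `letterbox_geometry` is hypothetical and not part of TensorFlow or KerasCV, and centering the padding is an assumption.

```python
def letterbox_geometry(crop_h, crop_w, target_h, target_w):
    """Return (new_h, new_w, pad_top, pad_left) for an aspect-preserving fit.

    Hypothetical helper: the crop is scaled by a single factor so it fits
    inside the target, then centered with padding on the shorter side.
    """
    # One scale factor for both axes keeps the aspect ratio intact.
    scale = min(target_h / crop_h, target_w / crop_w)
    new_h = max(1, min(target_h, round(crop_h * scale)))
    new_w = max(1, min(target_w, round(crop_w * scale)))
    # Split the leftover space evenly; any odd pixel goes to the bottom/right.
    pad_top = (target_h - new_h) // 2
    pad_left = (target_w - new_w) // 2
    return new_h, new_w, pad_top, pad_left


# An 864x768 crop fitted into a 224x224 classifier input.
tall = letterbox_geometry(864, 768, 224, 224)
# A 1080x1920 crop fitted into the same input.
wide = letterbox_geometry(1080, 1920, 224, 224)
```

A fixed-size resize would stretch both crops to 224x224 and distort them. Here the tall crop keeps its proportions and is padded on the left and right, and the wide crop is padded on the top and bottom.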
Papers
Existing Implementations
The best implementation I could find is tf.image.crop_and_resize, but again, the resizing options are limited.
Other Information