keras-team/keras-cv

Bounding Box Cropping Layer: Feature Request

Michael-Blackwell opened this issue · 1 comments

Short Description

Hello!

I would like to propose a vectorized bounding box cropping layer that accepts an image (or a batch of images) and a tensor of boxes, and returns a ragged tensor of crops taken from the original image, with shape [batch, n_boxes, None, None, 3] (ragged because each crop has a different height and width).
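To make the requested behavior concrete, here is a minimal reference sketch in plain NumPy. It is not a Keras layer and it is not vectorized; it only illustrates the expected output structure. The function name `crop_boxes` and the pixel-coordinate `(y1, x1, y2, x2)` box format are assumptions made for this illustration.

```python
import numpy as np

def crop_boxes(images, boxes):
    """Reference semantics only (hypothetical helper, not a KerasCV API).

    images: array of shape [batch, H, W, C]
    boxes:  integer array of shape [batch, n_boxes, 4], assumed to hold
            pixel coordinates in (y1, x1, y2, x2) order.
    Returns a nested list [batch][n_boxes] of arrays, each [h, w, C],
    where h and w differ per box (the "ragged" part).
    """
    crops = []
    for image, image_boxes in zip(images, boxes):
        per_image = []
        for y1, x1, y2, x2 in image_boxes:
            # Plain slicing; every crop keeps its own height and width.
            per_image.append(image[y1:y2, x1:x2, :])
        crops.append(per_image)
    return crops

images = np.zeros((1, 2160, 3840, 3), dtype=np.uint8)
boxes = np.array([[[0, 0, 100, 200], [500, 1000, 900, 1300]]])
crops = crop_boxes(images, boxes)
print([c.shape for c in crops[0]])
```

The two crops come out with different spatial sizes, which is why a dense tensor cannot hold the result and a ragged tensor (or a resize/pad step) is needed.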

KerasCV already has cropping functions for preprocessing and augmentation, but it has no layer that efficiently crops multiple bounding boxes from an image. This functionality is required to build two-stage detectors for high-resolution images, where the graph looks something like this:

  • 1 Input: full-resolution image (e.g. a 4K image) [1, 2160, 3840, 3]
  • 2 Preprocessing: resize/rescale image for object detector
  • 3 Object Detection: YOLO, Faster R-CNN, etc.
  • 4 Cropping Layer (this is where the magic happens): using the bounding boxes from the object detector, crop objects from the original, full-resolution image in step 1 and resize/pad them for the classifier.
  • 5 Classifier: ResNet, EfficientNet, etc.
  • 6 Post Processing: Format tensors for output
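One detail of step 4 is that the detector sees a resized image, while the crops must come from the full-resolution one. A minimal sketch of that coordinate mapping follows, assuming the detector emits normalized `(y1, x1, y2, x2)` boxes in [0, 1] (the same convention tf.image.crop_and_resize uses for its `boxes` argument). The helper name `to_pixel_boxes` is hypothetical.

```python
import numpy as np

def to_pixel_boxes(norm_boxes, height, width):
    """Map normalized (y1, x1, y2, x2) boxes to pixel coordinates.

    Hypothetical helper for illustration: assumes boxes are normalized
    to [0, 1] relative to the image the detector saw, so they can be
    rescaled directly to the full-resolution image.
    """
    scale = np.array([height, width, height, width], dtype=np.float64)
    pixel = np.rint(np.asarray(norm_boxes, dtype=np.float64) * scale)
    # Clip so boxes that spill past the border stay inside the image.
    return np.clip(pixel.astype(np.int64), 0, scale.astype(np.int64))

# Boxes predicted on the downscaled input, applied to the 4K original.
full_res_boxes = to_pixel_boxes(
    [[0.25, 0.5, 0.75, 1.0], [-0.1, 0.0, 1.2, 0.5]], 2160, 3840
)
print(full_res_boxes.tolist())
```

Because the boxes are normalized, no information about the detector's input size is needed; only the original image's height and width matter.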

Step 4 of the outline above depends on a robust bounding box cropping layer. The closest implementation I have found is TensorFlow's tf.image.crop_and_resize. Its only drawback is that the resizing step does not preserve the aspect ratio. However, keras_cv.layers.Resizing seems to have fairly robust resizing options and accepts ragged tensors.
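For illustration, the aspect-ratio-preserving alternative amounts to scaling each crop until it fits inside the classifier input and padding the remainder. The sketch below only computes that geometry; it performs no actual resizing, and the name `letterbox_geometry` and the centered padding are assumptions for this example rather than the behavior of any existing layer.

```python
def letterbox_geometry(crop_h, crop_w, target_h, target_w):
    """Compute an aspect-preserving resize plus padding (hypothetical helper).

    Returns ((new_h, new_w), (pad_top, pad_bottom, pad_left, pad_right))
    such that the resized crop, once padded, is exactly target_h x target_w.
    """
    if crop_h * target_w <= crop_w * target_h:
        # Width is the limiting side: fill it and scale the height to match.
        new_w = target_w
        new_h = max(1, round(crop_h * target_w / crop_w))
    else:
        # Height is the limiting side.
        new_h = target_h
        new_w = max(1, round(crop_w * target_h / crop_h))
    pad_top = (target_h - new_h) // 2
    pad_left = (target_w - new_w) // 2
    pads = (pad_top, target_h - new_h - pad_top,
            pad_left, target_w - new_w - pad_left)
    return (new_h, new_w), pads

# A wide 100x200 crop and a tall 400x300 crop, both sent to a 224x224 classifier.
print(letterbox_geometry(100, 200, 224, 224))
print(letterbox_geometry(400, 300, 224, 224))
```

Stretching both crops straight to 224x224 would distort them differently, whereas here each keeps its original proportions and only the padding varies.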

Due to limitations in PyTorch, I have to use a for loop to crop the bounding boxes, and in TensorFlow's tf.image.crop_and_resize the resizing options are limited. This is an opportunity for Keras to offer functionality that other frameworks lack but that is needed to build this class of models.

Papers

Multi-Stage-CV-Detection

Existing Implementations

The best implementation I could find is tf.image.crop_and_resize, but again, the resizing options are limited.

Other Information