martinruenz/maskfusion

Is RGB used for Recognition?

Closed this issue · 3 comments

Hi,

I'd like to know if RGB is used for recognition, so that, for example, it can differentiate between two different brands of cereal boxes that have the same shape.

Thanks!

@martinruenz
I've roughly read your paper; correct me if I'm wrong:

  1. Mask R-CNN is used for segmentation, but it is meant for 2D RGB images
  2. MaskFusion is a variant of Co-Fusion where it can perform SLAM with moving objects (great work there!)
  3. MaskFusion uses Mask R-CNN for Recognition and somehow applies that data & segments the 3D SLAM data

But I have some questions:

  1. Since Mask R-CNN is meant for 2D RGB, does MaskFusion use both RGB & depth data for recognition?
    For example, recognizing the shape of an object regardless of its color.

From my observations, I think only RGB is used for recognition, since MaskFusion seems to use the pre-trained models from the Mask R-CNN repo.

  2. Since Mask R-CNN is used for recognition, is the training workflow & pipeline for any custom labels the same as described in Mask R-CNN, with the trained model then imported into MaskFusion for inference?

Thanks!

Mask R-CNN is used for segmentation, but it is meant for 2D RGB images

Correct, but there is also a geometric term in the segmentation, so the result is Mask R-CNN + depth-based segmentation.
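To make the idea concrete, here is a minimal sketch of combining a 2D semantic mask with a geometric (depth) term. This is illustrative only, not MaskFusion's actual code: `refine_mask_with_depth` and the edge threshold are invented for this example, and the real method uses a more elaborate geometric segmentation.

```python
import numpy as np

def refine_mask_with_depth(mask, depth, edge_thresh=0.05):
    """Illustrative sketch: trim a 2D Mask R-CNN mask using depth
    discontinuities, so the final segment respects geometric
    boundaries as well as the semantic prediction."""
    # Depth gradient magnitude: large jumps indicate object boundaries.
    dy, dx = np.gradient(depth)
    edges = np.sqrt(dx**2 + dy**2) > edge_thresh

    # Keep only mask pixels that do not lie on a strong depth edge.
    return mask & ~edges

# Toy example: a 6x6 mask over a scene with a 1 m depth step at column 3.
mask = np.ones((6, 6), dtype=bool)
depth = np.zeros((6, 6))
depth[:, 3:] = 1.0
print(refine_mask_with_depth(mask, depth).sum())  # prints 24
```

The depth-edge pixels (two columns around the step) are cut away, which is the sense in which the geometric term "refines" the semantic mask.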

MaskFusion is a variant of Co-Fusion where it can perform SLAM with moving objects (great work there!)

MaskFusion is indeed based on Co-Fusion. And the whole SLAM back-end is pretty much the same. The major difference is how the segmentation works (both methods perform the segmentation in 2D).

MaskFusion uses Mask R-CNN for Recognition and somehow applies that data & segments the 3D SLAM data

Correct. The "somehow" works as follows: if a segment cannot be associated with an existing 3D model and meets certain criteria, a new 3D model is created and initialized with the data belonging to that segment. Afterwards, as new frames come in, MaskFusion tries to associate 2D segments with the 3D model. If the association is successful, the 3D model is updated; otherwise yet another 3D model might be created, and so on.
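The create-or-associate loop above could be sketched like this. It is a toy Python illustration of the control flow only, not MaskFusion's C++ implementation: representing segments as pixel-index sets, the `iou` score, and both thresholds are assumptions made for the example (the real system scores segments against projected 3D models and uses richer criteria for spawning new ones).

```python
def iou(a, b):
    """Intersection-over-union of two pixel-index sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def associate_segments(segments, models, min_iou=0.3, min_size=3):
    """For each incoming 2D segment, fuse it into the best-matching
    3D model; if no association succeeds and the segment meets the
    size criterion, spawn a new model instead."""
    for seg in segments:
        best = max(models, key=lambda m: iou(seg, m['pixels']), default=None)
        if best is not None and iou(seg, best['pixels']) >= min_iou:
            best['pixels'] |= seg          # successful association: update model
        elif len(seg) >= min_size:         # criteria met: create a new model
            models.append({'pixels': set(seg)})
    return models

# First segment spawns a model, the second fuses into it (IoU 0.4),
# the third has no overlap and spawns a second model.
models = associate_segments([{0, 1, 2, 3}, {2, 3, 4}, {10, 11, 12}], [])
print(len(models))  # prints 2
```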

Since Mask R-CNN is meant for 2D RGB, does MaskFusion use both RGB & Depth data for Recognition?

Depth is not used for recognition, only to refine the segmentation.

Since Mask R-CNN is used for recognition, is the training workflow & pipeline for any custom labels the same as described in Mask R-CNN, with the trained model then imported into MaskFusion for inference?

Correct.

Hope this helps!

Thanks!