Is there any paper about how this works?

Question

Is there any paper about how this works?

JM-IP opened this issue 3 years ago · 1 comment

Answer 1 · 2022-06-22T13:57:14.000Z

You can understand most of it by reading the source code.

Basically, V0 uses a sliding window, chooses the box with the highest score, and then applies post-processing. V1 extracts candidate masks with OpenCV functions, derives bounding boxes from those masks, and then uses CLIP to get predictions that are fed to a post-processing algorithm.
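
To make the V0 idea concrete, here is a rough sketch of "slide a window, score each box, keep the best one". It is not the project's actual code: the names `sliding_windows`, `crop`, `best_box`, and `mean_brightness` are made up for illustration, the image is a plain list of rows, and the scoring function is a stand-in for whatever model score the real implementation uses. The post-processing step is left out.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)
Image = List[List[float]]        # grayscale image as rows of pixel values


def sliding_windows(img_w: int, img_h: int, win: int, stride: int) -> Iterator[Box]:
    """Yield every win x win box that fits fully inside the image."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, win, win)


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by the box."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def best_box(
    image: Image,
    score_fn: Callable[[Image], float],
    win: int,
    stride: int,
) -> Tuple[Box, float]:
    """Score every window and return the highest-scoring (box, score) pair."""
    img_h, img_w = len(image), len(image[0])
    best: Optional[Tuple[Box, float]] = None
    for box in sliding_windows(img_w, img_h, win, stride):
        score = score_fn(crop(image, box))
        # Strict comparison keeps the first window in case of ties.
        if best is None or score > best[1]:
            best = (box, score)
    if best is None:
        raise ValueError("window is larger than the image")
    return best


def mean_brightness(patch: Image) -> float:
    """Hypothetical placeholder score: the average pixel value of the patch."""
    return sum(sum(row) for row in patch) / (len(patch) * len(patch[0]))


if __name__ == "__main__":
    # 6x6 black image with a bright 2x2 square at columns 3-4, rows 2-3.
    img = [[0.0] * 6 for _ in range(6)]
    for y in (2, 3):
        for x in (3, 4):
            img[y][x] = 1.0
    print(best_box(img, mean_brightness, win=2, stride=1))  # ((3, 2, 2, 2), 1.0)
```

In a real pipeline `score_fn` would wrap a model call, and the stride trades speed against how precisely the best box can be located.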