shonenkov/CLIP-ODS

Is there any paper about how this works?

JM-IP opened this issue · 1 comment

JM-IP commented
Is there any paper about how this works?

You can understand most of it by reading the source code.

Basically, V0 uses a sliding window, chooses the box with the highest score, and then applies postprocessing. V1 gets candidate masks with OpenCV functions, derives bounding boxes from those masks, and then uses CLIP to score the boxes; those predictions are fed to a postprocessing algorithm.
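
To make the V0 idea concrete, here is a rough sketch of a sliding-window search that keeps the highest-scoring box. It is not taken from the repository: the names `sliding_windows`, `best_box`, and `brightness_score`, as well as the window and stride values, are made up for illustration. The scoring function is a simple stand-in (mean brightness) where the real code would presumably use a CLIP image-text similarity, and the postprocessing step is left out.

```python
import numpy as np


def sliding_windows(height, width, window, stride):
    """Yield (x1, y1, x2, y2) square boxes on a regular grid over the image."""
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            yield (x, y, x + window, y + window)


def best_box(image, score_fn, window=32, stride=16):
    """Score every window with score_fn and return the best box and its score."""
    height, width = image.shape[:2]
    best, best_score = None, float("-inf")
    for x1, y1, x2, y2 in sliding_windows(height, width, window, stride):
        score = score_fn(image[y1:y2, x1:x2])
        if score > best_score:
            best, best_score = (x1, y1, x2, y2), score
    return best, best_score


def brightness_score(crop):
    """Hypothetical placeholder for a CLIP similarity score: mean pixel value."""
    return float(crop.mean())


# Toy image: black background with one bright 32x32 square.
image = np.zeros((96, 96), dtype=np.float32)
image[32:64, 48:80] = 1.0

box, score = best_box(image, brightness_score)
print(box, score)  # (48, 32, 80, 64) 1.0
```

Swapping `brightness_score` for a function that embeds the crop and compares it to a text prompt would give the general shape described above, though the actual window sizes, strides, and postprocessing in the repository may differ.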