PosSAM: Panoptic Open-vocabulary Segment Anything

Contributions

We introduce PosSAM, an open-vocabulary panoptic segmentation model that generates class and instance-aware masks with excellent generalization to a variety of visual concepts by unifying SAM and CLIP in an end-to-end trainable framework.
We develop a novel Local Discriminative Pooling (LDP) module to enhance discriminative CLIP features with class-agnostic SAM features for an unbiased OV classification.
We introduce the Mask-Aware Selective Ensembling algorithm to adaptively discern between seen and unseen classes by leveraging IoU and LDP confidence scores for each image.
We conduct extensive experiments and demonstrate superior performance over existing state-of-the-art open-vocabulary panoptic segmentation methods across multiple benchmark datasets.

'Coming Soon...!!!'

'Coming Soon...!!!'

Vibashan/PosSAM