clovaai/ECLIPSE

Question about ECLIPSE's network architecture


Hi! May I ask why you adopted the Mask2Former architecture? Is it because of some disadvantage of Mask2Former in continual learning?

Hi.

There are several reasons why we adopted Mask2Former.

  1. Mask2Former is one of the most representative and widely used segmentation models of recent years.
  2. To apply visual prompt tuning to continual panoptic segmentation, Mask2Former is the most suitable transformer-based architecture.
  3. Mask2Former is powerful and supports universal image segmentation (panoptic, semantic, and instance segmentation).
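Point 2 can be made a little more concrete with a rough sketch. Mask2Former predicts each mask from a learnable object query (a dot product between the query's embedding and the per-pixel embeddings), so one natural way to apply visual prompt tuning in a continual setting is to freeze the queries learned in earlier steps and train only a small set of new query embeddings for the new classes. This is an assumption-laden illustration of that idea in plain Python: `QueryBank`, `add_step` and `predict_mask_logits` are hypothetical names, the MLP that Mask2Former applies to each query before the dot product is omitted, and it is not ECLIPSE's actual implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class QueryBank:
    """Hypothetical container for query embeddings across continual-learning steps."""
    dim: int
    frozen: list = field(default_factory=list)     # queries from earlier steps (not updated)
    trainable: list = field(default_factory=list)  # prompts for the current step

    def add_step(self, num_prompts, rng):
        """Start a new step: freeze the current prompts, then create new ones."""
        self.frozen.extend(self.trainable)
        self.trainable = [
            [rng.gauss(0.0, 0.02) for _ in range(self.dim)] for _ in range(num_prompts)
        ]

    def all_queries(self):
        # Frozen and new queries are decoded together at inference time.
        return self.frozen + self.trainable


def predict_mask_logits(queries, pixel_embeddings):
    """Mask logit of query q at location p is the dot product <q, e_p>."""
    return [
        [sum(a * b for a, b in zip(q, e)) for e in pixel_embeddings]
        for q in queries
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    bank = QueryBank(dim=4)
    bank.add_step(num_prompts=3, rng=rng)  # base step
    bank.add_step(num_prompts=2, rng=rng)  # first incremental step
    print(len(bank.frozen), len(bank.trainable))  # 3 2
    pixels = [[rng.random() for _ in range(4)] for _ in range(6)]
    logits = predict_mask_logits(bank.all_queries(), pixels)
    print(len(logits), len(logits[0]))  # 5 6
```

In this sketch only `trainable` would receive gradient updates during a given step, which keeps the predictions for earlier classes fixed; a real implementation would enforce this through the framework's parameter freezing rather than through separate lists.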

I apologize for my poor English; that is not the question I meant to ask. What I really wanted to ask is:

Why does ECLIPSE change the Mask2Former architecture? In the original Mask2Former design, there are connections between the pixel decoder and the transformer decoder. In ECLIPSE, however, these connections from the pixel decoder are removed and replaced with image embeddings taken from the backbone output. Why did you change the Mask2Former architecture? Is it because of some disadvantage of Mask2Former in continual learning?
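For readers comparing the figures, here is a minimal sketch of how features flow in Mask2Former as described in its paper: the pixel decoder's 1/32, 1/16 and 1/8 resolution feature maps feed successive transformer decoder layers in round-robin order, its 1/4 resolution per-pixel embeddings are used for mask prediction, and each decoder layer uses masked attention derived from the previous layer's mask predictions. The helper names (`round_robin_schedule`, `attention_mask`) are hypothetical, tensors are replaced by plain lists, and this illustrates the published Mask2Former design rather than ECLIPSE's code.

```python
import math
from dataclasses import dataclass

# Assumption: strides of the pixel-decoder feature maps that the transformer
# decoder cross-attends to (1/32, 1/16, 1/8 of the input), coarse to fine.
PIXEL_DECODER_STRIDES = (32, 16, 8)
# Assumption: stride of the per-pixel embedding map used for mask prediction.
MASK_FEATURE_STRIDE = 4


@dataclass
class DecoderLayerInput:
    layer: int   # index of the transformer decoder layer
    stride: int  # stride of the pixel-decoder feature map it attends to


def round_robin_schedule(num_layers, strides=PIXEL_DECODER_STRIDES):
    """Assign each decoder layer one pixel-decoder scale, cycling coarse to fine."""
    return [DecoderLayerInput(i, strides[i % len(strides)]) for i in range(num_layers)]


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def attention_mask(mask_logits, threshold=0.5):
    """Masked attention: each query may attend only where its previous mask
    probability exceeds `threshold`.

    mask_logits: one list of per-location logits per query.
    Returns one list of booleans per query (True = may attend).
    """
    masks = []
    for logits in mask_logits:
        allowed = [sigmoid(v) > threshold for v in logits]
        # If a query's predicted mask is empty, let it attend everywhere so
        # its attention row is never fully masked.
        if not any(allowed):
            allowed = [True] * len(logits)
        masks.append(allowed)
    return masks


if __name__ == "__main__":
    for item in round_robin_schedule(9):
        print(f"decoder layer {item.layer}: pixel-decoder feature at 1/{item.stride}")
    print(f"mask prediction uses per-pixel embeddings at 1/{MASK_FEATURE_STRIDE}")
    print(attention_mask([[2.0, -1.0, 0.3], [-1.0, -2.0, -3.0]]))
```

With nine decoder layers (the default depth reported for Mask2Former), the schedule cycles three times over the three scales, and the second query in the example, whose predicted mask is empty, falls back to attending everywhere.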

My understanding above is based on the architecture figures of Mask2Former and ECLIPSE. If anything I have written is incorrect, please correct me.

The architectural design of ECLIPSE is entirely based on Mask2Former.
I think you may have misunderstood our architecture.

[Two images attached]

Oh, I see. I was more familiar with the figure below, but I found that both figures are consistent with the code implementation. Thanks for your patient reply!

[Image attached]