microsoft/X-Decoder

Detach for text

vateye opened this issue · 3 comments

Hi, I am quite confused about the loss computation. When computing the loss for the learnable queries, I saw that the text features are detached and thus will not receive gradients.

_caping_lang_embed = caping_lang_embed.detach().clone()

Hi,
The text features are not detached in all settings: they are detached for the per-layer output but kept attached on `query_embed`. This is an empirical design choice.

So, during training on tasks that use the learnable queries (e.g., segmentation, grounding), are the text features always detached?

Nope, please go back to the code:

query_embed = torch.cat((query_embed, caping_lang_embed), dim=0) # may not add at the beginning.

The query embedding is attached, so gradients flow back to the text features through that path.