question about how cls_score is computed
Thank you for your great contribution.

In `transformer_zeroshot_predictor.py`, `cls_score` is computed as follows in your code:
```python
if self.mask_classification:
    x_cls = self.projection_layer(hs)
    # TODO: check if it is l2 norm
    x_cls = x_cls / x_cls.norm(dim=-1, keepdim=True)
    logit_scale = self.logit_scale.exp()
    if self.training:
        cls_score = logit_scale * x_cls @ self.text_features.clone().detach().t()
    else:
        cls_score = logit_scale * x_cls @ self.text_features_test.clone().detach().t()
    bg_score = logit_scale * x_cls @ self.bg_feature.t()
    outputs_class = torch.cat((cls_score, bg_score), -1)
    out = {"pred_logits": outputs_class[-1]}
```
`x_cls` has the shape `[num_dec_layer, bsz, num_queries, hidden_dim]`, which is `[6, 32, 100, 512]`. `cls_score` is computed by matrix multiplication between `x_cls`, shaped `[6, 32, 100, 512]`, and `text_features.t()`, shaped `[512, 15]`. That makes the shape of `cls_score` `[6, 32, 100, 15]`. Then it is concatenated with `bg_score`, of shape `[6, 32, 100, 1]`, along the last dim to obtain `outputs_class` with shape `[6, 32, 100, 16]`. Finally, `out["pred_logits"]` is `outputs_class[-1]`.
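The shape bookkeeping above can be checked with a quick sketch. I use NumPy here since its matmul broadcasting matches torch's for these shapes; the sizes `6/32/100/512/15` are the ones quoted above, and the variable names just mirror the snippet:

```python
import numpy as np

# Dummy tensors with the shapes quoted above (zeros; only shapes matter here).
x_cls = np.zeros((6, 32, 100, 512))       # [num_dec_layer, bsz, num_queries, hidden_dim]
text_features = np.zeros((15, 512))       # [num_classes, hidden_dim]
bg_feature = np.zeros((1, 512))           # [1, hidden_dim]

cls_score = x_cls @ text_features.T       # matmul broadcasts over the first two dims
bg_score = x_cls @ bg_feature.T
outputs_class = np.concatenate((cls_score, bg_score), -1)

print(cls_score.shape)          # (6, 32, 100, 15)
print(outputs_class.shape)      # (6, 32, 100, 16)
print(outputs_class[-1].shape)  # (32, 100, 16)
```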
As I understand the code, the same computations are performed along dimension 0 (the 6 decoder layers), but only the output of the final layer is kept as `pred_logits`. Therefore, I extract the final layer of `x_cls` from the beginning:
```python
if self.mask_classification:
    x_cls = self.projection_layer(hs)
    # TODO: check if it is l2 norm
    x_cls = x_cls / x_cls.norm(dim=-1, keepdim=True)
    x_cls = x_cls[-1]  # Extract final layer of x_cls
    logit_scale = self.logit_scale.exp()
    if self.training:
        cls_score = logit_scale * x_cls @ self.text_features.clone().detach().t()
    else:
        cls_score = logit_scale * x_cls @ self.text_features_test.clone().detach().t()
    bg_score = logit_scale * x_cls @ self.bg_feature.t()
    outputs_class = torch.cat((cls_score, bg_score), -1)
    out = {"pred_logits": outputs_class}  # return
```
The shape of `out["pred_logits"]` is the same in both cases, so I expected training to proceed normally. However, I encounter this error while training with my modified code:
```
ERROR [01/10 13:23:10 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
....
  File "/home/phuongln6/ZegFormer/mask_former/modeling/matcher.py", line 117, in memory_efficient_forward
    cost_class = -out_prob[:, tgt_ids]
IndexError: too many indices for tensor of dimension 1
```
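For reference, that `IndexError` is the failure mode you get whenever the tensor being indexed has lost a dimension, i.e. `out_prob` arrives 1-D but is indexed with two indices. A minimal NumPy reproduction of the same class of error (the shapes and the `tgt_ids` values here are illustrative, not the matcher's actual data):

```python
import numpy as np

tgt_ids = np.array([3, 7])               # indices of the target classes

out_prob_2d = np.zeros((100, 16))        # expected: [num_queries, num_classes]
cost_class = -out_prob_2d[:, tgt_ids]    # fine: shape (100, 2)

out_prob_1d = np.zeros(16)               # degenerate: one dimension too few
try:
    _ = -out_prob_1d[:, tgt_ids]         # same failure mode as the traceback
except IndexError as e:
    print("IndexError:", e)
```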
It turns out that with `batch_size=32`, this error happens once training sample 33 is reached. Could you help me debug my code?
Hi, maybe you should use the shape `[1, 32, 100, 512]` instead of `[32, 100, 512]`?
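If I read this suggestion right, slicing with `x_cls[-1:]` instead of indexing with `x_cls[-1]` keeps a leading layer dimension of size 1, so downstream code that expects a 4-D tensor keeps working. A minimal sketch of the difference (NumPy, but torch indexing behaves the same way):

```python
import numpy as np

x_cls = np.zeros((6, 32, 100, 512))  # [num_dec_layer, bsz, num_queries, hidden_dim]

print(x_cls[-1].shape)   # (32, 100, 512)    -- integer index drops the layer dim
print(x_cls[-1:].shape)  # (1, 32, 100, 512) -- slice keeps it, as suggested above
```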