hkchengrex/XMem

About Training Testing Gap

yyang181 opened this issue · 1 comments

Hi, I noticed that during training the memory bank is built from three frames randomly selected from the eight-frame input video. Notably, it appears that the ground-truth (GT) masks, rather than the model's predictions, are stored in the memory bank. Could you explain the reason for this discrepancy between the training and testing logic?

We do use the model's predicted values in the memory bank during training. See

XMem/model/trainer.py

Lines 111 to 113 in 4589acc

```python
is_deep_update = np.random.rand() < self.deep_update_prob
v16, hidden = self.XMem('encode_value', frames[:,ti], f16[:,ti], hidden, masks, is_deep_update=is_deep_update)
values = torch.cat([values, v16.unsqueeze(3)], 3)
```
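To illustrate the pattern the snippet follows, here is a minimal, self-contained sketch of a recurrent training loop in which each step's *predicted* mask (not the GT) is encoded into the memory bank, matching test-time behavior. This is a toy illustration with made-up stand-ins (`encode_value`, `segment`, the feature dimensions), not XMem's actual implementation:

```python
import numpy as np

np.random.seed(0)

def encode_value(frame, mask):
    # Toy stand-in for an 'encode_value' step: a real model would fuse
    # image features with the mask; here we simply concatenate them.
    return np.concatenate([frame, mask])

def segment(frame, memory):
    # Toy stand-in for reading the memory bank and predicting a mask.
    return (frame + memory.mean(axis=0)[: frame.shape[0]]) * 0.5

T, D = 8, 4                       # 8 frames, toy feature dimension 4
frames = np.random.rand(T, D)
gt_first_mask = np.random.rand(D)

# Only the first frame uses the given (GT) mask, as at test time.
memory = [encode_value(frames[0], gt_first_mask)]

deep_update_prob = 0.2
for ti in range(1, T):
    pred_mask = segment(frames[ti], np.stack(memory))
    # Mirrors `is_deep_update` in the snippet; in XMem it controls whether
    # the hidden state is also refreshed (unused in this toy loop).
    is_deep_update = np.random.rand() < deep_update_prob
    # The *predicted* mask is what gets encoded into memory.
    memory.append(encode_value(frames[ti], pred_mask))

values = np.stack(memory)  # analogous to concatenating along the memory axis
```

Running this accumulates one memory entry per frame, with every entry after the first built from a prediction rather than a GT mask.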