hkchengrex/XMem

About Training Testing Gap

yyang181 opened this issue · 1 comments

Hi, I noticed that during training the memory bank is built from three frames randomly selected from the eight-frame input video. Notably, it appears that the ground-truth (GT) masks, rather than the model's predictions, are stored in the memory bank. Could you explain the reason for this discrepancy between the training and testing logic?

We do use the model's predicted values in the memory bank during training. See

XMem/model/trainer.py

Lines 111 to 113 in 4589acc

```python
is_deep_update = np.random.rand() < self.deep_update_prob
v16, hidden = self.XMem('encode_value', frames[:,ti], f16[:,ti], hidden, masks, is_deep_update=is_deep_update)
values = torch.cat([values, v16.unsqueeze(3)], 3)
```
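To illustrate the pattern the snippet follows, here is a minimal, self-contained sketch of a recurrent training loop in which each step's *predicted* mask (not the GT) is encoded into the memory bank, matching test-time behavior. This is a toy illustration with made-up stand-ins (`encode_value`, `segment`, the feature dimensions), not XMem's actual implementation:

```python
import numpy as np

np.random.seed(0)

def encode_value(frame, mask):
    # Toy stand-in for an 'encode_value' step: a real model would fuse
    # image features with the mask; here we simply concatenate them.
    return np.concatenate([frame, mask])

def segment(frame, memory):
    # Toy stand-in for reading the memory bank and predicting a mask.
    return (frame + memory.mean(axis=0)[: frame.shape[0]]) * 0.5

T, D = 8, 4                       # 8 frames, toy feature dimension 4
frames = np.random.rand(T, D)
gt_first_mask = np.random.rand(D)

# Only the first frame uses the given (GT) mask, as at test time.
memory = [encode_value(frames[0], gt_first_mask)]

deep_update_prob = 0.2
for ti in range(1, T):
    pred_mask = segment(frames[ti], np.stack(memory))
    # Mirrors `is_deep_update` in the snippet; in XMem it controls whether
    # the hidden state is also refreshed (unused in this toy loop).
    is_deep_update = np.random.rand() < deep_update_prob
    # The *predicted* mask is what gets encoded into memory.
    memory.append(encode_value(frames[ti], pred_mask))

values = np.stack(memory)  # analogous to concatenating along the memory axis
```

Running this accumulates one memory entry per frame, with every entry after the first built from a prediction rather than a GT mask.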