Passing ground-truth masks during inference in eval.py

XMem/eval.py, line 206 at commit a4427db:

prob = processor.step(rgb, msk, labels, end=(ti==vid_length-1))

Here, for every frame, we pass the ground-truth mask to the step function, which in turn uses it to correct the prediction and stores the result in memory. Won't this inflate the results? Each new prediction would then be made from a memory filled with ground-truth masks of the preceding frames instead of the model's own predictions.

For example, if I use only the ground-truth mask of frame_1, then at time t every mask in memory except frame_1's is a predicted mask. But if we always send gt_mask to the step function when predicting frame_t, the memory will hold the actual ground-truth masks of all past frames. I may be wrong, but isn't this a kind of test-data leakage?
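To make the two protocols concrete, here is a toy model of where the masks in memory come from. It is a simplification, not XMem code: memory_sources is a hypothetical helper, and it assumes every frame's mask is written to memory right after that frame is processed, whereas XMem decides on its own when to update memory.

```python
def memory_sources(num_frames, gt_every_frame):
    """Toy model: record the origin ("gt" or "pred") of each mask in memory.

    Hypothetical simplification: every frame's mask is written to memory
    right after the frame is processed. Returns, for each frame t, the
    origins of the masks already in memory when frame t is predicted.
    """
    memory = []
    seen_before = []
    for t in range(num_frames):
        # Snapshot of the memory that frame t is predicted from.
        seen_before.append(list(memory))
        # Frame 0 always gets its ground-truth mask (semi-supervised setting).
        # Later frames get ground truth only under the "leaky" protocol.
        memory.append("gt" if t == 0 or gt_every_frame else "pred")
    return seen_before


# Standard protocol: frame 3 is predicted from one GT mask and two predictions.
print(memory_sources(4, gt_every_frame=False)[3])
# Leaky protocol: frame 3 is predicted from ground truth only.
print(memory_sources(4, gt_every_frame=True)[3])
```

Under the first protocol errors can accumulate in memory; under the second they are reset to ground truth at every frame, which is exactly why the question is about inflated scores.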

XMem/inference/inference_core.py, line 90 at commit a4427db:

pred_prob_with_bg = aggregate(mask, dim=0)
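Judging by its name, this line turns the supplied mask into a probability map that includes a background channel. As a hedged illustration, assuming aggregate follows the soft-aggregation scheme used in STCN-style models (background probability is the product of 1 - p over all objects, then each probability is turned into a logit and renormalised with a softmax), here is a per-pixel sketch in pure Python. soft_aggregate is a hypothetical name, not XMem's function; check the repository for the exact implementation.

```python
import math


def soft_aggregate(obj_probs, eps=1e-7):
    """Soft aggregation for a single pixel (assumed STCN-style scheme).

    obj_probs: foreground probability of each object, no background entry.
    Returns [background, obj_1, ..., obj_N], summing to 1.
    """
    # Background: the pixel belongs to none of the objects.
    bg = 1.0
    for p in obj_probs:
        bg *= 1.0 - p
    with_bg = [bg] + list(obj_probs)
    # Clamp away from 0 and 1 so every logit is finite.
    clamped = [min(max(p, eps), 1.0 - eps) for p in with_bg]
    # Probability -> logit, then a numerically stable softmax.
    logits = [math.log(p / (1.0 - p)) for p in clamped]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# A hard ground-truth label (pixel belongs to object 1 of 2) becomes an
# essentially certain probability for that object.
print(soft_aggregate([1.0, 0.0]))
```

If the supplied mask is a hard ground-truth label, the output is essentially one-hot, so whatever the network predicted for those objects is overridden.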

My bad.

The VideoReader class does not set the mask key for every frame, so data.get('mask') returns None and None is passed to the step function. By default (unless use_all_mask is enabled), only the first frame has the mask key.

XMem/inference/data/video_reader.py, lines 76 to 80 at commit a4427db:

load_mask = self.use_all_mask or (gt_path == self.first_gt_path)
if load_mask and path.exists(gt_path):
    mask = Image.open(gt_path).convert('P')
    mask = np.array(mask, dtype=np.uint8)
    data['mask'] = mask
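The load_mask condition above can be checked in isolation. The sketch below is a hypothetical helper, not part of XMem: it applies the same rule to a list of annotation names, with a set of names standing in for path.exists.

```python
def frames_loading_mask(frame_names, first_gt_name, use_all_mask, existing_gt):
    """Hypothetical helper mirroring the load_mask rule in VideoReader.

    frame_names:   annotation file names in frame order
    first_gt_name: name of the first-frame annotation
    use_all_mask:  flag that forces loading every available mask
    existing_gt:   set of names present on disk (stands in for path.exists)
    """
    loaded = []
    for name in frame_names:
        load_mask = use_all_mask or (name == first_gt_name)
        if load_mask and name in existing_gt:
            loaded.append(name)
    return loaded


names = ["00000.png", "00005.png", "00010.png"]
# Default: only the first-frame annotation is loaded.
print(frames_loading_mask(names, "00000.png", False, set(names)))
# With use_all_mask, every annotation that exists is loaded.
print(frames_loading_mask(names, "00000.png", True, set(names)))
```

So with the default setting, only the first frame ever receives a ground-truth mask, which answers the original concern.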

XMem/eval.py, line 169 at commit a4427db:

msk = data.get('mask')
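On the eval side, dict.get returns None for a missing key, so frames without a mask reach step with msk=None. The sketch below mimics the loop with a hypothetical RecordingProcessor that only records whether a mask arrived; it is a stand-in, not XMem's processor, and labels is simply passed as None here.

```python
class RecordingProcessor:
    """Hypothetical stand-in for the inference processor: it only records
    whether a mask was supplied for each frame."""

    def __init__(self):
        self.received = []

    def step(self, rgb, msk=None, labels=None, end=False):
        self.received.append(msk is not None)
        return None


def run_video(frames):
    processor = RecordingProcessor()
    for ti, data in enumerate(frames):
        # dict.get returns None when the reader did not set 'mask',
        # so only annotated frames pass a mask to step().
        msk = data.get('mask')
        processor.step(data['rgb'], msk, None, end=(ti == len(frames) - 1))
    return processor.received


# Only the first frame carries a mask, as with the default VideoReader.
frames = [{'rgb': 0, 'mask': 'gt0'}, {'rgb': 1}, {'rgb': 2}]
print(run_video(frames))
```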
