conradry/copy-paste-aug

How to use this copy-paste for semantic segmentation

AndyChang666 opened this issue · 3 comments

Hi, may I ask how to use this method to augment a semantic segmentation dataset?
Thank you.

I assume for semantic segmentation that you have a single mask with all the classes. It's not the prettiest solution, but something like this should work:

import numpy as np

def extract_bbox(mask):
    #rows/columns that contain any foreground pixels
    yindices = np.where(np.any(mask, axis=1))[0]
    xindices = np.where(np.any(mask, axis=0))[0]
    if yindices.shape[0]:
        y1, y2 = yindices[[0, -1]]
        x1, x2 = xindices[[0, -1]]
        #make the box half-open: (y1, x1) inclusive, (y2, x2) exclusive
        y2 += 1
        x2 += 1
    else:
        #empty mask: return a degenerate box
        y1, x1, y2, x2 = 0, 0, 0, 0

    return (y1, x1, y2, x2)

def load_example(self, index):
    image = self.load_image(index) #some function to load your image (H, W, 3)
    mask = self.load_mask(index) #some function to load your mask (H, W)

    masks = []
    bboxes = []
    #split the mask into individual binary masks for each class,
    #skipping the first unique value (assumed to be the background, e.g. 0)
    for ix, value in enumerate(np.unique(mask)[1:]):
        binary_mask = mask == value
        masks.append(binary_mask)
        #the box plus the class value and this mask's index in the masks list
        bboxes.append(extract_bbox(binary_mask) + (value, ix))

    #pack outputs into a dict
    output = {
        'image': image,
        'masks': masks,
        'bboxes': bboxes
    }

    return self.transforms(**output)
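
For reference, self.transforms here would be an albumentations pipeline that includes this repo's CopyPaste transform. A minimal sketch loosely following the README example (the parameter values are illustrative, and note that the dataset also needs the repo's copy_paste_class machinery to supply the paste image):

import albumentations as A
from copy_paste import CopyPaste

transforms = A.Compose([
    A.RandomScale(scale_limit=(-0.9, 1), p=1), #large scale jitter
    A.PadIfNeeded(256, 256, border_mode=0), #constant 0 border
    A.RandomCrop(256, 256),
    CopyPaste(blend=True, sigma=1, pct_objects_paste=0.5, p=1)
], bbox_params=A.BboxParams(format='coco'))
#note: extract_bbox above returns (y1, x1, y2, x2) while coco format is (x, y, w, h),
#so convert the boxes (or pick a different format) to match your BboxParams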

Once you have the output from copy-paste and all the other augmentations, convert it back to a semantic mask.

output = dataset[index]
mask_classes = [b[-2] for b in output['bboxes']]
mask_indices = [b[-1] for b in output['bboxes']]

#could be uint8 if there are fewer than 255 classes
semantic_mask = np.zeros(output['masks'][0].shape, dtype=np.int64)
for class_value, mask_index in zip(mask_classes, mask_indices):
    #"> 0" works whether masks come back as bool or 0/1 arrays;
    #assignment (rather than +=) also keeps overlapping pixels valid
    semantic_mask[output['masks'][mask_index] > 0] = class_value

del output['masks']
output['mask'] = semantic_mask

You could also further split the semantic mask by connected components (using skimage.measure.label, then you could also use skimage.measure.regionprops to extract the bounding boxes).
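
Something like this, assuming mask is the (H, W) semantic mask and skimage is installed:

import numpy as np
from skimage.measure import label, regionprops

masks = []
bboxes = []
for value in np.unique(mask)[1:]: #skip the background value
    #label connected components within this class
    components = label(mask == value)
    for region in regionprops(components):
        masks.append(components == region.label)
        #region.bbox is (min_row, min_col, max_row, max_col), i.e. (y1, x1, y2, x2)
        bboxes.append(region.bbox + (value, len(masks) - 1))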

Thanks for your quick reply. However, a semantic segmentation task doesn't have bounding boxes in its ground truth.
Also, I want to apply this to Cityscapes instead of COCO. How should I modify the code?

Bounding boxes are easy to extract from a ground truth segmentation mask; that's exactly what this line is for: bboxes.append(extract_bbox(binary_mask) + (value, ix)).
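
To make that concrete, here's a tiny toy check (the class value 7 and the mask size are arbitrary):

import numpy as np

#a 5x5 semantic mask with one class (value 7) occupying a 2x2 patch
mask = np.zeros((5, 5), dtype=np.uint8)
mask[1:3, 2:4] = 7

print(extract_bbox(mask == 7)) #-> (1, 2, 3, 4), i.e. (y1, x1, y2, x2)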