conradry/copy-paste-aug

How to use this copy-paste for semantic segmentation

AndyChang666 opened this issue · 3 comments

Hi, may I ask how to use this method to augment a semantic segmentation dataset?
Thank you.

I assume for semantic segmentation that you have a single mask with all the classes. It's not the prettiest solution, but something like this should work:

import numpy as np

def extract_bbox(mask):
    #rows/columns that contain any foreground pixels
    yindices = np.where(np.any(mask, axis=1))[0]
    xindices = np.where(np.any(mask, axis=0))[0]
    if yindices.shape[0]:
        y1, y2 = yindices[[0, -1]]
        x1, x2 = xindices[[0, -1]]
        #make the box half-open: (y1, x1) inclusive, (y2, x2) exclusive
        y2 += 1
        x2 += 1
    else:
        #empty mask: return a degenerate box
        y1, x1, y2, x2 = 0, 0, 0, 0

    return (y1, x1, y2, x2)

def load_example(self, index):
    image = self.load_image(index) #some function to load your image (H, W, 3)
    mask = self.load_mask(index) #some function to load your mask (H, W)

    masks = []
    bboxes = []
    #split the mask into individual binary masks for each class,
    #skipping the first unique value (assumed to be the background, e.g. 0)
    for ix, value in enumerate(np.unique(mask)[1:]):
        binary_mask = mask == value
        masks.append(binary_mask)
        #the box plus the class value and this mask's index in the masks list
        bboxes.append(extract_bbox(binary_mask) + (value, ix))

    #pack outputs into a dict
    output = {
        'image': image,
        'masks': masks,
        'bboxes': bboxes
    }

    return self.transforms(**output)
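
For reference, self.transforms here would be an albumentations pipeline that includes this repo's CopyPaste transform. A minimal sketch loosely following the README example (the parameter values are illustrative, and note that the dataset also needs the repo's copy_paste_class machinery to supply the paste image):

import albumentations as A
from copy_paste import CopyPaste

transforms = A.Compose([
    A.RandomScale(scale_limit=(-0.9, 1), p=1), #large scale jitter
    A.PadIfNeeded(256, 256, border_mode=0), #constant 0 border
    A.RandomCrop(256, 256),
    CopyPaste(blend=True, sigma=1, pct_objects_paste=0.5, p=1)
], bbox_params=A.BboxParams(format='coco'))
#note: extract_bbox above returns (y1, x1, y2, x2) while coco format is (x, y, w, h),
#so convert the boxes (or pick a different format) to match your BboxParams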

Once you have the output from copy-paste and all the other augmentations, convert it back to a semantic mask.

output = dataset[index]
mask_classes = [b[-2] for b in output['bboxes']]
mask_indices = [b[-1] for b in output['bboxes']]

#could be uint8 if there are fewer than 255 classes
semantic_mask = np.zeros(output['masks'][0].shape, dtype=np.int64)
for class_value, mask_index in zip(mask_classes, mask_indices):
    #"> 0" works whether masks come back as bool or 0/1 arrays;
    #assignment (rather than +=) also keeps overlapping pixels valid
    semantic_mask[output['masks'][mask_index] > 0] = class_value

del output['masks']
output['mask'] = semantic_mask

You could also further split the semantic mask by connected components (using skimage.measure.label, then you could also use skimage.measure.regionprops to extract the bounding boxes).
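
Something like this, assuming mask is the (H, W) semantic mask and skimage is installed:

import numpy as np
from skimage.measure import label, regionprops

masks = []
bboxes = []
for value in np.unique(mask)[1:]: #skip the background value
    #label connected components within this class
    components = label(mask == value)
    for region in regionprops(components):
        masks.append(components == region.label)
        #region.bbox is (min_row, min_col, max_row, max_col), i.e. (y1, x1, y2, x2)
        bboxes.append(region.bbox + (value, len(masks) - 1))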

Thanks for your quick reply. However, a semantic segmentation task doesn't have bounding boxes in its ground truth.
Also, I want to apply this to Cityscapes instead of COCO. How should I modify the code?

Bounding boxes are easy to extract from a ground truth segmentation mask; that's exactly what this line is for: bboxes.append(extract_bbox(binary_mask) + (value, ix)).
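
To make that concrete, here's a tiny toy check (the class value 7 and the mask size are arbitrary):

import numpy as np

#a 5x5 semantic mask with one class (value 7) occupying a 2x2 patch
mask = np.zeros((5, 5), dtype=np.uint8)
mask[1:3, 2:4] = 7

print(extract_bbox(mask == 7)) #-> (1, 2, 3, 4), i.e. (y1, x1, y2, x2)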