[TOC]
Offline augmentation: all transformations are performed ahead of time, which substantially increases the size of the dataset; suited to smaller datasets.
Online augmentation: the transformations are applied to each mini-batch on the fly, before it is fed to the model, so nothing extra is stored (see the sketch after this list).
Geometric transformations apply geometric operations to the image, including flipping, rotation, cropping, deformation, and scaling.
The geometric operations above do not alter the content of the image itself; they select a part of it or redistribute its pixels. Augmentations that do alter the image content fall into the color-transformation category, which commonly includes adding noise, blurring, color shifts, erasing, and filling.
Beyond these, there are methods such as generating data with a GAN or stitching four images together with mosaic.
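An online pipeline can be sketched minimally with torchvision: the transforms below are re-sampled for every image inside the Dataset, so each epoch sees differently augmented mini-batches and nothing extra is stored on disk. The ImageFolder layout and the 'data/train' path are placeholder assumptions, not part of the SSD code later in this post.

import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

train_transform = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                                 # geometric
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),   # color
    T.ToTensor(),
])

# Augmentation happens in __getitem__, i.e. per mini-batch at training time
train_set = ImageFolder('data/train', transform=train_transform)   # placeholder path
train_loader = DataLoader(train_set, batch_size=8, shuffle=True)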
2.1 Flip
An image can be flipped horizontally or vertically. From left to right: the original image, the horizontally flipped image, and the vertically flipped image. (A box-aware flip for SSD appears in the code section below.)
2.2 Rotation
From left to right, each image is rotated 90 degrees clockwise relative to the previous one.
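A minimal sketch of 90-degree rotations with PIL; the image path is a placeholder, and since PIL's angle is counter-clockwise, -90 rotates clockwise. For angles that are not multiples of 90 degrees, any bounding boxes would have to be transformed as well.

from PIL import Image

image = Image.open('example.jpg')         # placeholder path
rot90 = image.rotate(-90, expand=True)    # rotated 90 degrees clockwise
rot180 = rot90.rotate(-90, expand=True)   # 180 degrees from the original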
2.3 Scale
An image can be scaled outward or inward. When scaling outward, the final image is larger than the original, and most frameworks then cut out a section of the new image equal in size to the original. From left to right: the original image, scaled outward by 10%, and scaled outward by 20%.
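A minimal sketch of scaling outward, assuming a PIL image; scale_out is a hypothetical helper (not part of the SSD code below) that enlarges the image and then, as described above, cuts out a center region equal to the original size.

import torchvision.transforms.functional as FT
from PIL import Image

def scale_out(image, factor):
    # Enlarge the image, then crop a center region of the original size
    w, h = image.size
    enlarged = FT.resize(image, (int(h * factor), int(w * factor)))
    return FT.center_crop(enlarged, (h, w))

image = Image.open('example.jpg')   # placeholder path
scaled_10 = scale_out(image, 1.1)   # scaled outward by 10%
scaled_20 = scale_out(image, 1.2)   # scaled outward by 20%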
2.4 Crop
Unlike scaling, here we simply sample a region of the original image at random and then resize that region to the original image size. This method is commonly called random crop. From left to right: the original image, a crop from the top-left corner, and a crop from the bottom-right corner. (A box-aware random crop for SSD appears in the code section below.)
2.5 Translation
Translation simply shifts the image along the X or Y direction (or both). From left to right: the original image, shifted to the right, and shifted upward.
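A minimal sketch of translation with torchvision's affine transform on a PIL image; the 50-pixel offsets are arbitrary illustrative values, and the vacated border is filled with black. For detection, the same offsets would have to be added to the boxes.

import torchvision.transforms.functional as FT
from PIL import Image

image = Image.open('example.jpg')   # placeholder path
# translate is (horizontal, vertical) in pixels; a negative vertical value shifts up
shift_right = FT.affine(image, angle=0, translate=(50, 0), scale=1.0, shear=0)
shift_up = FT.affine(image, angle=0, translate=(0, -50), scale=1.0, shear=0)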
2.6 Resize
Rescale the image to a target size so that the model learns scale invariance (see the resize function in the SSD code below).
2.7 Gaussian Noise
Overfitting often happens when a neural network tries to learn high-frequency features (patterns that occur in great quantity) that may not be useful. Zero-mean Gaussian noise has data points at essentially all frequencies, which effectively distorts the high-frequency features; adding a moderate amount of noise can therefore improve learning. From left to right: the original image, the image with Gaussian noise added, and the image with salt-and-pepper noise added.
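A minimal sketch of both noise types on an image tensor scaled to [0, 1]; the sigma and amount values are illustrative, not tuned.

import torch

def gaussian_noise(image, sigma=0.05):
    # Zero-mean Gaussian noise has data points at essentially all frequencies
    return (image + sigma * torch.randn_like(image)).clamp(0., 1.)

def salt_pepper_noise(image, amount=0.02):
    # Set a random fraction of pixel locations to pure black (pepper) or white (salt)
    noisy = image.clone()
    mask = torch.rand(image.shape[-2:])
    noisy[..., mask < amount / 2] = 0.      # pepper
    noisy[..., mask > 1 - amount / 2] = 1.  # salt
    return noisy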
2.8 RGB Color Distortion & Adjusting Brightness, Contrast, Saturation, and Hue
(See the photometric_distort function in the SSD code below.)
2.9 Random Erase (CutOut)
Cover a randomly selected rectangular region of the image with a specific value (random values or the dataset mean).
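torchvision ships this idea as transforms.RandomErasing; a minimal hand-rolled sketch on a (C, H, W) tensor, with illustrative area bounds, looks like this:

import random
import torch

def random_erase(image, area_range=(0.02, 0.2), value=None):
    # Choose a roughly square rectangle covering a random fraction of the image
    c, h, w = image.shape
    area = random.uniform(*area_range) * h * w
    eh = min(h, int(area ** 0.5))
    ew = min(w, int(area / max(eh, 1)))
    top = random.randint(0, h - eh)
    left = random.randint(0, w - ew)
    # Fill with random values, or with a constant such as the dataset mean
    patch = torch.rand(c, eh, ew) if value is None else torch.full((c, eh, ew), float(value))
    erased = image.clone()
    erased[:, top:top + eh, left:left + ew] = patch
    return erased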
2.10 Mixup
Linearly blend two images along with their corresponding labels.
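A minimal sketch of Mixup on a mini-batch with one-hot labels; the mixing weight is drawn from a Beta(alpha, alpha) distribution as in the original paper, and alpha=0.2 is just a common choice.

import numpy as np
import torch

def mixup(images, onehot_labels, alpha=0.2):
    lam = np.random.beta(alpha, alpha)        # mixing weight lambda
    perm = torch.randperm(images.size(0))     # pair each sample with a random partner
    mixed_images = lam * images + (1 - lam) * images[perm]
    mixed_labels = lam * onehot_labels + (1 - lam) * onehot_labels[perm]
    return mixed_images, mixed_labels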
2.11 CutMix
Building on Mixup and CutOut, remove a region of one image and fill it with a patch from another image.
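A minimal sketch of CutMix on a mini-batch: a random box in each image is overwritten with the same region from a shuffled copy of the batch, and the one-hot labels are mixed in proportion to the pasted area.

import numpy as np
import torch

def cutmix(images, onehot_labels, alpha=1.0):
    n, _, h, w = images.shape
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(n)
    # Box of area (1 - lam) * h * w around a random center, clipped to the image
    rh, rw = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = max(cy - rh // 2, 0), min(cy + rh // 2, h)
    x1, x2 = max(cx - rw // 2, 0), min(cx + rw // 2, w)
    images = images.clone()
    images[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    # Recompute lambda from the actual (clipped) pasted area
    lam = 1 - (y2 - y1) * (x2 - x1) / (h * w)
    mixed_labels = lam * onehot_labels + (1 - lam) * onehot_labels[perm]
    return images, mixed_labels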
2.12 GAN
Style transfer and generation of fake training images.
2.13 Mosaic
Mosaic stitches four images into one, enriching the backgrounds of the detected objects; in addition, batch normalization then computes statistics over four images at once. (An SSD-style implementation appears in the code section below.)
References:
【1】Data Augmentation | How to use Deep Learning when you have Limited Data
【2】Zhihu: What data augmentation methods are there in deep learning?
【3】CSDN: A summary of data augmentation methods for deep learning
Data augmentation for SSD, based on a-PyTorch-Tutorial-to-Object-Detection (reference 【1】 at the end of this section). The code is as follows:

import random

import torch
import torchvision.transforms.functional as FT
from PIL import Image
1. Flip

def flip(image, boxes):
    """Horizontally flip a PIL image and its bounding boxes (xmin, ymin, xmax, ymax)."""
    # Flip image
    new_image = FT.hflip(image)
    # Flip boxes; clone first so the caller's tensor is not modified in place
    new_boxes = boxes.clone()
    new_boxes[:, 0] = image.width - boxes[:, 0] - 1
    new_boxes[:, 2] = image.width - boxes[:, 2] - 1
    # After mirroring, the old xmax becomes the new xmin, so reorder the columns
    new_boxes = new_boxes[:, [2, 1, 0, 3]]
    return new_image, new_boxes
2. Resize

def resize(image, boxes, dims=(300, 300), return_percent_coords=True):
    """Resize a PIL image to dims = (h, w). Boxes are returned as fractional
    (percent) coordinates by default, or as absolute coordinates in the resized image."""
    # Resize image
    new_image = FT.resize(image, dims)
    # Resize bounding boxes
    old_dims = torch.FloatTensor([image.width, image.height, image.width, image.height]).unsqueeze(0)
    new_boxes = boxes / old_dims  # percent coordinates in [0, 1]
    if not return_percent_coords:
        new_dims = torch.FloatTensor([dims[1], dims[0], dims[1], dims[0]]).unsqueeze(0)
        new_boxes = new_boxes * new_dims
    return new_image, new_boxes
3. Padding

def expand(image, boxes, filler):
    """Zoom out: place the image tensor on a larger canvas filled with `filler`.
    Helps the model learn to detect smaller objects."""
    original_h = image.size(1)
    original_w = image.size(2)
    max_scale = 4
    scale = random.uniform(1, max_scale)
    new_h = int(scale * original_h)
    new_w = int(scale * original_w)
    # Create a canvas of the new size, filled with the per-channel filler values
    filler = torch.FloatTensor(filler)  # (3)
    new_image = torch.ones((3, new_h, new_w), dtype=torch.float) * filler.unsqueeze(1).unsqueeze(1)
    # Place the original image at random coordinates on this canvas (origin at top-left)
    left = random.randint(0, new_w - original_w)
    right = left + original_w
    top = random.randint(0, new_h - original_h)
    bottom = top + original_h
    new_image[:, top:bottom, left:right] = image
    # Shift the bounding boxes' coordinates accordingly
    new_boxes = boxes + torch.FloatTensor([left, top, left, top]).unsqueeze(0)
    return new_image, new_boxes
4. Distort brightness, contrast, saturation, and hue

def photometric_distort(image):
    """Apply brightness, contrast, saturation and hue distortions in random
    order, each with probability 0.5."""
    new_image = image
    distortions = [FT.adjust_brightness,
                   FT.adjust_contrast,
                   FT.adjust_saturation,
                   FT.adjust_hue]
    random.shuffle(distortions)
    for d in distortions:
        if random.random() < 0.5:
            if d.__name__ == 'adjust_hue':  # '==', not 'is', for string comparison
                # Hue factor must lie in [-0.5, 0.5]
                adjust_factor = random.uniform(-18 / 255., 18 / 255.)
            else:
                adjust_factor = random.uniform(0.5, 1.5)
            # Apply this distortion
            new_image = d(new_image, adjust_factor)
    return new_image
5. Random crop

def random_crop(image, boxes, labels, difficulties):
    """SSD-style random crop of an image tensor: keep only the boxes whose
    centers fall inside the crop, and clip them to the crop boundaries."""
    original_h = image.size(1)
    original_w = image.size(2)
    # Keep choosing a minimum overlap until a successful crop is made
    while True:
        # Randomly draw the value for minimum overlap
        min_overlap = random.choice([0., .1, .3, .5, .7, .9, None])  # 'None' refers to no cropping
        # If not cropping
        if min_overlap is None:
            return image, boxes, labels, difficulties
        max_trials = 50
        for _ in range(max_trials):
            # Crop dimensions must be in [0.3, 1] of original dimensions
            min_scale = 0.3
            scale_h = random.uniform(min_scale, 1)
            scale_w = random.uniform(min_scale, 1)
            new_h = int(scale_h * original_h)
            new_w = int(scale_w * original_w)
            # Aspect ratio has to be in [0.5, 2]
            aspect_ratio = new_h / new_w
            if not 0.5 < aspect_ratio < 2:
                continue
            # Crop coordinates (origin at top-left of image)
            left = random.randint(0, original_w - new_w)
            right = left + new_w
            top = random.randint(0, original_h - new_h)
            bottom = top + new_h
            crop = torch.FloatTensor([left, top, right, bottom])  # (4)
            # Jaccard (IoU) overlap between the crop and the bounding boxes;
            # find_jaccard_overlap is the IoU helper from the tutorial in reference 【1】
            overlap = find_jaccard_overlap(crop.unsqueeze(0), boxes)  # (1, n_objects)
            overlap = overlap.squeeze(0)
            if overlap.max().item() < min_overlap:
                continue
            # Crop image
            new_image = image[:, top:bottom, left:right]  # (3, new_h, new_w)
            # Find centers of original bounding boxes
            bb_centers = (boxes[:, :2] + boxes[:, 2:]) / 2.  # (n_objects, 2)
            # Find bounding boxes whose centers are in the crop
            centers_in_crop = (bb_centers[:, 0] > left) * (bb_centers[:, 0] < right) * (bb_centers[:, 1] > top) * (bb_centers[:, 1] < bottom)
            # If not a single bounding box has its center in the crop, try again
            if not centers_in_crop.any():
                continue
            # Discard bounding boxes that don't meet this criterion
            new_boxes = boxes[centers_in_crop, :]
            new_labels = labels[centers_in_crop]
            new_difficulties = difficulties[centers_in_crop]
            # Clip boxes to the crop and shift them to the crop's coordinate frame
            new_boxes[:, :2] = torch.max(new_boxes[:, :2], crop[:2])
            new_boxes[:, :2] -= crop[:2]
            new_boxes[:, 2:] = torch.min(new_boxes[:, 2:], crop[2:])
            new_boxes[:, 2:] -= crop[:2]
            return new_image, new_boxes, new_labels, new_difficulties
6. Mosaic

def get_data_with_Mosaic(self, index, image, boxes, labels, difficulties):
    """Load four images (the one at `index` plus three random ones) in a
    2x2 mosaic of size (2*img_size, 2*img_size), with boxes, labels and
    difficulties concatenated accordingly."""
    img_size = 300
    image_s4 = Image.new('RGB', (img_size * 2, img_size * 2), (255, 255, 255))
    boxes_s4 = torch.FloatTensor([])
    # Quadrant 0 reuses the labels/difficulties already passed in for `index`
    labels_s4 = labels
    difficulties_s4 = difficulties
    # The image at `index` plus 3 additional random image indices
    indices = [index] + [random.randint(0, self.num_samples - 1) for _ in range(3)]
    for i, index in enumerate(indices):
        image = Image.open(self.images[index], mode='r')
        image = image.convert('RGB')
        objects = self.objects[index]
        boxes = torch.FloatTensor(objects['boxes'])
        labels = torch.LongTensor(objects['labels'])
        difficulties = torch.ByteTensor(objects['difficulties'])
        new_image, new_boxes = resize(image, boxes, dims=(img_size, img_size), return_percent_coords=False)
        # Paste each image into its quadrant of image_s4; PIL's paste takes the
        # (x, y) of the pasted top-left corner, so (0, img_size) is the bottom-left quadrant
        if i == 0:  # bottom left
            image_s4.paste(new_image, (0, img_size))
            box = new_boxes + torch.FloatTensor([0, img_size, 0, img_size]).unsqueeze(0)
        elif i == 1:  # bottom right
            labels_s4 = torch.cat([labels_s4, labels])
            difficulties_s4 = torch.cat([difficulties_s4, difficulties])
            image_s4.paste(new_image, (img_size, img_size))
            box = new_boxes + torch.FloatTensor([img_size, img_size, img_size, img_size]).unsqueeze(0)
        elif i == 2:  # top left
            labels_s4 = torch.cat([labels_s4, labels])
            difficulties_s4 = torch.cat([difficulties_s4, difficulties])
            image_s4.paste(new_image, (0, 0))
            box = new_boxes
        elif i == 3:  # top right
            labels_s4 = torch.cat([labels_s4, labels])
            difficulties_s4 = torch.cat([difficulties_s4, difficulties])
            image_s4.paste(new_image, (img_size, 0))
            box = new_boxes + torch.FloatTensor([img_size, 0, img_size, 0]).unsqueeze(0)
        boxes_s4 = torch.cat([boxes_s4, box], dim=0)
    return image_s4, boxes_s4, labels_s4, difficulties_s4
Validation with the SSD300 framework:
(PS: Too many people were remotely connected to the lab server and I could not get on, so the code was run on my personal laptop's CPU. One epoch took more than four hours and testing took another four to five hours, so I only ran a quick check, mainly of the Mosaic method: one day for the baseline experiment and one day for the comparison. In between I also, miserably, spent a whole day on a test run whose results turned out poor; it turned out the mosaic box annotations were flipped, which I only noticed once I visualized the mosaics. The visualization results are shown below.)
Epochs: 2, batch size: 8, dataset: VOC2007.
Baseline experiment: with the usual flip, padding, random crop, etc. After 2 epochs the results are as follows: mAP 0.064, which is very low because so few epochs were run.
Adding Mosaic on top of the baseline: after the same 2 epochs the mAP is 0.099, a modest improvement; results are as follows. Since SSD300's input is fixed at 300x300, Mosaic feels similar to the padding above in that it shrinks the objects, which also helps with detecting small objects.
Visualization of Mosaic:
References:
【1】GitHub: a-PyTorch-Tutorial-to-Object-Detection
【2】YOLOv4: Optimal Speed and Accuracy of Object Detection
1. I had usually just used the few augmentation functions in PyTorch's torchvision.transforms, and in the small competitions I entered I barely used augmentation at all, so I had not paid much attention to this area. It is actually quite inspiring: the idea behind Mixup, for example, is roughly the same as that of the underwater-data augmentation network in our paper under review at TIP, where we additionally used a GAN to generate better data. I came up with that idea after being inspired by salient object detection, and YOLOv4 only came out this April.
2. As an undergraduate I mostly read rather than implemented. Although the core code of all my course projects and competitions was my own, limits on time (most of it went to coursework grades and scholarships) and on compute (the lab left few resources for undergraduates) meant I rarely got hands-on reproducing or practicing CV tasks; mostly I just ran experiments. That is also why I very much want to pursue a PhD and give myself a long stretch of time to improve.
3. There are so many mature frameworks now, each the long accumulation of an entire team, along with tools like MMDetection. It leaves me wondering whether the wheels I build or reproduce on my own are a bit shabby by comparison.