jeonggg119/DL_paper

[CV_CNN] Very Deep Convolutional Networks for Large-Scale Image Recognition

jeonggg119 opened this issue · 0 comments

1. INTRODUCTION

  • Fix other parameters and increase 'depth' of the network + Use only 'small (3x3)' convolution filters in all layers
  • Basis of the ILSVRC-2014 submission (1st place in localisation, 2nd place in classification) + generalises well to other image recognition datasets

2. ConvNet Configurations

2.1. Architecture

  • Input data : 224 x 224 RGB image
  • 3 x 3 Conv (stride 1, padding 1) and 2 x 2 Maxpool (stride 2)
  • Activation function : ReLU
  • 3 FC layers (4096 - 4096 - 1000 channels)
  • Final : soft-max layer
  • No LRN (Local Response Normalization), except in one configuration (A-LRN) : LRN did not improve accuracy
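
As a quick check of the layer settings above, the sketch below traces the side length of the feature map through the five conv / max-pool stages. It relies on the standard output-size formula floor((n + 2p - k) / s) + 1; the helper names `conv_out`, `pool_out` and `trace_spatial_size` are illustrative and do not come from the paper.

```python
def conv_out(n, kernel=3, stride=1, padding=1):
    """Output side length of a convolution (standard formula)."""
    return (n + 2 * padding - kernel) // stride + 1


def pool_out(n, kernel=2, stride=2):
    """Output side length of a max-pooling layer without padding."""
    return (n - kernel) // stride + 1


def trace_spatial_size(n=224, num_stages=5):
    """Side length of the feature map after each conv + max-pool stage."""
    sizes = []
    for _ in range(num_stages):
        # 3 x 3 conv, stride 1, padding 1 keeps the size, so one call
        # stands in for however many conv layers the stage contains.
        n = conv_out(n)
        # 2 x 2 max-pool, stride 2 halves the size.
        n = pool_out(n)
        sizes.append(n)
    return sizes


print(trace_spatial_size())
```

For a 224 x 224 input this gives 112, 56, 28, 14 and 7, which is why the first FC layer sees a 7 x 7 x 512 volume.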

2.2. Configurations

  • A~E : Differ only in 'depth'
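  • Width of conv layers (number of channels) : starts at 64 and doubles after each max-pool until it reaches 512 (64 -> 128 -> 256 -> 512)
  • Despite the greater depth, the number of parameters (133M ~ 144M) is not larger than that of a shallower net with wider conv layers and larger receptive fields; most of the weights are in the FC layers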
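
To make the parameter budget concrete, here is a minimal sketch that counts weights and biases for configuration D (16 weight layers). The list `CONFIG_D` and the helper names are assumptions made for illustration; the FC sizes assume the 7 x 7 x 512 feature map produced by a 224 x 224 input.

```python
# Configuration D: numbers are conv output channels, 'M' marks a max-pool.
CONFIG_D = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
            512, 512, 512, 'M', 512, 512, 512, 'M']


def conv_params(cfg, in_channels=3, kernel=3):
    """Weights + biases of all conv layers in a configuration."""
    total = 0
    for v in cfg:
        if v == 'M':
            continue  # pooling layers have no parameters
        total += kernel * kernel * in_channels * v + v
        in_channels = v
    return total


def fc_params(sizes=(512 * 7 * 7, 4096, 4096, 1000)):
    """Weights + biases of the three fully-connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


conv, fc = conv_params(CONFIG_D), fc_params()
print('conv:', conv, 'fc:', fc, 'total:', conv + fc)
print('share of parameters in FC layers:', round(fc / (conv + fc), 3))
```

The total comes to 138,357,544 parameters, of which roughly 89% sit in the three FC layers.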

2.3. Discussion

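  • A stack of three 3 x 3 conv layers (without pooling in between) has the same effective receptive field as one 7 x 7 conv layer
  • BUT it is more non-linear (three ReLUs instead of one) & has fewer parameters ( 3(3^2C^2) = 27C^2 < 7^2C^2 = 49C^2, for C input and output channels )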
  • 1 x 1 conv layer for additional non-linearity by ReLU (config C)
  • GoogLeNet (1st place in ILSVRC-2014 classification) was developed independently and is more complex than VGGNet
    • Similarity : very deep ConvNet (22 weight layers) with small conv filters (1x1, 3x3, 5x5)
    • Difference : GoogLeNet reduces the spatial resolution of the feature maps more aggressively in the first layers to decrease the amount of computation
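
The receptive-field and parameter comparison at the start of this section can be verified with a few lines. This is only a sketch: the function names are made up for illustration, biases are ignored as in the note, and the channel count `C = 256` is an arbitrary choice.

```python
def stacked_receptive_field(kernels):
    """Effective receptive field of stacked stride-1 convolutions."""
    rf = 1
    for k in kernels:
        rf += k - 1  # each stride-1 layer widens the field by k - 1
    return rf


def stacked_weights(kernels, channels):
    """Weight count (no biases) when every layer has `channels` inputs and outputs."""
    return sum(k * k * channels * channels for k in kernels)


C = 256  # arbitrary channel count, chosen only for illustration
print(stacked_receptive_field([3, 3, 3]), stacked_receptive_field([7]))
print(stacked_weights([3, 3, 3], C), stacked_weights([7], C))
```

Both stacks cover a 7 x 7 window, but the three 3 x 3 layers need 27C^2 weights against 49C^2 for the single 7 x 7 layer. Two 3 x 3 layers cover 5 x 5 in the same way.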

3. Classification Framework

3.1. Training

  • Generally follows AlexNet (2012) except for input crops from multi-scale training images
  • Data Pre-processing
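    • Image Rescale (Resize) : S = smallest side of the isotropically rescaled training image
      • Single-scale training : fixed S = 256 or S = 384
      • Multi-scale training : S randomly sampled from [256, 512] for each image (fine-tuned from the single-scale model pre-trained with S = 384)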
    • Data Augmentation
      • Random crop 224 x 224
      • Random horizontal flipping
      • Random RGB color shift
      • Scale jittering
      • Normalization : subtract mean RGB value computed on training dataset from each pixel
  • Train Details
    • Objective : multinomial logistic regression
    • Optimizer : mini-batch gradient descent (based on backpropagation) with momentum
      • Learning rate : 0.01 (initial)
      • Momentum : 0.9
      • L2 weight decay : 0.0005
    • Batch size : 256
    • Dropout : 0.5 ratio for first 2 FC layers
    • Learning rate schedule : decreased by a factor of 10 when the validation accuracy stopped improving (3 times in total)
    • Epoch : 74 (370K iterations)
    • Pre-initialization : Train shallow config A from random initialization -> Initialize the first 4 conv and last 3 FC layers of the deeper configs with the layers of A; initialize the intermediate layers randomly (weights sampled from a normal distribution with zero mean and 10^-2 variance, biases = 0)
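
The rescaling and cropping steps above can be sketched as plain geometry, without any image library. This is a minimal illustration under stated assumptions: the function name `sample_crop` is hypothetical, S is drawn as an integer, and the colour shift and mean subtraction are left out.

```python
import random


def sample_crop(width, height, s_min=256, s_max=512, crop=224, rng=random):
    """Return (rescaled size, crop box, flip flag) for one training image."""
    s = rng.randint(s_min, s_max)        # training scale S (scale jittering)
    ratio = s / min(width, height)       # isotropic: same factor on both sides
    new_w, new_h = round(width * ratio), round(height * ratio)
    left = rng.randint(0, new_w - crop)  # random 224 x 224 crop position
    top = rng.randint(0, new_h - crop)
    flip = rng.random() < 0.5            # random horizontal flip
    return (new_w, new_h), (left, top, left + crop, top + crop), flip


random.seed(0)
size, box, flip = sample_crop(500, 375)
print(size, box, flip)
```

Single-scale training corresponds to `s_min == s_max` (for example 256 or 384). Because S is at least 256, the 224 x 224 crop always fits inside the rescaled image.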

3.2. Testing

  • Data Pre-processing
    • Isotropic Rescaling to pre-defined smallest side Q (not necessarily equal to S)
    • Multi-crop evaluation + Dense evaluation
    • Data Augmentation : Horizontal flipping
  • Network Change
    • FC layers -> convolutional layers => Fully-Convolutional Net
      • First FC layer -> 7 x 7 conv layer
      • Last 2 FC layers -> 1 x 1 conv layers
      • Applied to the whole (uncropped) image, so the input size is free
    • Spatially average (sum-pool) the class score map at the end : to obtain a fixed-size vector of class scores
  • Final scores : average of the soft-max class posteriors of the original and flipped images
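
A small sketch of what dense evaluation produces: the size of the class score map for a given test image, followed by the spatial averaging and the flip averaging. The function names are illustrative assumptions, and score maps are represented as nested Python lists rather than tensors.

```python
def score_map_size(height, width, num_pools=5, first_fc_kernel=7):
    """Spatial size of the class score map of the fully-convolutional net."""
    for _ in range(num_pools):
        # each 2 x 2, stride-2 max-pool halves the size
        height, width = height // 2, width // 2
    # the first FC layer becomes a 7 x 7 convolution without padding
    return height - first_fc_kernel + 1, width - first_fc_kernel + 1


def spatial_average(score_map):
    """Average a [rows][cols][classes] score map over all positions."""
    cells = [cell for row in score_map for cell in row]
    num_classes = len(cells[0])
    return [sum(cell[c] for cell in cells) / len(cells) for c in range(num_classes)]


def average_with_flip(scores, scores_flipped):
    """Average the class scores of the original and the flipped image."""
    return [(a + b) / 2 for a, b in zip(scores, scores_flipped)]


print(score_map_size(224, 224))  # a training-size crop gives a single score vector
print(score_map_size(384, 512))  # a larger image gives a grid of score vectors
```

A 224 x 224 input yields a 1 x 1 map, and a 384 x 512 input yields a 6 x 10 map. The averaging step therefore returns one score vector whatever the input size.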

4. Classification Experiments

  • Dataset : ILSVRC-2012 dataset (1000 classes / 1.3M train + 50K val + 100K test)
  • Use validation set as test set

4.1. Single-Scale Evaluation

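  • Deeper -> lower error (from 11 layers in A to 19 layers in E); the error saturates at 19 layers
  • Additional non-linearity helps (C with 1 x 1 conv > B), but at the same depth 3 x 3 conv beats 1 x 1 conv because it also captures spatial context (D > C)
  • Deep net with small filters is better than shallow net with large filters
  • Scale jittering at training time (S in [256, 512]) : better than fixed S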

4.2. Multi-Scale Evaluation

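  • Scale jittering at test time : better than single-scale evaluation of the same model
  • Model trained with fixed S : Q = {S-32, S, S+32}
  • Model trained with scale jittering S in [256, 512] : Q = {256, 384, 512}, better than models trained with fixed S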
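
The choice of test scales can be written as a small helper. This is only a sketch and `evaluation_scales` is a hypothetical name: a fixed training scale gives three scales around S, while a jittered training range gives its two ends and its midpoint.

```python
def evaluation_scales(s_min, s_max=None):
    """Test scales Q for a model trained at fixed S (s_max omitted)
    or with scale jittering over [s_min, s_max]."""
    if s_max is None or s_max == s_min:
        s = s_min
        return [s - 32, s, s + 32]
    return [s_min, (s_min + s_max) // 2, s_max]


print(evaluation_scales(384))       # fixed S = 384
print(evaluation_scales(256, 512))  # jittered S in [256, 512]
```

The class posteriors obtained at each scale are then averaged to give the final prediction.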

4.3. Multi-Crop Evaluation

  • Multi-crop & Dense evaluation : complementary -> Combination is best

4.4. ConvNet Fusion

  • Combine the outputs of several models by averaging soft-max class posteriors -> improve performance
  • Multiple ConvNet fusion Results
    • ILSVRC submission : at submission time only the single-scale networks and one multi-scale model D had been trained; Ensemble of 7 models => 7.3% test error
    • Post-submission : Ensemble of 2 best-performing multi-scale models (D and E) => 7.0% using dense eval, 6.8% using combined eval
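
Fusion by averaging soft-max posteriors can be illustrated in a few lines. This is a minimal sketch: the logits are made-up numbers, and `model_d` / `model_e` only stand in for the outputs of two networks on one image.

```python
import math


def softmax(logits):
    """Numerically stable soft-max over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(posteriors):
    """Average class posteriors from several models (one list per model)."""
    n = len(posteriors)
    return [sum(p[c] for p in posteriors) / n for c in range(len(posteriors[0]))]


model_d = softmax([2.0, 1.0, 0.1])  # made-up logits for illustration
model_e = softmax([0.5, 2.5, 0.3])
fused = fuse([model_d, model_e])
prediction = max(range(len(fused)), key=fused.__getitem__)
print(fused, prediction)
```

The models disagree on the top class here, and the fused posterior follows whichever model is more confident.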

4.5. Comparison with the state of the art

  • ILSVRC-2014 Classification 2nd place with 7.3% test error using an ensemble of 7 models
  • Decreased the error rate to 6.8% using an ensemble of 2 models
  • Single-net performance : VGG is the best (7.0% test error, 0.9% better than a single GoogLeNet)

5. CONCLUSION

  • Representation 'depth' is beneficial for classification accuracy
  • Generalizes well to a wide range of tasks and datasets (matches or outperforms more complex recognition pipelines built on shallower representations)

Code Review

1. VGG16 model (Keras)

# Note: this listing uses the Keras 1.x API, where Convolution2D(64, 3, 3) means
# 64 filters with a 3 x 3 kernel, and a channels-first (3, 224, 224) input layout.
from keras.models import Sequential
from keras.layers.core import Flatten, Dense, Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D

def VGG_16(weights_path=None):
    model = Sequential()
    model.add(ZeroPadding2D((1,1),input_shape=(3,224,224)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(Flatten())
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1000, activation='softmax'))

    if weights_path:
        model.load_weights(weights_path)

    return model

2. All VGG variants (PyTorch)

import torch
import torch.nn as nn

try:
    from torch.hub import load_state_dict_from_url
except ImportError:
    from torch.utils.model_zoo import load_url as load_state_dict_from_url

torch.manual_seed(0)

# Pretrained model weights
pretrained_model_urls = {
    'vgg11': 'https://download.pytorch.org/models/vgg11-bbd30ac9.pth',
    'vgg13': 'https://download.pytorch.org/models/vgg13-c768596a.pth',
    'vgg16': 'https://download.pytorch.org/models/vgg16-397923af.pth',
    'vgg19': 'https://download.pytorch.org/models/vgg19-dcbb9e9d.pth',
    'vgg11_bn': 'https://download.pytorch.org/models/vgg11_bn-6002323d.pth',
    'vgg13_bn': 'https://download.pytorch.org/models/vgg13_bn-abd245e5.pth',
    'vgg16_bn': 'https://download.pytorch.org/models/vgg16_bn-6c64b313.pth',
    'vgg19_bn': 'https://download.pytorch.org/models/vgg19_bn-c79401a0.pth',
}

# Model info
cfgs = {
    11: [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    13: [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    16: [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    19: [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M']
}


class VGG(nn.Module):
    def __init__(self, features, num_classes=1000, init_weights=True):
        super(VGG, self).__init__()
        self.features = features
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(),
            nn.Linear(4096, num_classes)
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

def make_layers(cfg, batch_norm=False):
    layers = list()
    in_channels = 3
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)


def vgg(depth, batch_norm, num_classes, pretrained):
    model = VGG(make_layers(cfgs[depth], batch_norm=batch_norm), num_classes, init_weights=True)
    arch = 'vgg' + str(depth)
    if batch_norm:
        arch += '_bn'

    if pretrained and (num_classes == 1000) and (arch in pretrained_model_urls):
        state_dict = load_state_dict_from_url(pretrained_model_urls[arch], progress=True)
        model.load_state_dict(state_dict)
    elif pretrained:
        raise ValueError('No pretrained model in vggnet {} model with class number {}'.format(depth, num_classes))

    return model
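
The keys of `cfgs` are the number of weight layers (conv layers plus the three FC layers in `classifier`). The sketch below checks this; `cfgs` is copied from the listing above so that the block runs on its own, and `num_weight_layers` is a helper introduced only for this check.

```python
cfgs = {
    11: [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    13: [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    16: [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    19: [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M']
}


def num_weight_layers(cfg, num_fc=3):
    """Count conv entries (everything except 'M') and add the FC layers."""
    return sum(1 for v in cfg if v != 'M') + num_fc


for depth, cfg in cfgs.items():
    print(depth, num_weight_layers(cfg), cfg.count('M'))
```

Depths 11, 13, 16 and 19 correspond to configurations A, B, D and E of the paper. Configuration C, which mixes in 1 x 1 convolutions, cannot be expressed with this listing because `make_layers` fixes `kernel_size=3`.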

3. Train and Test

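from model import *   # expected to provide vgg() from the previous listing
from utils import *   # expected to provide count() and save_checkpoint() (not shown here)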
import os

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

class VGGNet():
    def __init__(self, depth=19, batch_norm=True, num_classes=1000, pretrained=False,
                 gpu_id=0, print_freq=10, epoch_print=10, epoch_save=50):

        self.depth = depth
        self.batch_norm = batch_norm
        self.num_classes = num_classes
        self.pretrained = pretrained
        self.gpu = gpu_id
        self.print_freq = print_freq
        self.epoch_print = epoch_print
        self.epoch_save = epoch_save

        torch.cuda.set_device(self.gpu)

        self.loss_function = nn.CrossEntropyLoss().cuda(self.gpu)

        if self.pretrained:
            print('=> Use pre-trained model with depth : {}, batch_norm : {}'.format(self.depth, self.batch_norm))
        else:
            print('=> Create model with depth : {}, batch_norm : {}'.format(self.depth, self.batch_norm))

        model = vgg(self.depth, self.batch_norm, self.num_classes, self.pretrained)
        self.model = model.cuda(self.gpu)

        self.train_losses = list()
        self.train_acc = list()
        self.test_losses = list()
        self.test_acc = list()


    def train(self, train_data, test_data, resume=False, save=False, start_epoch=0, epochs=74,
              lr=0.01, momentum=0.9, weight_decay=0.0005, milestones=False):
        # Model to Train Mode
        self.model.train()

        # Set Optimizer and Scheduler
        optimizer = optim.SGD(self.model.parameters(), lr, momentum=momentum, weight_decay=weight_decay)
        if milestones:
            scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1)
        else:
            scheduler = optim.lr_scheduler.MultiStepLR(optimizer, [epochs//2, epochs*3//4], gamma=0.1)

        # Optionally Resume from Checkpoint
        if resume:
            if os.path.isfile(resume):
                print('=> Load checkpoint from {}'.format(resume))
                loc = 'cuda:{}'.format(self.gpu)
                checkpoint = torch.load(resume, map_location=loc)

                self.model.load_state_dict(checkpoint['state_dict'])

                start_epoch = checkpoint['epoch']
                optimizer.load_state_dict(checkpoint['optimizer'])
                scheduler.load_state_dict(checkpoint['scheduler'])
                print('=> Loaded checkpoint from {} with epoch {}'.format(resume, checkpoint['epoch']))
            else:
                print('=> No checkpoint found at {}'.format(resume))

        # Train
        for epoch in range(start_epoch, epochs):
            if epoch % self.epoch_print == 0:
                print('Epoch {} Started...'.format(epoch+1))
            for i, (X, y) in enumerate(train_data):
                X, y = X.cuda(self.gpu, non_blocking=True), y.cuda(self.gpu, non_blocking=True)
                output = self.model(X)
                loss = self.loss_function(output, y)

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

                if (i+1) % self.print_freq == 0:
                    train_acc = 100 * count(output, y) / y.size(0)
                    test_acc, test_loss = self.test(test_data)

                    self.train_losses.append(loss.item())
                    self.train_acc.append(train_acc)
                    self.test_losses.append(test_loss)
                    self.test_acc.append(test_acc)

                    self.model.train()

                    if epoch % self.epoch_print == 0:
                        print('Iteration : {} - Train Loss : {:.2f}, Test Loss : {:.2f}, '
                              'Train Acc : {:.2f}, Test Acc : {:.2f}'.format(i+1, loss.item(), test_loss,
                                                                             train_acc, test_acc))

            scheduler.step()
            if save and (epoch % self.epoch_save == 0):
                # Store state dicts (not objects) so that resume can call load_state_dict() on each entry
                save_checkpoint(self.depth, self.batch_norm, self.num_classes, self.pretrained, epoch,
                                state={'epoch': epoch+1, 'state_dict':self.model.state_dict(),
                                       'optimizer':optimizer.state_dict(), 'scheduler':scheduler.state_dict()})


    def test(self, test_data):
        correct, total = 0, 0
        losses = list()

        # Model to Eval Mode
        self.model.eval()

        # Test
        with torch.no_grad():
            for i, (X, y) in enumerate(test_data):
                X, y = X.cuda(self.gpu, non_blocking=True), y.cuda(self.gpu, non_blocking=True)
                output = self.model(X)

                loss = self.loss_function(output, y)
                losses.append(loss.item())

                correct += count(output, y)
                total += y.size(0)

        return (100*correct/total, sum(losses)/len(losses))
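
When `milestones` is not given, `train()` falls back to `[epochs//2, epochs*3//4]`. The framework-free sketch below shows the learning rate this produces per epoch. `lr_at_epoch` is a hypothetical helper that mimics a multi-step decay stepped once per epoch; it is not part of the listing.

```python
from bisect import bisect_right


def lr_at_epoch(epoch, base_lr=0.01, milestones=(37, 55), gamma=0.1):
    """Learning rate used during a 0-indexed epoch under multi-step decay."""
    # the rate is multiplied by gamma once for every milestone already reached
    return base_lr * gamma ** bisect_right(milestones, epoch)


epochs = 74
default_milestones = (epochs // 2, epochs * 3 // 4)  # same expression as in train()
print(default_milestones)
for epoch in (0, 36, 37, 54, 55, 73):
    print(epoch, lr_at_epoch(epoch, milestones=default_milestones))
```

With 74 epochs the default gives two decays, at epochs 37 and 55, whereas the paper reports three decreases. Passing a three-element `milestones` list to `train()` would be one way to get closer to that schedule.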