[CV_CNN] Very Deep Convolutional Networks for Large-Scale Image Recognition


jeonggg119 opened this issue 3 years ago · 0 comments


1. INTRODUCTION

Fix the other architectural parameters and steadily increase the 'depth' of the network, using only 'small (3x3)' convolution filters in all layers
Evaluated on ILSVRC-2014 classification and localisation + other image recognition datasets

2. ConvNet Configurations

2.1. Architecture

Input data : 224 x 224 RGB image
3 x 3 Conv (stride 1, padding 1) and 2 x 2 Maxpool (stride 2)
Activation function : ReLU
3 FC layers (4096 - 4096 - 1000 channels)
Final : soft-max layer
No LRN (Local Response Normalization), except in one configuration (A-LRN)
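As a rough check on the shapes implied by this architecture, the sketch below traces the spatial size of a 224 x 224 input through five conv/pool stages. The helper names `conv_out` and `trace_vgg` are made up for illustration; only the kernel, stride and padding values come from the notes above.

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a conv or pooling layer (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_vgg(size=224, num_stages=5):
    """Spatial size after each stage of 3x3 convs followed by a 2x2 max-pool."""
    sizes = [size]
    for _ in range(num_stages):
        size = conv_out(size, kernel=3, stride=1, padding=1)  # 3x3 conv keeps the size
        size = conv_out(size, kernel=2, stride=2, padding=0)  # 2x2 max-pool halves it
        sizes.append(size)
    return sizes


sizes = trace_vgg()
print(sizes)                 # [224, 112, 56, 28, 14, 7]
print(512 * sizes[-1] ** 2)  # 25088 inputs to the first FC layer
```

The final 7 x 7 x 512 feature map is what the first FC layer (4096 units) consumes.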

2.2. Configurations

A~E : Differ only in 'depth' (11 to 19 weight layers)
Width of conv layers (the number of channels) : 64 -> 128 -> 256 -> 512, doubling after each max-pool
3 x 3 convs keep the conv parameter count small, but the total is still large (133M~144M) because of the FC layers
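To make the "still a lot because of the FC layers" point concrete, here is a small sketch that counts weights and biases for configuration D (VGG16). The function names are illustrative, and the count assumes a 224 x 224 input so that the first FC layer sees 512 * 7 * 7 features.

```python
# Config D (VGG16): numbers are conv output channels, 'M' is a 2x2 max-pool.
CFG_D = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
         512, 512, 512, 'M', 512, 512, 512, 'M']


def conv_params(cfg, in_channels=3, kernel=3):
    """Weights plus biases of all conv layers in a config list."""
    total = 0
    for v in cfg:
        if v == 'M':
            continue  # pooling layers have no parameters
        total += kernel * kernel * in_channels * v + v
        in_channels = v
    return total


def fc_params(sizes=(512 * 7 * 7, 4096, 4096, 1000)):
    """Weights plus biases of the three fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


conv, fc = conv_params(CFG_D), fc_params()
print(conv, fc, conv + fc)            # 14714688 123642856 138357544
print(round(100 * fc / (conv + fc)))  # 89 (percent of parameters in the FC layers)
```

Under these assumptions roughly nine out of ten parameters sit in the three FC layers, not in the 13 conv layers.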

2.3. Discussion

A stack of three 3 x 3 conv layers has the same effective receptive field as one 7 x 7 conv layer
BUT it is more non-linear (three ReLUs instead of one) & has fewer parameters ( 3(3^2C^2) = 27C^2 < 7^2C^2 = 49C^2 for C channels )
1 x 1 conv layers add non-linearity via ReLU without changing the receptive field (config C)
GoogLeNet (1st place in ILSVRC-2014 classification) is more complex than VGGNet
- Similarity : very deep ConvNet (22 layers) with small conv filters (1x1, 3x3, 5x5)
- Difference : GoogLeNet reduces the spatial resolution of the feature maps more aggressively in the first layers to decrease the amount of computation
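The receptive-field and parameter comparison between a stack of 3 x 3 layers and a single 7 x 7 layer can be checked with a few lines of arithmetic. This is only a sketch: the helper names are hypothetical, biases are ignored, and every layer is assumed to have C input and C output channels, as in the formula above.

```python
def stacked_receptive_field(kernels):
    """Receptive field of stride-1 conv layers applied one after another."""
    rf = 1
    for k in kernels:
        rf += k - 1  # each stride-1 layer widens the field by (k - 1)
    return rf


def stack_weights(kernels, channels):
    """Weight count (no biases) when every layer maps `channels` -> `channels`."""
    return sum(k * k * channels * channels for k in kernels)


C = 256
print(stacked_receptive_field([3, 3, 3]), stacked_receptive_field([7]))  # 7 7
print(stack_weights([3, 3, 3], C), stack_weights([7], C))  # 1769472 3211264
```

For any C the ratio is 27/49, so the stacked version needs about 45% fewer weights for the same receptive field.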

3. Classification Framework

3.1. Training

Generally follows AlexNet (2012) except for input crops from multi-scale training images
Data Pre-processing
- Image Rescale (Resize)
  - Single-scale training : fixed S = 256 or S = 384 (S = smallest side of the rescaled training image)
  - Multi-scale training : S sampled randomly from [256, 512] (fine-tuned from the model pre-trained with S = 384)
- Data Augmentation
  - Random crop 224 x 224
  - Random horizontal flipping
  - Random RGB color shift
  - Scale jittering
  - Normalization : subtract mean RGB value computed on training dataset from each pixel
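The geometry of the rescale, crop and flip steps above can be sketched without any image library. This is a minimal illustration, not the authors' pipeline: the function names and the `rng` parameter are assumptions, and it only computes sizes and crop boxes, not pixel values.

```python
import random


def rescale_smallest_side(width, height, S):
    """Isotropic rescale so that the smaller image side equals S."""
    scale = S / min(width, height)
    return round(width * scale), round(height * scale)


def sample_training_crop(width, height, s_min=256, s_max=512, crop=224, rng=random):
    """Pick a training scale, a 224 x 224 crop box and a flip decision."""
    S = rng.randint(s_min, s_max)       # scale jittering: S drawn from [s_min, s_max]
    w, h = rescale_smallest_side(width, height, S)
    left = rng.randint(0, w - crop)     # random crop position inside the rescaled image
    top = rng.randint(0, h - crop)
    flip = rng.random() < 0.5           # random horizontal flip
    return S, (left, top, left + crop, top + crop), flip


S, box, flip = sample_training_crop(500, 375, rng=random.Random(0))
print(S, box, flip)
```

Setting `s_min == s_max` (256 or 384) gives the single-scale training case.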
Train Details
- Objective : multinomial logistic regression
- Optimizer : mini-batch gradient descent with momentum (gradients computed by backpropagation)
  - Learning rate : 0.01 (initial)
  - Momentum : 0.9
  - L2 weight decay : 0.0005
- Batch size : 256
- Dropout : 0.5 ratio for first 2 FC layers
- Learning rate scheduler : decreased by a factor of 10 whenever validation accuracy stopped improving (3 times in total)
- Epoch : 74 (training stopped after 370K iterations)
- Pre-initialization : Train shallow config A from random initialization -> Train deeper configs by initializing the first 4 conv and last 3 FC layers with the layers of A & randomly initializing the intermediate layers (weights sampled from a zero-mean normal distribution with variance 0.01, biases set to 0)
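The relation between 370K iterations and 74 epochs follows from the batch size and the training set size. The short calculation below assumes the exact ILSVRC-2012 training set size of 1,281,167 images (the notes round this to 1.3M) and lists the learning-rate values reached after three tenfold decreases.

```python
iterations = 370_000
batch_size = 256
train_images = 1_281_167  # assumed exact ILSVRC-2012 training set size

# Images seen in total, divided by images per epoch.
epochs = iterations * batch_size / train_images
print(round(epochs, 1))   # 73.9

# Initial learning rate followed by three decreases by a factor of 10.
initial_lr = 0.01
schedule = [initial_lr / 10 ** k for k in range(4)]
print(schedule)
```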

3.2. Testing

Data Pre-processing
- Isotropic Rescaling to pre-defined smallest side Q (not necessarily equal to S)
- Multi-crop evaluation + Dense evaluation
- Data Augmentation : Horizontal flipping
Network Change
- FC layers -> convolutional layers => Fully-Convolutional Net
  - First FC layer -> 7 x 7 conv layer
  - Last 2 FC layers -> 1 x 1 conv layers
  - The resulting net accepts any input size and is applied to the whole (uncropped) image
- Spatially average-pool the class score map at the end : to obtain a fixed-size vector of class scores
Final scores : average of the soft-max class posteriors of the original and horizontally flipped images
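To see why the fully-convolutional net needs a final spatial average, the sketch below computes the side length of the class score map for a few test scales. It assumes a square Q x Q test image and the five 2 x 2 max-pools described earlier; `score_map_side` and `spatial_average` are illustrative names, not functions from the paper or from any library.

```python
def score_map_side(Q, num_pools=5, fc_kernel=7):
    """Side length of the class score map for a square Q x Q test image."""
    side = Q
    for _ in range(num_pools):
        side //= 2               # each 2x2, stride-2 max-pool halves the resolution
    return side - fc_kernel + 1  # first FC layer applied as an unpadded 7x7 conv


def spatial_average(score_map):
    """Average a 2-D map of per-location scores for one class into a single score."""
    values = [v for row in score_map for v in row]
    return sum(values) / len(values)


for Q in (224, 256, 384, 512):
    print(Q, score_map_side(Q))  # prints 224 1, 256 2, 384 6, 512 10 (one pair per line)

print(spatial_average([[0.2, 0.4], [0.6, 0.8]]))  # 0.5
```

Only Q = 224 gives a 1 x 1 map; larger images give one score per spatial location, which the average pooling collapses into a single vector of class scores.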

4. Classification Experiments

Dataset : ILSVRC-2012 dataset (1000 classes / 1.3M train + 50K val + 100K test)
Use validation set as test set

4.1. Single-Scale Evaluation

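Deeper is better : error decreases with depth and saturates at 19 layers
Extra non-linearity helps (C with 1 x 1 convs > B), but at the same depth 3 x 3 convs beat 1 x 1 convs (D > C) : spatial context matters too
A deep net with small filters is better than a shallow net with large filters
Scale jittering at training time : better than fixed S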

4.2. Multi-Scale Evaluation

Better than Single-Scale Evaluation
Model trained with fixed S : Q = {S-32, S, S+32}
Model trained with scale jittering S in [256, 512] : Q = {256, 384, 512}, better than fixed S
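The choice of test scales can be written as a small helper. This is a sketch with a hypothetical function name: a fixed training scale is passed as an integer, a jittered range as a `(s_min, s_max)` tuple.

```python
def eval_scales(S):
    """Test scales Q for a model trained at fixed S (int) or jittered S (tuple)."""
    if isinstance(S, tuple):
        s_min, s_max = S
        return [s_min, (s_min + s_max) // 2, s_max]  # both ends and the midpoint
    return [S - 32, S, S + 32]                       # close to the training scale


print(eval_scales(384))         # [352, 384, 416]
print(eval_scales((256, 512)))  # [256, 384, 512]
```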

4.3. Multi-Crop Evaluation

Multi-crop & Dense evaluation : complementary -> Combination is best

4.4. Convnet fusion

Combine the outputs of several models by averaging soft-max class posteriors -> improve performance
Multiple ConvNet fusion Results
- ILSVRC submission : only the single-scale networks and one multi-scale model D had been trained; ensemble of 7 models => 7.3% test error
- Post-submission : ensemble of the 2 best-performing multi-scale models (D and E) => 7.0% test error using dense eval, 6.8% using combined dense and multi-crop eval
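Fusion by averaging soft-max posteriors is simple enough to show in plain Python. The logits below are made-up numbers for two hypothetical models on a 3-class problem; the same averaging applies to the original and flipped views of one image.

```python
import math


def softmax(logits):
    """Numerically stable soft-max over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(posteriors):
    """Element-wise average of class posteriors from several models or views."""
    n = len(posteriors)
    return [sum(p[c] for p in posteriors) / n for c in range(len(posteriors[0]))]


model_a = softmax([2.0, 1.0, 0.1])  # made-up logits, model A prefers class 0
model_b = softmax([0.5, 2.5, 0.3])  # made-up logits, model B prefers class 1
fused = fuse([model_a, model_b])
print(max(range(len(fused)), key=fused.__getitem__))  # 1
```

The fused vector is still a probability distribution, because an average of distributions sums to 1.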

4.5. Comparison with the state of the art

ILSVRC-2014 Classification 2nd place with 7.3% test error using an ensemble of 7 models
Decreased the error rate to 6.8% using an ensemble of 2 models
Single-net performance : VGG is the best (7.0% test error, ahead of a single GoogLeNet at 7.9%)

5. CONCLUSION

Representation 'depth' is beneficial for classification accuracy
Generalizes well to a wide range of tasks and datasets (matching or outperforming more complex recognition pipelines)

Code Review

1. VGG16 model (Keras)

# Assumes the Keras 2+/3 layer API and channels-last (224, 224, 3) input
from keras import Input
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def VGG_16(weights_path=None):
    model = Sequential()
    model.add(Input(shape=(224, 224, 3)))

    # Block 1 : 2 x (3x3 conv, 64 channels) + max-pool
    model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    # Block 2 : 2 x (3x3 conv, 128 channels) + max-pool
    model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    # Block 3 : 3 x (3x3 conv, 256 channels) + max-pool
    model.add(Conv2D(256, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(256, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(256, (3, 3), padding='same', activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    # Block 4 : 3 x (3x3 conv, 512 channels) + max-pool
    model.add(Conv2D(512, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(512, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(512, (3, 3), padding='same', activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    # Block 5 : 3 x (3x3 conv, 512 channels) + max-pool
    model.add(Conv2D(512, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(512, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(512, (3, 3), padding='same', activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    # Classifier : FC-4096, FC-4096, FC-1000, with dropout after the first two
    model.add(Flatten())
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1000, activation='softmax'))

    if weights_path:
        # The weights file must match this layer layout
        model.load_weights(weights_path)

    return model

2. All VGG models (PyTorch)

import torch
import torch.nn as nn

try:
    from torch.hub import load_state_dict_from_url
except ImportError:
    from torch.utils.model_zoo import load_url as load_state_dict_from_url

torch.manual_seed(0)

# Pretrained model weights
pretrained_model_urls = {
    'vgg11': 'https://download.pytorch.org/models/vgg11-bbd30ac9.pth',
    'vgg13': 'https://download.pytorch.org/models/vgg13-c768596a.pth',
    'vgg16': 'https://download.pytorch.org/models/vgg16-397923af.pth',
    'vgg19': 'https://download.pytorch.org/models/vgg19-dcbb9e9d.pth',
    'vgg11_bn': 'https://download.pytorch.org/models/vgg11_bn-6002323d.pth',
    'vgg13_bn': 'https://download.pytorch.org/models/vgg13_bn-abd245e5.pth',
    'vgg16_bn': 'https://download.pytorch.org/models/vgg16_bn-6c64b313.pth',
    'vgg19_bn': 'https://download.pytorch.org/models/vgg19_bn-c79401a0.pth',
}

# Model info
cfgs = {
    11: [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    13: [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    16: [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    19: [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M']
}


class VGG(nn.Module):
    def __init__(self, features, num_classes=1000, init_weights=True):
        super(VGG, self).__init__()
        self.features = features
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(),
            nn.Linear(4096, num_classes)
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

def make_layers(cfg, batch_norm=False):
    layers = list()
    in_channels = 3
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)


def vgg(depth, batch_norm, num_classes, pretrained):
    model = VGG(make_layers(cfgs[depth], batch_norm=batch_norm), num_classes, init_weights=True)
    arch = 'vgg' + str(depth)
    if batch_norm:
        arch += '_bn'

    if pretrained and (num_classes == 1000) and (arch in pretrained_model_urls):
        state_dict = load_state_dict_from_url(pretrained_model_urls[arch], progress=True)
        model.load_state_dict(state_dict)
    elif pretrained:
        raise ValueError('No pretrained model in vggnet {} model with class number {}'.format(depth, num_classes))

    return model

3. Train and Test

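from model import *   # vgg() from the model definitions above
from utils import *   # count() and save_checkpoint() are expected to come from here (not shown)
import os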

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

class VGGNet():
    def __init__(self, depth=19, batch_norm=True, num_classes=1000, pretrained=False,
                 gpu_id=0, print_freq=10, epoch_print=10, epoch_save=50):

        self.depth = depth
        self.batch_norm = batch_norm
        self.num_classes = num_classes
        self.pretrained = pretrained
        self.gpu = gpu_id
        self.print_freq = print_freq
        self.epoch_print = epoch_print
        self.epoch_save = epoch_save

        torch.cuda.set_device(self.gpu)

        self.loss_function = nn.CrossEntropyLoss().cuda(self.gpu)

        if self.pretrained:
            print('=> Use pre-trained model with depth : {}, batch_norm : {}'.format(self.depth, self.batch_norm))
        else:
            print('=> Create model with depth : {}, batch_norm : {}'.format(self.depth, self.batch_norm))

        model = vgg(self.depth, self.batch_norm, self.num_classes, self.pretrained)
        self.model = model.cuda(self.gpu)

        self.train_losses = list()
        self.train_acc = list()
        self.test_losses = list()
        self.test_acc = list()


    def train(self, train_data, test_data, resume=False, save=False, start_epoch=0, epochs=74,
              lr=0.01, momentum=0.9, weight_decay=0.0005, milestones=False):
        # Model to Train Mode
        self.model.train()

        # Set Optimizer and Scheduler
        optimizer = optim.SGD(self.model.parameters(), lr, momentum=momentum, weight_decay=weight_decay)
        if milestones:
            scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1)
        else:
            scheduler = optim.lr_scheduler.MultiStepLR(optimizer, [epochs//2, epochs*3//4], gamma=0.1)

        # Optionally Resume from Checkpoint
        if resume:
            if os.path.isfile(resume):
                print('=> Load checkpoint from {}'.format(resume))
                loc = 'cuda:{}'.format(self.gpu)
                checkpoint = torch.load(resume, map_location=loc)

                self.model.load_state_dict(checkpoint['state_dict'])

                start_epoch = checkpoint['epoch']
                optimizer.load_state_dict(checkpoint['optimizer'])
                scheduler.load_state_dict(checkpoint['scheduler'])
                print('=> Loaded checkpoint from {} with epoch {}'.format(resume, checkpoint['epoch']))
            else:
                print('=> No checkpoint found at {}'.format(resume))

        # Train
        for epoch in range(start_epoch, epochs):
            if epoch % self.epoch_print == 0:
                print('Epoch {} Started...'.format(epoch+1))
            for i, (X, y) in enumerate(train_data):
                X, y = X.cuda(self.gpu, non_blocking=True), y.cuda(self.gpu, non_blocking=True)
                output = self.model(X)
                loss = self.loss_function(output, y)

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

                if (i+1) % self.print_freq == 0:
                    train_acc = 100 * count(output, y) / y.size(0)
                    test_acc, test_loss = self.test(test_data)

                    self.train_losses.append(loss.item())
                    self.train_acc.append(train_acc)
                    self.test_losses.append(test_loss)
                    self.test_acc.append(test_acc)

                    self.model.train()

                    if epoch % self.epoch_print == 0:
                        print('Iteration : {} - Train Loss : {:.2f}, Test Loss : {:.2f}, '
                              'Train Acc : {:.2f}, Test Acc : {:.2f}'.format(i+1, loss.item(), test_loss,
                                                                             train_acc, test_acc))

            scheduler.step()
            if save and (epoch % self.epoch_save == 0):
                # Store state dicts (not objects) so that resume can call load_state_dict on each entry
                save_checkpoint(self.depth, self.batch_norm, self.num_classes, self.pretrained, epoch,
                                state={'epoch': epoch+1, 'state_dict': self.model.state_dict(),
                                       'optimizer': optimizer.state_dict(),
                                       'scheduler': scheduler.state_dict()})


    def test(self, test_data):
        correct, total = 0, 0
        losses = list()

        # Model to Eval Mode
        self.model.eval()

        # Test
        with torch.no_grad():
            for i, (X, y) in enumerate(test_data):
                X, y = X.cuda(self.gpu, non_blocking=True), y.cuda(self.gpu, non_blocking=True)
                output = self.model(X)

                loss = self.loss_function(output, y)
                losses.append(loss.item())

                correct += count(output, y)
                total += y.size(0)

        return (100*correct/total, sum(losses)/len(losses))