[CV_CNN] Very Deep Convolutional Networks for Large-Scale Image Recognition
jeonggg119 opened this issue · 0 comments
1. INTRODUCTION
- Fix the other architecture parameters and steadily increase the 'depth' of the network, using only small (3x3) convolution filters in all layers
- Evaluated on ILSVRC-2014 classification and localisation, plus other image recognition datasets
2. ConvNet Configurations
2.1. Architecture
- Input data : 224 x 224 RGB image
- 3 x 3 Conv (stride 1, padding 1) and 2 x 2 Maxpool (stride 2)
- Activation function : ReLU
- 3 FC layers (4096 - 4096 - 1000 channels)
- Final : soft-max layer
- No LRN (Local Response Normalization), except in one configuration (A-LRN)
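As a quick check of the layer list above, the spatial size of the feature maps can be traced through the network. This is a minimal sketch that assumes five conv + max-pool stages (as in the configurations of Section 2.2); the helper names `conv_out`, `pool_out` and `trace_spatial_size` are made up for illustration.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output size of a max-pool along one spatial dimension."""
    return (size - kernel) // stride + 1


def trace_spatial_size(size=224, num_stages=5):
    """Spatial size after each conv + max-pool stage (assumed: 5 stages)."""
    sizes = [size]
    for _ in range(num_stages):
        size = conv_out(size)  # 3x3, stride 1, padding 1 keeps the size
        size = pool_out(size)  # 2x2, stride 2 halves it
        sizes.append(size)
    return sizes


sizes = trace_spatial_size()
print(sizes)                 # [224, 112, 56, 28, 14, 7]
print(512 * sizes[-1] ** 2)  # 25088 inputs to the first FC layer
```

The 3 x 3 convolutions with padding 1 never change the resolution, so only the max-pools shrink the 224 x 224 input down to 7 x 7.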
2.2. Configurations
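- Configurations A to E differ only in 'depth' (11 to 19 weight layers)
- Width of conv layers (number of channels) : starts at 64 and doubles after each max-pool, 64 -> 128 -> 256 -> 512
- 3 x 3 convs keep the conv parameter count low, but the total is still large (133M to 144M) because of the FC layers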
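To make the parameter claim concrete, the counts can be reproduced by hand. This is a rough sketch, assuming the layer lists for configs A, B, D and E match the `cfgs` table in the code review further down (config C, with its 1 x 1 convs, is left out); `CFGS` and `count_params` are illustrative names.

```python
# Channel widths per conv layer; 'M' marks a 2x2 max-pool.
CFGS = {
    "A": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "B": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "D": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
          512, 512, 512, "M", 512, 512, 512, "M"],
    "E": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
          512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}


def count_params(cfg, in_channels=3, input_size=224, num_classes=1000):
    """Return (conv_params, fc_params), weights plus biases."""
    conv, size = 0, input_size
    for v in cfg:
        if v == "M":
            size //= 2
        else:
            conv += 3 * 3 * in_channels * v + v
            in_channels = v
    fc_dims = [in_channels * size * size, 4096, 4096, num_classes]
    fc = sum(a * b + b for a, b in zip(fc_dims, fc_dims[1:]))
    return conv, fc


for name, cfg in CFGS.items():
    conv, fc = count_params(cfg)
    share = 100 * fc / (conv + fc)
    print(f"{name}: {(conv + fc) / 1e6:.0f}M parameters, {share:.0f}% in FC layers")
```

For every configuration the three FC layers hold the large majority of the weights, which is why adding conv layers barely moves the total.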
2.3. Discussion
- A stack of three 3 x 3 conv layers has the same effective receptive field as one 7 x 7 conv layer
- But the stack is more non-linear (three ReLUs instead of one) and has fewer parameters ( 3(3^2C^2) = 27C^2 < 7^2C^2 = 49C^2, for C input and output channels )
- 1 x 1 conv layers add extra non-linearity through ReLU without changing the receptive field (config C)
- GoogLeNet (1st place of ILSVRC-2014) is more complex than VGGNet
- Similarity : GoogLeNet is also a very deep ConvNet (22 layers) with small conv filters (1x1, 3x3, 5x5)
- Difference : in GoogLeNet the spatial resolution of the feature maps is reduced more aggressively in the first layers to decrease the amount of computation
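The receptive-field and parameter comparison between stacked 3 x 3 convs and a single 7 x 7 conv can be checked with a few lines. This is a small sketch assuming stride-1 convolutions, C input and output channels, and no biases; the function names and the choice C = 256 are illustrative.

```python
def stacked_receptive_field(kernel, num_layers):
    """Receptive field of `num_layers` stacked stride-1 convs of size `kernel`."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels, num_layers=1):
    """Weight count (no biases) with `channels` inputs and outputs per layer."""
    return num_layers * kernel * kernel * channels * channels


C = 256  # example channel width, chosen arbitrarily
print(stacked_receptive_field(3, 2))  # 5, same as one 5x5 conv
print(stacked_receptive_field(3, 3))  # 7, same as one 7x7 conv
print(conv_weights(3, C, num_layers=3))  # 1769472 = 27 * C^2
print(conv_weights(7, C))                # 3211264 = 49 * C^2
```

The single 7 x 7 layer needs 49/27, roughly 1.8 times, as many weights as the stack of three 3 x 3 layers.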
3. Classification Framework
3.1. Training
- Generally follows AlexNet (2012), except for sampling the input crops from multi-scale training images
- Data Pre-processing
- Image Rescale (Resize) : S = smallest side of the isotropically rescaled training image
- Single-scale training : fixed S = 256 or S = 384
- Multi-scale training : S randomly sampled from [256, 512] (fine-tuned from the single-scale model pre-trained with S = 384)
- Data Augmentation
- Random crop 224 x 224
- Random horizontal flipping
- Random RGB color shift
- Scale jittering
- Normalization : subtract mean RGB value computed on training dataset from each pixel
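The rescale, crop and flip steps above can be sketched as a sampler of augmentation parameters. This is only an illustration under stated assumptions: it works on image dimensions rather than pixels, it leaves out the RGB colour shift and mean subtraction, and the function name `sample_train_crop` and its return format are made up.

```python
import random


def sample_train_crop(width, height, s_min=256, s_max=512, crop=224, rng=random):
    """Sample scale-jittering, crop and flip parameters for one training image.

    Set s_min == s_max for single-scale training with a fixed S.
    """
    S = rng.randint(s_min, s_max)  # training scale: smallest side after resize
    if width <= height:
        new_w, new_h = S, round(height * S / width)
    else:
        new_w, new_h = round(width * S / height), S
    left = rng.randint(0, new_w - crop)  # random 224 x 224 crop position
    top = rng.randint(0, new_h - crop)
    flip = rng.random() < 0.5            # random horizontal flip
    return {
        "S": S,
        "resized": (new_w, new_h),
        "box": (left, top, left + crop, top + crop),
        "flip": flip,
    }


params = sample_train_crop(500, 375)
print(params)
```

Because S is at least 256, the 224 x 224 crop always fits inside the rescaled image; with larger S the crop covers a smaller part of the object.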
- Train Details
- Optimize the multinomial logistic regression objective
- Mini-batch gradient descent based on backpropagation
- Learning rate : 0.01
- Momentum : 0.9
- L2 weight decay : 0.0005
- Batch size : 256
- Dropout : 0.5 ratio for first 2 FC layers
- Learning rate schedule : divided by 10 when the validation accuracy stopped improving (3 times in total) -> training stopped after 370K iterations
- Epoch : 74 (370K iterations)
- Pre-initialization : train the shallow config A from random initialization -> train deeper configs by initializing the first 4 conv and the last 3 FC layers with the layers of A, and randomly initializing the intermediate layers (weights sampled from a zero-mean normal distribution with 10^-2 variance, biases set to 0)
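The relation between 370K iterations and 74 epochs, and the learning rates reached after each decay, can be worked out as follows. This is a back-of-the-envelope sketch; the exact training-set size of 1,281,167 images is an assumption added here (the notes only give the rounded 1.3M figure).

```python
TRAIN_IMAGES = 1_281_167  # assumed exact ILSVRC-2012 training-set size
BATCH_SIZE = 256
ITERATIONS = 370_000
INITIAL_LR = 0.01
NUM_DECAYS = 3

iters_per_epoch = TRAIN_IMAGES / BATCH_SIZE
epochs = ITERATIONS / iters_per_epoch
print(f"{iters_per_epoch:.0f} iterations per epoch, {epochs:.1f} epochs in total")

# Learning rate after 0, 1, 2 and 3 decays by a factor of 10.
learning_rates = [INITIAL_LR / 10 ** k for k in range(NUM_DECAYS + 1)]
print([f"{lr:.0e}" for lr in learning_rates])
```

With about 5,000 iterations per epoch, 370K iterations correspond to roughly 74 passes over the training set.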
3.2. Testing
- Data Pre-processing
- Isotropic Rescaling to pre-defined smallest side Q (not necessarily equal to S)
- Multi-crop evaluation + Dense evaluation
- Data Augmentation : Horizontal flipping
- Network Change
- FC layers -> convolutional layers => Fully-Convolutional Net
- First FC layer -> 7 x 7 conv layer
- Last 2 FC layers -> 1 x 1 conv layers; the resulting net accepts any input size and is applied to the whole (uncropped) image
- The class score map is spatially averaged at the end to obtain a fixed-size vector of class scores
- Averaging Soft-max class posteriors of original and flipped images -> Final scores
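To see why the spatial averaging step is needed, the size of the class score map produced by the fully-convolutional net can be computed for different test image sizes. This is a minimal sketch assuming five 2 x 2 max-pools and an unpadded 7 x 7 conv replacing the first FC layer; `dense_score_map_size` is an illustrative name.

```python
def dense_score_map_size(height, width, num_pools=5, fc_kernel=7):
    """Spatial size of the class score map for an uncropped test image."""
    h, w = height, width
    for _ in range(num_pools):
        h, w = h // 2, w // 2  # each 2x2 max-pool halves the resolution
    # The first FC layer, recast as a 7x7 conv without padding.
    return h - fc_kernel + 1, w - fc_kernel + 1


print(dense_score_map_size(224, 224))  # (1, 1): same as the original FC net
print(dense_score_map_size(384, 512))  # (6, 10): averaged into one score vector
```

Each position of the score map corresponds to one 224 x 224 window of the input, so averaging the map is comparable to evaluating many overlapping crops in a single pass.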
4. Classification Experiments
- Dataset : ILSVRC-2012 dataset (1000 classes / 1.3M train + 50K val + 100K test)
- Use validation set as test set
4.1. Single-Scale Evaluation
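- The deeper the network, the lower the error; the error saturates at 19 layers
- Extra non-linearity helps (C > B), but at the same depth 3 x 3 convs that capture spatial context beat 1 x 1 convs (D > C)
- A deep net with small filters is better than a shallow net with large filters
- Scale jittering at training time : better than fixed S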
4.2. Multi-Scale Evaluation
- Better than Single-Scale Evaluation
- Fixed S : Q = {S-32, S, S+32}
- Scale jittering with S in [256, 512] : Q = {256, 384, 512} -> better than fixed S
4.3. Multi-Crop Evaluation
- Multi-crop & Dense evaluation : complementary -> Combination is best
4.4. Convnet fusion
- Combine the outputs of several models by averaging soft-max class posteriors -> improve performance
- Multiple ConvNet fusion Results
- ILSVRC submission : only the single-scale networks and one multi-scale model D had been trained by then; Ensemble of 7 models => 7.3% test error
- Post-submission : Ensemble of the 2 best-performing multi-scale models (D and E) => 7.0% using dense eval, 6.8% using combined dense and multi-crop eval
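Fusing models by averaging their soft-max class posteriors can be sketched in plain Python. The logits below are made-up numbers for a 3-class toy problem, and `softmax` and `fuse` are illustrative helpers, not code from the paper.

```python
import math


def softmax(logits):
    """Numerically stable soft-max over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(posteriors):
    """Average the class posteriors of several models (or of several views)."""
    n = len(posteriors)
    return [sum(col) / n for col in zip(*posteriors)]


# Made-up logits from two models for a single image.
logits_per_model = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
posteriors = [softmax(logits) for logits in logits_per_model]
fused = fuse(posteriors)
prediction = max(range(len(fused)), key=fused.__getitem__)
print([round(p, 3) for p in fused], prediction)
```

The same averaging applies at test time to the posteriors of the original and the horizontally flipped image.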
4.5. Comparison with the state of the art
- ILSVRC-2014 Classification 2nd place with 7.3% test error using an ensemble of 7 models
- Decreased the error rate to 6.8% using an ensemble of 2 models
- Single-net performance : VGG achieves the best result (7.0% test error)
5. CONCLUSION
- Representation 'depth' is beneficial for classification accuracy
- The models generalize well to a wide range of tasks and datasets (including more complex recognition pipelines)
Code Review
1. model of VGG16
# Note: this listing uses the Keras 1.x API (Convolution2D(n, 3, 3)) with
# channels-first input of shape (3, 224, 224).
from keras.models import Sequential
from keras.layers.core import Flatten, Dense, Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import SGD
import cv2, numpy as np

def VGG_16(weights_path=None):
    model = Sequential()
    # Block 1 : 2 x conv3-64
    model.add(ZeroPadding2D((1,1),input_shape=(3,224,224)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    # Block 2 : 2 x conv3-128
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    # Block 3 : 3 x conv3-256
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    # Block 4 : 3 x conv3-512
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    # Block 5 : 3 x conv3-512
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    # Classifier : FC-4096, FC-4096, FC-1000 + soft-max
    model.add(Flatten())
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1000, activation='softmax'))

    if weights_path:
        model.load_weights(weights_path)

    return model
2. Whole models
import torch
import torch.nn as nn
try:
    from torch.hub import load_state_dict_from_url
except ImportError:
    from torch.utils.model_zoo import load_url as load_state_dict_from_url
torch.manual_seed(0)
# Pretrained model weights
pretrained_model_urls = {
    'vgg11': 'https://download.pytorch.org/models/vgg11-bbd30ac9.pth',
    'vgg13': 'https://download.pytorch.org/models/vgg13-c768596a.pth',
    'vgg16': 'https://download.pytorch.org/models/vgg16-397923af.pth',
    'vgg19': 'https://download.pytorch.org/models/vgg19-dcbb9e9d.pth',
    'vgg11_bn': 'https://download.pytorch.org/models/vgg11_bn-6002323d.pth',
    'vgg13_bn': 'https://download.pytorch.org/models/vgg13_bn-abd245e5.pth',
    'vgg16_bn': 'https://download.pytorch.org/models/vgg16_bn-6c64b313.pth',
    'vgg19_bn': 'https://download.pytorch.org/models/vgg19_bn-c79401a0.pth',
}

# Model info
cfgs = {
    11: [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    13: [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    16: [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    19: [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M']
}
class VGG(nn.Module):
    def __init__(self, features, num_classes=1000, init_weights=True):
        super(VGG, self).__init__()
        self.features = features
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(),
            nn.Linear(4096, num_classes)
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)
def make_layers(cfg, batch_norm=False):
    layers = list()
    in_channels = 3
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)
def vgg(depth, batch_norm, num_classes, pretrained):
    model = VGG(make_layers(cfgs[depth], batch_norm=batch_norm), num_classes, init_weights=True)
    # Key into pretrained_model_urls, e.g. 'vgg16' or 'vgg16_bn'
    arch = 'vgg' + str(depth)
    if batch_norm:
        arch += '_bn'
    if pretrained and (num_classes == 1000) and (arch in pretrained_model_urls):
        state_dict = load_state_dict_from_url(pretrained_model_urls[arch], progress=True)
        model.load_state_dict(state_dict)
    elif pretrained:
        raise ValueError('No pretrained model in vggnet {} model with class number {}'.format(depth, num_classes))
    return model
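The integer keys of `cfgs` follow the paper's naming by number of weight layers. The following check is a small sketch that copies the `cfgs` table from the listing above and assumes the three FC layers of the classifier; `weight_layers` is an illustrative helper, not part of the original code.

```python
cfgs = {
    11: [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    13: [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    16: [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    19: [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M']
}


def weight_layers(cfg, num_fc=3):
    """Count layers with weights: conv layers in the list plus the FC layers."""
    return sum(1 for v in cfg if v != 'M') + num_fc


for depth, cfg in cfgs.items():
    print(depth, weight_layers(cfg), cfg.count('M'))
```

Every configuration has exactly five max-pool stages; only the number of conv layers between them changes.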
3. Train and Test
from model import *
from utils import *
import os
import torch
import torch.nn as nn
import torch.optim as optim
torch.manual_seed(0)
class VGGNet():
    def __init__(self, depth=19, batch_norm=True, num_classes=1000, pretrained=False,
                 gpu_id=0, print_freq=10, epoch_print=10, epoch_save=50):
        self.depth = depth
        self.batch_norm = batch_norm
        self.num_classes = num_classes
        self.pretrained = pretrained
        self.gpu = gpu_id
        self.print_freq = print_freq
        self.epoch_print = epoch_print
        self.epoch_save = epoch_save

        torch.cuda.set_device(self.gpu)
        self.loss_function = nn.CrossEntropyLoss().cuda(self.gpu)

        if self.pretrained:
            print('=> Use pre-trained model with depth : {}, batch_norm : {}'.format(self.depth, self.batch_norm))
        else:
            print('=> Create model with depth : {}, batch_norm : {}'.format(self.depth, self.batch_norm))
        model = vgg(self.depth, self.batch_norm, self.num_classes, self.pretrained)
        self.model = model.cuda(self.gpu)

        self.train_losses = list()
        self.train_acc = list()
        self.test_losses = list()
        self.test_acc = list()
    def train(self, train_data, test_data, resume=False, save=False, start_epoch=0, epochs=74,
              lr=0.01, momentum=0.9, weight_decay=0.0005, milestones=False):
        # Model to Train Mode
        self.model.train()

        # Set Optimizer and Scheduler
        optimizer = optim.SGD(self.model.parameters(), lr, momentum=momentum, weight_decay=weight_decay)
        if milestones:
            scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1)
        else:
            scheduler = optim.lr_scheduler.MultiStepLR(optimizer, [epochs//2, epochs*3//4], gamma=0.1)

        # Optionally Resume from Checkpoint (resume is a checkpoint path)
        if resume:
            if os.path.isfile(resume):
                print('=> Load checkpoint from {}'.format(resume))
                loc = 'cuda:{}'.format(self.gpu)
                checkpoint = torch.load(resume, map_location=loc)
                self.model.load_state_dict(checkpoint['state_dict'])
                start_epoch = checkpoint['epoch']
                optimizer.load_state_dict(checkpoint['optimizer'])
                scheduler.load_state_dict(checkpoint['scheduler'])
                print('=> Loaded checkpoint from {} with epoch {}'.format(resume, checkpoint['epoch']))
            else:
                print('=> No checkpoint found at {}'.format(resume))

        # Train
        for epoch in range(start_epoch, epochs):
            if epoch % self.epoch_print == 0:
                print('Epoch {} Started...'.format(epoch+1))
            for i, (X, y) in enumerate(train_data):
                X, y = X.cuda(self.gpu, non_blocking=True), y.cuda(self.gpu, non_blocking=True)
                output = self.model(X)
                loss = self.loss_function(output, y)

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

                # Log train/test metrics every print_freq iterations
                if (i+1) % self.print_freq == 0:
                    train_acc = 100 * count(output, y) / y.size(0)
                    test_acc, test_loss = self.test(test_data)

                    self.train_losses.append(loss.item())
                    self.train_acc.append(train_acc)
                    self.test_losses.append(test_loss)
                    self.test_acc.append(test_acc)

                    # self.test() switches to eval mode, so switch back
                    self.model.train()

                    if epoch % self.epoch_print == 0:
                        print('Iteration : {} - Train Loss : {:.2f}, Test Loss : {:.2f}, '
                              'Train Acc : {:.2f}, Test Acc : {:.2f}'.format(i+1, loss.item(), test_loss,
                                                                             train_acc, test_acc))

            scheduler.step()
            if save and (epoch % self.epoch_save == 0):
                # Save the scheduler's state_dict so that load_state_dict() works on resume
                save_checkpoint(self.depth, self.batch_norm, self.num_classes, self.pretrained, epoch,
                                state={'epoch': epoch+1, 'state_dict':self.model.state_dict(),
                                       'optimizer':optimizer.state_dict(), 'scheduler':scheduler.state_dict()})
    def test(self, test_data):
        correct, total = 0, 0
        losses = list()

        # Model to Eval Mode
        self.model.eval()

        # Test
        with torch.no_grad():
            for i, (X, y) in enumerate(test_data):
                X, y = X.cuda(self.gpu, non_blocking=True), y.cuda(self.gpu, non_blocking=True)
                output = self.model(X)

                loss = self.loss_function(output, y)
                losses.append(loss.item())

                correct += count(output, y)
                total += y.size(0)
        return (100*correct/total, sum(losses)/len(losses))
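The `count` helper used in `train` and `test` comes from `utils`, which is not shown in this issue. A plausible minimal version, assuming it returns the number of correct top-1 predictions in a batch, could look like the sketch below; this is a guess at its behaviour, not the original implementation.

```python
import torch


def count(output, target):
    """Hypothetical stand-in for utils.count: number of correct top-1 predictions."""
    with torch.no_grad():
        predictions = output.argmax(dim=1)  # class with the highest score per sample
        return (predictions == target).sum().item()


# Made-up logits for a batch of 3 samples and 3 classes.
logits = torch.tensor([[0.1, 2.0, 0.3],
                       [1.5, 0.2, 0.1],
                       [0.0, 0.1, 3.0]])
labels = torch.tensor([1, 0, 1])
print(count(logits, labels))  # 2
```

Returning a plain Python number keeps `100 * count(output, y) / y.size(0)` and the running `correct` total free of GPU tensors.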