
Identity Mappings in Deep Residual Networks in Lasagne/Theano

Reproduction of some of the results from the MSRA pre-activation ResNet paper and the follow-up Wide-ResNet paper, exploring the full pre-activation style of residual layer in both its normal and wide forms.


Results on CIFAR-10

Results are presented as classification error percent.

ResNet Type    Original Paper    This Repo
ResNet-110     6.37              6.38
ResNet-164     5.46              5.66
Wide-ResNet    5.55              5.41

Note: ResNet-110 is the stacked 3x3-filter variant and ResNet-164 is the 'bottleneck' architecture. Both use the new pre-activation units proposed in the paper. For Wide-ResNet, the paper and test results are for depth 16 with a width multiplier of 4. This repo uses the preprocessing and training parameters from the pre-activation ResNet paper rather than the Wide-ResNet paper, so the comparison with the Wide-ResNet paper's results is not one-to-one.
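For orientation, a full pre-activation residual block (BN -> ReLU -> conv, twice, with the shortcut added to the un-activated output) might be sketched in Lasagne as below; preactivation_block and its defaults are illustrative, not the repo's models.py:

    import lasagne
    from lasagne.layers import (BatchNormLayer, NonlinearityLayer,
                                Conv2DLayer, ElemwiseSumLayer)
    from lasagne.nonlinearities import rectify

    def preactivation_block(incoming, num_filters, stride=1):
        """Full pre-activation residual block: BN -> ReLU -> 3x3 conv, twice,
        plus an identity (or 1x1 projection) shortcut."""
        # First pre-activation: batch norm and ReLU come before the conv
        bn1 = NonlinearityLayer(BatchNormLayer(incoming), rectify)
        conv1 = Conv2DLayer(bn1, num_filters, 3, stride=stride, pad='same',
                            W=lasagne.init.HeNormal(gain='relu'),
                            nonlinearity=None)
        # Second pre-activation and conv
        bn2 = NonlinearityLayer(BatchNormLayer(conv1), rectify)
        conv2 = Conv2DLayer(bn2, num_filters, 3, stride=1, pad='same',
                            W=lasagne.init.HeNormal(gain='relu'),
                            nonlinearity=None)
        # Identity shortcut; 1x1 strided projection when shapes change
        shortcut = incoming
        if stride != 1 or incoming.output_shape[1] != num_filters:
            shortcut = Conv2DLayer(incoming, num_filters, 1, stride=stride,
                                   b=None, nonlinearity=None)
        # The sum stays un-activated, preserving the identity mapping
        return ElemwiseSumLayer([conv2, shortcut])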

[Training curve plots: ResNet-110, ResNet-164, and Wide-ResNet (depth 16, width 4)]

Implementation details

Batch sizes of 64 for ResNet-110 and 48 for ResNet-164 were used due to hardware constraints. The data augmentation is exactly as in the paper: translations by padding then random cropping, plus left-right flipping.
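A minimal sketch of that augmentation in NumPy, assuming 4 pixels of padding as in the paper (names are illustrative, not the repo's utils.py):

    import numpy as np

    def augment_batch(images, pad=4):
        """Pad with zeros, take a random 32x32 crop, and flip left-right
        with probability 0.5. `images` has shape (batch, 3, 32, 32)."""
        batch, channels, height, width = images.shape
        padded = np.pad(images, ((0, 0), (0, 0), (pad, pad), (pad, pad)),
                        mode='constant')
        out = np.empty_like(images)
        for i in range(batch):
            # Random translation via a crop of the padded image
            top = np.random.randint(0, 2 * pad + 1)
            left = np.random.randint(0, 2 * pad + 1)
            crop = padded[i, :, top:top + height, left:left + width]
            # Random horizontal flip
            if np.random.rand() < 0.5:
                crop = crop[:, :, ::-1]
            out[i] = crop
        return out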

Pre-Trained weights

The weights of the trained networks are available for download.
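The .pklz extension suggests a gzipped pickle of the network's parameter values; assuming that format (an assumption, not confirmed by the repo), the weights could be restored into an already-built network along these lines:

    import gzip
    import pickle
    import lasagne

    def load_weights(network, path):
        """Unpickle parameter arrays from a gzipped file and copy them into
        a Lasagne network with the same architecture."""
        with gzip.open(path, 'rb') as f:
            # encoding='latin1' keeps NumPy arrays loadable if the file
            # was pickled under Python 2
            params = pickle.load(f, encoding='latin1')
        lasagne.layers.set_all_param_values(network, params)

    # e.g. load_weights(network, 'data/weights/resnet110_fullpreactivation.pklz')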

Running the networks

To train your own PreResNet, simply call train_nn.py with command-line arguments defining the type, depth, and (for the wide variant) width of the network.

train_nn.py [type] [depth] [width]

Testing accuracy on the test set works the same way.

test_model.py [type] [depth] [width]

-Type (string): one of 'normal', 'bottleneck' or 'wide'

-Depth (integer): the multiplier N for how many residual blocks to insert into each section of the network

-Width (integer): only for Wide-ResNet; serves as the filter multiplier k, giving [3x3, 16*k] filters in the residual blocks (the first convolution layer is unaffected)

Group      Size         Multiplier
Conv1      [3x3, 16]    -
Conv2      [3x3, 16]    N
           [3x3, 16]
Conv3      [3x3, 32]    N
           [3x3, 32]
Conv4      [3x3, 64]    N
           [3x3, 64]
Avg-Pool   8x8          -
Softmax    10           -
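Putting the arguments together, and assuming the depth argument is the per-group block count N from the table above (so total depth is 6N+2 convolution layers for 'normal' and 9N+2 for 'bottleneck'), illustrative invocations would be:

    train_nn.py normal 18        (ResNet-110: 6*18 + 2)
    train_nn.py bottleneck 18    (ResNet-164: 9*18 + 2)
    train_nn.py wide 2 4         (Wide-ResNet, width 4; N=2 assumed for depth 16)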

The 'cifar-10-batches-py' archive contents from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz must be extracted into a 'data' folder within the working directory, as laid out below.

 PreResNet Directory
 |__ test_model.py
 |__ train_nn.py
 |__ models.py
 |__ utils.py
 |__ data
     |__ cifar-10-batches-py
        |__ data_batch_1
        |__ data_batch_2
        |__ ...
     |__ weights
        |__ resnet164_fullpreactivation.pklz
        |__ resnet110_fullpreactivation.pklz
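Each data_batch_* file is a pickled dict holding a (10000, 3072) uint8 'data' array and a 'labels' list; a minimal loader (illustrative, not the repo's utils.py) might look like:

    import os
    import pickle
    import numpy as np

    def load_cifar10_batch(path):
        """Read one pickled CIFAR-10 batch and reshape it to (N, 3, 32, 32)."""
        with open(path, 'rb') as f:
            batch = pickle.load(f, encoding='bytes')  # byte-string keys on Python 3
        data = batch[b'data'].reshape(-1, 3, 32, 32).astype(np.float32)
        labels = np.asarray(batch[b'labels'], dtype=np.int32)
        return data, labels

    # e.g. X, y = load_cifar10_batch(os.path.join('data', 'cifar-10-batches-py', 'data_batch_1'))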

Note: if using the wide ResNet, the implementation here differs slightly from the Wide-ResNet paper: that paper uses different preprocessing and a different L2 (weight decay) value. This repo stays consistent with the MSRA paper.
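In Lasagne, that L2 penalty is usually folded into the training loss with regularize_network_params; a minimal sketch, assuming the MSRA papers' weight decay of 1e-4 (the WRN paper uses 5e-4), with a toy network standing in for the real model:

    import lasagne
    import theano.tensor as T
    from lasagne.regularization import regularize_network_params, l2

    input_var = T.tensor4('inputs')
    target_var = T.ivector('targets')
    # Toy stand-in for the ResNet; only the loss wiring matters here
    network = lasagne.layers.InputLayer((None, 3, 32, 32), input_var)
    network = lasagne.layers.DenseLayer(
        network, 10, nonlinearity=lasagne.nonlinearities.softmax)

    prediction = lasagne.layers.get_output(network)
    loss = lasagne.objectives.categorical_crossentropy(
        prediction, target_var).mean()
    # Weight decay: 1e-4 per the MSRA papers (WRN uses 5e-4)
    loss = loss + 1e-4 * regularize_network_params(network, l2)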

References

  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, "Identity Mappings in Deep Residual Networks", arXiv:1603.05027
  • Sergey Zagoruyko, Nikos Komodakis, "Wide Residual Networks", arXiv:1605.07146