1-bit-per-weight

Training wide residual networks for deployment using a single bit for each weight

Author: Mark D. McDonnell

This work was published at ICLR 2018 (iclr.cc). The ICLR version of the paper and the double-blind open peer review are available at https://openreview.net/forum?id=rytNfI1AZ (PDF: https://openreview.net/pdf?id=rytNfI1AZ).

The paper is also on arXiv: https://arxiv.org/abs/1802.08530

The paper is also featured on paperswithcode.com: https://paperswithcode.com/paper/training-wide-residual-networks-for

Single-bit-per-weight deep convolutional neural networks without batch-normalization layers for embedded systems

Authors: Mark D. McDonnell, Hesham Mostafa, Runchun Wang, Andre van Schaik

This work was published at the IEEE conference ACIRS 2019, held in Nagoya, Japan.

The paper is also on arXiv: https://arxiv.org/abs/1907.06916


Contact: mark.mcdonnell@unisa.edu.au

We provide in this repository code relevant to both papers that will enable the reader to verify our strong error-rate results when using a single bit for each weight at inference time, either when using batch-normalisation layers, or when replacing the combination of batch normalisation and ReLU with a shifted ReLU.
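
For readers who want a feel for the approach before opening the notebooks, the sketch below shows one common way to train with a single bit per weight in tensorflow.keras: the forward pass uses sign(w) times a fixed per-layer constant, while a straight-through estimator lets gradients update the underlying full-precision weights. The layer name, the He-initialization scale, and the handling of strides/padding are illustrative assumptions for this sketch, not the repository's exact implementation; see the papers and notebooks for the details.

```python
# Illustrative sketch only, not the repository's implementation: a Keras Conv2D
# whose forward pass uses a fixed per-layer scale times sign(w), with a
# straight-through estimator so the full-precision weights receive gradients.
import numpy as np
import tensorflow as tf


class BinaryConv2D(tf.keras.layers.Conv2D):
    """Conv2D whose effective kernel is scale * sign(kernel) in the forward pass."""

    def build(self, input_shape):
        super().build(input_shape)
        # Fixed per-layer scale; the He-initialization standard deviation is used
        # here as an assumption for this sketch.
        kh, kw, in_ch, _ = self.kernel.shape.as_list()
        self.scale = float(np.sqrt(2.0 / (kh * kw * in_ch)))

    def call(self, inputs):
        w = self.kernel
        # The forward pass sees scale * sign(w); the gradient bypasses sign()
        # and flows straight through to the full-precision weights w.
        w_binary = w + tf.stop_gradient(self.scale * tf.sign(w) - w)
        outputs = tf.nn.conv2d(inputs, w_binary, strides=list(self.strides),
                               padding=self.padding.upper())
        if self.use_bias:
            outputs = tf.nn.bias_add(outputs, self.bias)
        return outputs
```

In a sketch of this kind, deployment only needs to store the sign bits (one per weight) plus a single per-layer scale, which is where the roughly 32x reduction in weight memory relative to 32-bit floats comes from.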

At this point we have only provided CIFAR 10 and CIFAR 100 examples. One day if we get time, we will add ImageNet.

Summary of Tensorflow.keras Results in this Repository

All results in this table are for 20-10 Wide Residual Networks, with ~26 million parameters.

| Task | Accuracy |
| --- | --- |
| CIFAR 10, full-precision baseline | 96.62% |
| CIFAR 10, 1-bit-per-weight with batch norms | 96.25% |
| CIFAR 10, full precision with shifted ReLU | 95.66% |
| CIFAR 10, 1-bit-per-weight with shifted ReLU | 96.52% |
| CIFAR 100, full-precision baseline | 83.02% |
| CIFAR 100, 1-bit-per-weight with batch norms | 81.83% |
| CIFAR 100, full precision with shifted ReLU | 79.37% |
| CIFAR 100, 1-bit-per-weight with shifted ReLU | 78.41% |

A notable outcome is that for CIFAR 10, 1-bit-per-weight with a shifted ReLU is effectively as good as full precision with batch norm.
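
The shifted-ReLU rows come from the second paper, which removes batch-normalisation layers entirely. The sketch below illustrates the general idea of swapping the BatchNormalization + ReLU pair for a single ReLU whose kink is shifted to a negative constant; the shift value and the block ordering here are placeholders chosen for illustration, and the ACIRS 2019 paper defines the exact form and constant used.

```python
# Illustrative only: a BN-free block where a "shifted ReLU" replaces the usual
# BatchNormalization + ReLU pair. The shift value below is a placeholder, not
# the constant used in the paper.
import tensorflow as tf


def shifted_relu(x, shift=-1.0):
    """ReLU whose kink is moved from 0 to `shift`, i.e. max(x, shift)."""
    return tf.maximum(x, shift)


def conv_block(x, filters, use_batch_norm=True):
    x = tf.keras.layers.Conv2D(filters, 3, padding='same', use_bias=False)(x)
    if use_batch_norm:
        # Baseline: batch normalisation followed by ReLU.
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.ReLU()(x)
    else:
        # BN-free variant: a single shifted ReLU in place of BN + ReLU.
        x = tf.keras.layers.Activation(shifted_relu)(x)
    return x
```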

Versions include:

  1. tensorflow.keras (tensorflow 1.13, but also tested in tensorflow 2.1)
  2. Matlab, using the matconvnet package (http://www.vlfeat.org/matconvnet/) (not for shifted ReLU networks)
  3. pytorch (coming; in the meantime, thanks to @szagoruyko for a re-implementation in pytorch: https://github.com/szagoruyko/binary-wide-resnet)

Key results:

Figure 1: Gap between 32-bit-weight and 1-bit-per-weight CNNs: