Disclaimer: the code currently contains a small bug; I will fix it as soon as possible.
I implement Decoupled Neural Interfaces using Synthetic Gradients (DNI) in TensorFlow. The paper uses synthetic gradients to decouple the layers in a network, which is interesting because the layers no longer suffer from update locking (here is a talk on DNI: https://www.youtube.com/watch?v=toZprSCCmNI). I test my model on CIFAR-10 and achieve results similar to those reported in the paper.
- TensorFlow: follow the official installation instructions
- Python 2.7
- CIFAR-10 dataset: download it from the dataset website
- Use multi-threading on the GPU and analyze the speed
- Apply the method to more complicated networks to see whether it generalizes
We often optimize neural networks with backpropagation, which is implemented in most well-known frameworks. But is there another way for the layers of a network to communicate with each other? This is where synthetic gradients come in: they give neural networks a way to learn to send messages between themselves in a decoupled, scalable manner, paving the way for multiple networks to communicate with each other, or for improving the long-term temporal dependencies of recurrent networks.
Each layer is optimized with an error signal (δa_hat) that is produced automatically by its synthetic-gradient layer. How is that error signal generated? The network still performs backpropagation, but the true error signal (δa) coming from the objective function is not used to optimize the layer directly; instead it is used as the target for training the synthetic error signal (δa_hat) produced by the synthetic layer (see the illustration in the paper).
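A minimal sketch of this mechanism (hypothetical names, TF 1.x graph style, not necessarily how the actual code in this repo is organized):

```python
import tensorflow as tf

# Hypothetical sketch: a synthetic-gradient module that predicts dL/dh from the
# activation h, plus the regression loss that trains it.
# h    : activation of a hidden layer, shape [batch, hidden_size]
# loss : the final objective of the whole network

def synthetic_gradient_module(h, hidden_size, scope='synthetic_grad'):
    # A single linear layer; zero init keeps the predicted gradient at 0 initially.
    with tf.variable_scope(scope):
        return tf.layers.dense(h, hidden_size,
                               kernel_initializer=tf.zeros_initializer())

# The true error signal dL/dh is still computed by backprop, but it is only
# used as the regression target for the prediction, not to update the layer:
# grad_hat  = synthetic_gradient_module(h, hidden_size)
# true_grad = tf.gradients(loss, h)[0]
# synthetic_loss = tf.reduce_mean(tf.square(grad_hat - tf.stop_gradient(true_grad)))
```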
Right now I have only implemented the fully-connected (MLP) version, which is set as the default network structure.
You can set some variables on the command line, e.g.: `python main.py --max_step 100000 --checkpoint_dir ./model`
max_step = 50000
model_name = mlp # the ckpt will be saved in $checkpoint_dir/$model_name/checkpoint-*
checkpoint_dir = './checkpoint' # the checkpoint directory
gpu_fraction = 1/2 # fraction of GPU memory to use
batch_size = 256
hidden_size = 1000 # hidden size of the mlp
test_per_iter = 50
optim_type = adam
synthetic = False # use synthetic gradients or not
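For reference, a hedged sketch of how such flags could be declared with TensorFlow's flag helpers, and how the `gpu_fraction` setting maps to a session option (names mirror the defaults above, not necessarily the actual code):

```python
import tensorflow as tf

# Hypothetical flag definitions mirroring the defaults listed above (TF 1.x).
flags = tf.app.flags
flags.DEFINE_integer('max_step', 50000, 'number of training steps')
flags.DEFINE_string('model_name', 'mlp', 'ckpt goes to $checkpoint_dir/$model_name')
flags.DEFINE_string('checkpoint_dir', './checkpoint', 'checkpoint directory')
flags.DEFINE_string('gpu_fraction', '1/2', 'fraction of GPU memory to use, e.g. "1/2"')
flags.DEFINE_integer('batch_size', 256, 'mini-batch size')
flags.DEFINE_integer('hidden_size', 1000, 'hidden size of the MLP')
flags.DEFINE_integer('test_per_iter', 50, 'evaluate on the test set every N iterations')
flags.DEFINE_string('optim_type', 'adam', 'optimizer type')
flags.DEFINE_boolean('synthetic', False, 'use synthetic gradients or not')
FLAGS = flags.FLAGS

# The gpu_fraction string can be turned into a per-process memory fraction:
num, denom = FLAGS.gpu_fraction.split('/')
gpu_options = tf.GPUOptions(
    per_process_gpu_memory_fraction=float(num) / float(denom))
# sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
```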
DNI-MLP test on CIFAR-10

| cls loss | synthetic_grad loss | test acc |
|---|---|---|
DNI-CNN test on CIFAR-10

| cls loss | synthetic_grad loss | test acc |
|---|---|---|
Known issue: the synthetic gradient loss increases in the CNN model.
TensorFlow is known for the convenience of automatic differentiation, but many people don't know how it performs backpropagation or how to access the intermediate gradients. Compared to Torch, there is no obvious way to get at a layer's `gradOutput` and `gradInput`. In fact, TensorFlow provides some nice functions that make this easier and more flexible.
Sometimes you may want to compute a gradient dy/dx: use `tf.gradients(y, x)`. It's very simple.
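A toy example (not from this repo):

```python
import tensorflow as tf

x = tf.constant(3.0)
y = x ** 2                     # dy/dx = 2x
dy_dx = tf.gradients(y, x)[0]  # tf.gradients returns a list, one entry per x

with tf.Session() as sess:
    print(sess.run(dy_dx))     # 6.0
```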
If you want to compute a gradient given an upstream gradient backpropagated from the loss, or from something you have defined yourself (dy/dx = dy/du * du/dx, given dy/du): use `tf.gradients(u, x, grad_ys=dy_du)`, where `u` is the intermediate tensor.
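This third argument (`grad_ys`) is what makes the decoupled update possible: the synthetic gradient predicted for an activation can be fed in as the upstream gradient for that activation. A hedged sketch, with illustrative variable names:

```python
import tensorflow as tf

# Sketch: update a layer's parameters from a synthetic gradient instead of the
# gradient backpropagated from the loss (variable names are illustrative).
# h          : activation of the layer
# grad_hat   : output of the synthetic-gradient module, same shape as h
# layer_vars : the layer's trainable variables

def decoupled_grads(h, grad_hat, layer_vars):
    # dL/dw = dL/dh * dh/dw, with dL/dh replaced by the synthetic prediction.
    # stop_gradient prevents this op from also updating the synthetic module.
    return tf.gradients(h, layer_vars, grad_ys=tf.stop_gradient(grad_hat))

# optimizer = tf.train.AdamOptimizer()
# train_op  = optimizer.apply_gradients(
#     zip(decoupled_grads(h, grad_hat, layer_vars), layer_vars))
```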
- DeepMind's post on Decoupled Neural Interfaces Using Synthetic Gradients