Implement Net2DeeperNet for GRU layers
gchrupala opened this issue · 2 comments
gchrupala commented
Idea based on "Net2Net: Accelerating Learning via Knowledge Transfer" http://arxiv.org/abs/1511.05641
The current `grow` function should initialize the newly added layer to one implementing an identity function. For a GRU, assuming non-negative inputs, a relu or clipped relu activation, and the following definition of the layer:
```python
import numpy as np

def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def rectify(x): return np.maximum(x, 0.0)

def GRU(W, U, Wz, Uz, Wr, Ur, xt, htm1):
    r = sigmoid(np.dot(xt, Wr) + np.dot(htm1, Ur))
    z = sigmoid(np.dot(xt, Wz) + np.dot(htm1, Uz))
    htilde = rectify(np.dot(xt, W) + np.dot(r * htm1, U))
    h = (1 - z) * htm1 + z * htilde
    return h
```
we could set:
- W - Identity
- U - Zero
- Wz - Identity + 2 (or any number ensuring z is close to 1)
- Uz - Identity + 2
- Wr - random
- Ur - random
gchrupala commented
Actually, more like Identity * N, not + N
- W - Identity
- U - Zero
- Wz - Identity * 2 (or any number ensuring z is close to 1)
- Uz - Identity * 2
- Wr - random
- Ur - random
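For concreteness, a minimal numpy sketch of this initialization for the GRU defined above; the hidden size, the scaling factor of 2, and the input range are illustrative choices, and the result is only approximately the identity, since `z` saturates towards 1 but never reaches it exactly (with `z ≈ 1`, `h ≈ htilde = rectify(xt·W + (r*htm1)·U) = rectify(xt) = xt` for non-negative inputs):

```python
import numpy as np

size = 5  # example hidden size of the newly added layer

W  = np.eye(size)                  # pass xt through unchanged in htilde
U  = np.zeros((size, size))        # drop the recurrent term in htilde
Wz = np.eye(size) * 2              # push the update gate z towards 1
Uz = np.eye(size) * 2
Wr = np.random.normal(scale=0.1, size=(size, size))  # reset gate is irrelevant since U is zero
Ur = np.random.normal(scale=0.1, size=(size, size))

# Quick check: for non-negative inputs the output stays close to xt.
xt   = np.random.uniform(1.0, 2.0, size=size)
htm1 = np.random.uniform(0.0, 1.0, size=size)
print(xt)
print(GRU(W, U, Wz, Uz, Wr, Ur, xt, htm1))
```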
gchrupala commented
Sort of working, but it seems hard to "undo" this particular implementation of the identity mapping. Another idea would be to instead learn an identity mapping for a particular GRU type and size, and use that to initialize the new layer.
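A possible sketch of that second idea, using plain numpy and the GRU function above: fit the six weight matrices so that the layer reproduces its input on random non-negative data, then reuse the fitted weights as the initialization of the grown layer. The function names, the single-step training data, the numerical gradients, and the learning rate / step count are all assumptions for illustration; a real implementation would use the framework's autodiff and train on sequences.

```python
import numpy as np

def identity_loss(params, X, H):
    # Mean squared error between the GRU output and its input xt.
    W, U, Wz, Uz, Wr, Ur = params
    out = np.array([GRU(W, U, Wz, Uz, Wr, Ur, x, h) for x, h in zip(X, H)])
    return np.mean((out - X) ** 2)

def learn_identity_init(size, steps=200, lr=0.5, eps=1e-4, seed=0):
    rng = np.random.RandomState(seed)
    params = [rng.normal(scale=0.1, size=(size, size)) for _ in range(6)]
    X = rng.uniform(0.0, 1.0, size=(32, size))  # non-negative inputs
    H = rng.uniform(0.0, 1.0, size=(32, size))  # arbitrary previous states
    for _ in range(steps):
        for p in params:
            # Central-difference numerical gradient, element by element
            # (only practical for tiny sizes; kept for self-containedness).
            grad = np.zeros_like(p)
            for idx in np.ndindex(*p.shape):
                old = p[idx]
                p[idx] = old + eps
                up = identity_loss(params, X, H)
                p[idx] = old - eps
                down = identity_loss(params, X, H)
                p[idx] = old
                grad[idx] = (up - down) / (2.0 * eps)
            p -= lr * grad
    return params  # W, U, Wz, Uz, Wr, Ur for initializing the new layer
```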