gchrupala/funktional

Implement Net2DeeperNet for GRU layers

gchrupala opened this issue · 2 comments

The current grow function should initialize the newly added layer to one implementing an identity function. For a GRU, assuming non-negative inputs, and a relu or clipped relu activation, and the following definiton of the layer:

import numpy as np
from numpy import dot

def sigmoid(x): return 1 / (1 + np.exp(-x))
def rectify(x): return np.maximum(x, 0)   # relu

def GRU(W, U, Wz, Uz, Wr, Ur, xt, htm1):
    r = sigmoid(dot(xt, Wr) + dot(htm1, Ur))          # reset gate
    z = sigmoid(dot(xt, Wz) + dot(htm1, Uz))          # update gate
    htilde = rectify(dot(xt, W) + dot(r * htm1, U))   # candidate state
    h = (1 - z) * htm1 + z * htilde                   # interpolate old state and candidate
    return h

we could set:

  • W - Identity
  • U - Zero
  • Wz - Identity + 2 (or any number ensuring z is close to 1)
  • Uz - Identity + 2
  • Wr - random
  • Ur - random

Actually, it should be more like Identity * N, not Identity + N (see the sketch after this list):

  • W - Identity
  • U - Zero
  • Wz - Identity * 2 (or any number ensuring z is close to 1)
  • Uz - Identity * 2
  • Wr - random
  • Ur - random
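A minimal sketch of that initialization, assuming the toy numpy GRU above; the names identity_init, n, scale and seed, the scale factor 2, and the small random scale for Wr/Ur are illustrative choices, not part of the library:

import numpy as np

def identity_init(n, scale=2.0, seed=0):
    rng = np.random.RandomState(seed)
    W  = np.eye(n)                  # candidate state copies the input
    U  = np.zeros((n, n))           # previous state ignored in the candidate
    Wz = np.eye(n) * scale          # push the update gate z towards 1
    Uz = np.eye(n) * scale
    Wr = rng.randn(n, n) * 0.01     # reset gate can be (small) random
    Ur = rng.randn(n, n) * 0.01
    return W, U, Wz, Uz, Wr, Ur

# sanity check: with non-negative inputs the layer roughly copies its input;
# the gap from an exact identity shrinks as scale grows and z saturates
n = 4
xt, htm1 = np.random.rand(n), np.random.rand(n)
print(GRU(*identity_init(n), xt, htm1) - xt)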

Sort of working, but it seems hard for training to "undo" this particular implementation of the identity mapping. Another idea would instead be to learn an identity mapping for a particular GRU type and size, and use that to initialize the new layer.
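A rough sketch of that alternative, again assuming the toy numpy GRU defined above and using scipy's generic L-BFGS-B optimizer purely for illustration (not the library's own training code): fit the six weight matrices so that the layer reproduces its non-negative input, then reuse the fitted matrices as the initializer for the newly inserted layer. The names n, unpack and loss are hypothetical.

import numpy as np
from scipy.optimize import minimize

n = 4                                  # hypothetical layer size
rng = np.random.RandomState(0)
X = rng.rand(200, n)                   # random non-negative inputs
H = rng.rand(200, n)                   # random non-negative previous states

def unpack(theta):
    # split a flat parameter vector into the six n-by-n weight matrices
    return [theta[i*n*n:(i+1)*n*n].reshape(n, n) for i in range(6)]

def loss(theta):
    W, U, Wz, Uz, Wr, Ur = unpack(theta)
    out = np.array([GRU(W, U, Wz, Uz, Wr, Ur, x, h) for x, h in zip(X, H)])
    return np.mean((out - X) ** 2)     # the layer should copy its input

res = minimize(loss, rng.randn(6 * n * n) * 0.1, method="L-BFGS-B")
W, U, Wz, Uz, Wr, Ur = unpack(res.x)   # learned near-identity weights to initialize with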