gchrupala/funktional

Implement Net2DeeperNet for GRU layers

gchrupala opened this issue · 2 comments

The current grow function should initialize the newly added layer to one implementing an identity function. For a GRU, assuming non-negative inputs, and a relu or clipped relu activation, and the following definiton of the layer:

import numpy as np
from numpy import dot

def sigmoid(x): return 1 / (1 + np.exp(-x))
def rectify(x): return np.maximum(x, 0)   # relu

def GRU(W, U, Wz, Uz, Wr, Ur, xt, htm1):
    r = sigmoid(dot(xt, Wr) + dot(htm1, Ur))          # reset gate
    z = sigmoid(dot(xt, Wz) + dot(htm1, Uz))          # update gate
    htilde = rectify(dot(xt, W) + dot(r * htm1, U))   # candidate state
    h = (1 - z) * htm1 + z * htilde                   # interpolate old state and candidate
    return h

we could set:

  • W - Identity
  • U - Zero
  • Wz - Identity + 2 (or any number ensuring z is close to 1)
  • Uz - Identity + 2
  • Wr - random
  • Ur - random

Actually, it should be more like Identity * N, not Identity + N (see the sketch after this list):

  • W - Identity
  • U - Zero
  • Wz - Identity * 2 (or any number ensuring z is close to 1)
  • Uz - Identity * 2
  • Wr - random
  • Ur - random
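A minimal sketch of that initialization, assuming the toy numpy GRU above; the names identity_init, n, scale and seed, the scale factor 2, and the small random scale for Wr/Ur are illustrative choices, not part of the library:

import numpy as np

def identity_init(n, scale=2.0, seed=0):
    rng = np.random.RandomState(seed)
    W  = np.eye(n)                  # candidate state copies the input
    U  = np.zeros((n, n))           # previous state ignored in the candidate
    Wz = np.eye(n) * scale          # push the update gate z towards 1
    Uz = np.eye(n) * scale
    Wr = rng.randn(n, n) * 0.01     # reset gate can be (small) random
    Ur = rng.randn(n, n) * 0.01
    return W, U, Wz, Uz, Wr, Ur

# sanity check: with non-negative inputs the layer roughly copies its input;
# the gap from an exact identity shrinks as scale grows and z saturates
n = 4
xt, htm1 = np.random.rand(n), np.random.rand(n)
print(GRU(*identity_init(n), xt, htm1) - xt)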

Sort of working, but it seems hard for training to "undo" this particular implementation of the identity mapping. Another idea would instead be to learn an identity mapping for a particular GRU type and size, and use that to initialize the new layer.
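A rough sketch of that alternative, again assuming the toy numpy GRU defined above and using scipy's generic L-BFGS-B optimizer purely for illustration (not the library's own training code): fit the six weight matrices so that the layer reproduces its non-negative input, then reuse the fitted matrices as the initializer for the newly inserted layer. The names n, unpack and loss are hypothetical.

import numpy as np
from scipy.optimize import minimize

n = 4                                  # hypothetical layer size
rng = np.random.RandomState(0)
X = rng.rand(200, n)                   # random non-negative inputs
H = rng.rand(200, n)                   # random non-negative previous states

def unpack(theta):
    # split a flat parameter vector into the six n-by-n weight matrices
    return [theta[i*n*n:(i+1)*n*n].reshape(n, n) for i in range(6)]

def loss(theta):
    W, U, Wz, Uz, Wr, Ur = unpack(theta)
    out = np.array([GRU(W, U, Wz, Uz, Wr, Ur, x, h) for x, h in zip(X, H)])
    return np.mean((out - X) ** 2)     # the layer should copy its input

res = minimize(loss, rng.randn(6 * n * n) * 0.1, method="L-BFGS-B")
W, U, Wz, Uz, Wr, Ur = unpack(res.x)   # learned near-identity weights to initialize with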