farizrahman4u/recurrentshop

Clarification question about readouts

tom-christie opened this issue · 0 comments

I'm trying to build a custom RNN architecture, and after banging my head against the Keras source code for a while I ended up here. recurrentshop is a super neat and helpful project, and I think it will help me do what I want, but I'm stuck.

I'm trying to build a network with the following architecture:

X - input
H - hidden state
Y - output
t - time step

Xt --> Ht is defined by a weight matrix Wxh - this is 'kernel' in the SimpleRNNCell
H_tm1 --> Ht is defined by a weight matrix Whh - this is 'recurrent_kernel' in the SimpleRNNCell
Ht --> Yt would be defined by a second layer and matrix Why, since I want it to be a secondary transformation and convert the hidden state dimension to a 1-dimensional output at each time step
Y_tm1 --> Ht is the hard part, defined by a matrix Wyh.

If I understand correctly, the architecture is somewhat similar to your readout example. However, I'd like to incorporate Y_tm1 into the state update by treating it as a 'first-class' input, like so:

```python
Ht = K.dot(Xt, Wxh) + K.dot(H_tm1, Whh) + K.dot(Y_tm1, Wyh)
Ht = K.tanh(Ht)
```
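In case it helps pin down exactly what I mean, here is a minimal NumPy sketch of a single time step (the weight names and toy dimensions are mine, not from recurrentshop):

```python
import numpy as np

def step(x_t, h_tm1, y_tm1, Wxh, Whh, Wyh, Why):
    """One time step: the hidden state sees X_t, H_tm1, and the previous output Y_tm1."""
    # Wyh is the extra matrix I want to learn, feeding Y_tm1 back into the state
    h_t = np.tanh(x_t @ Wxh + h_tm1 @ Whh + y_tm1 @ Wyh)
    # second transformation: hidden dim -> 1-dimensional output at each step
    y_t = h_t @ Why
    return h_t, y_t

# toy shapes: input dim 3, hidden dim 4, output dim 1, batch of 2
rng = np.random.default_rng(0)
Wxh = rng.standard_normal((3, 4))
Whh = rng.standard_normal((4, 4))
Wyh = rng.standard_normal((1, 4))
Why = rng.standard_normal((4, 1))
h, y = step(rng.standard_normal((2, 3)), np.zeros((2, 4)), np.zeros((2, 1)),
            Wxh, Whh, Wyh, Why)
```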

The readout example shows how to add or multiply X by the previous output Y, but I'd also like to learn the Wyh matrix. I think that means I need to include a new Dense() layer somewhere, but I'm having a hard time figuring out where. I'm using this document as a starting point. I'd appreciate any help you could give! For reference, I tried rewriting the SimpleRNNCell class to include a two-part state (one for Ht and one for the 'hidden' state inside the cell) and ended up with a cryptic Keras error that I didn't understand.
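For what it's worth, here is how I picture the two-part state unrolling over time, in plain NumPy. This is only a sketch of the semantics I'm after (not working Keras/recurrentshop code; the function name and shapes are hypothetical), with (h, y) carried together as the recurrent state:

```python
import numpy as np

def run_rnn(X, Wxh, Whh, Wyh, Why):
    """Unroll the recurrence, carrying a two-part state (h, y).

    X has shape (timesteps, batch, input_dim); returns the stacked
    per-step outputs of shape (timesteps, batch, 1).
    """
    batch = X.shape[1]
    h = np.zeros((batch, Whh.shape[0]))  # hidden state H, part 1 of the state
    y = np.zeros((batch, Why.shape[1]))  # previous readout Y_tm1, part 2 of the state
    outputs = []
    for x_t in X:
        # Y_tm1 enters the update through the learned matrix Wyh
        h = np.tanh(x_t @ Wxh + h @ Whh + y @ Wyh)
        # the readout layer (the Dense() I think I need) maps hidden dim -> 1
        y = h @ Why
        outputs.append(y)
    return np.stack(outputs)
```

A custom cell would implement the loop body and hand (h, y) back as its state tuple each step; the part I can't work out is where recurrentshop wants the Dense() that produces y.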