Writing a simple CNN-LSTM model
I'm trying to write a simple CNN-LSTM network. I'm having trouble with the LSTM part, probably because of some misunderstanding on my part.
The network outline (batch size omitted):
```
image(shape=(299, 299, 3)) --> CNN  --> vector(shape=(4096,))
vector(shape=(4096,))      --> LSTM --> sequence(shape=(40, 256))
```
The final output is a tensor of 40 time steps, each holding a vector of 256 elements, later used with some word embedding. The embedding part is not shown here, for simplicity.
```python
from keras.layers import Input, Dense, Conv2D, Flatten
from keras.models import Model
from recurrentshop import LSTMCell, RecurrentModel

# Visual model
# (this is a simple mockup, used to avoid long loading times; it will later
# be replaced with a pre-trained Inception model)
dummy_visual_input = Input(shape=(299, 299, 3), name='visual_input')
dummy_visual_conv = Conv2D(filters=48, kernel_size=[5, 5], name='visual_output')(dummy_visual_input)
dummy_visual_flat = Flatten()(dummy_visual_conv)
dummy_visual_output = Dense(units=1024)(dummy_visual_flat)
visual_model = Model(dummy_visual_input, dummy_visual_output, name='visual_model')

# LSTM (based on the example in the readout documentation)
num_of_lstm_outputs = 40
num_of_lstm_units = 256
word_embedding_size = 256

c_tm1 = Input(shape=(word_embedding_size,), name='initial_RNN_cell_state')
h_tm1 = Input(shape=(word_embedding_size,), name='initial_RNN_hidden_state')
readout_input = Input(shape=(word_embedding_size,), name='first_RNN_readout_input')
rnn_input = Input(name='RNN_input', tensor=visual_model(dummy_visual_input))

depth = 3  # stacking LSTMs
rnn_output, h, c = readout_input, h_tm1, c_tm1
for i in range(depth):
    rnn_output, h, c = LSTMCell(units=num_of_lstm_units)([rnn_output, h, c])
predictions = rnn_output

rnn_model = RecurrentModel(decode=True, output_length=num_of_lstm_outputs,
                           input=rnn_input, initial_states=[h_tm1, c_tm1],
                           output=predictions, final_states=[h, c],
                           readout_input=readout_input, return_sequences=True,
                           state_initializer='random_normal', name='RNN_model')

final_model = Model(inputs=dummy_visual_input, outputs=rnn_model.output)
```
I get this error:
```
Traceback (most recent call last):
  File "/opt/pycharm-community-2016.2.3/helpers/pydev/pydevd.py", line 1580, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/opt/pycharm-community-2016.2.3/helpers/pydev/pydevd.py", line 964, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/gilboa/Research/image_tagging/dummy.py", line 152, in <module>
    main()
  File "/home/gilboa/Research/image_tagging/dummy.py", line 144, in main
    final_model = Model(inputs=dummy_visual_input, outputs=rnn_model.output)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 909, in output
    ' has no inbound nodes.')
AttributeError: Layer RNN_model has no inbound nodes.
```
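If I read the traceback correctly, `rnn_model` is never actually called on a tensor, so Keras cannot resolve `rnn_model.output` (the layer has no inbound nodes). Below is a minimal sketch of what I imagine the fix looks like; I'm assuming here that a `RecurrentModel` can be called on a tensor like any ordinary Keras layer:

```python
# Call the RNN on the visual features so the layer gets an inbound node
# (assumption: RecurrentModel is callable like a regular Keras layer).
visual_features = visual_model(dummy_visual_input)
rnn_sequence = rnn_model(visual_features)  # expected shape: (batch, 40, 256)
final_model = Model(inputs=dummy_visual_input, outputs=rnn_sequence)
```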
In addition to understanding and solving the error above, I'd like to ask two more questions:
- Why must h_tm1, c_tm1, readout_input, and rnn_input be Input layers? Does this mean we have to explicitly feed each of them with data during training?
- If we want to initialize h_tm1 and/or c_tm1 ourselves, how should that be done? Say we'd like to initialize them with a vector of ones, or with the output of another layer. Could you please give a code example? (My own guess is sketched after this list.)
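To make the second question concrete, here is roughly what I have in mind (reusing `visual_features` from the sketch above). This is only a guess: I don't know whether `RecurrentModel` accepts `initial_state` at call time the way Keras's built-in recurrent layers do, and the layer names are made up:

```python
from keras.layers import Dense, Lambda
import keras.backend as K

# Hypothetical: derive the initial hidden state from another layer's output,
# and fill the initial cell state with ones.
init_h = Dense(num_of_lstm_units, name='h0_from_image')(dummy_visual_output)
init_c = Lambda(lambda x: K.ones_like(x), name='c0_ones')(init_h)

# Guess: pass the states when calling the RNN, as with Keras's built-in RNNs.
rnn_sequence = rnn_model(visual_features, initial_state=[init_h, init_c])
```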
Your work on this package (and seq2seq) is incredible, Fariz. Your contribution to the Keras community is invaluable! Many thanks to you and all the people making this happen!