Three questions about your Scale-Recurrent-Network architecture
tomeiss opened this issue · 6 comments
Hello @jiangsutx and @rimchang,
first of all, thanks for providing the code for your publication! Over the last couple of weeks I have worked through your proposed architecture and stumbled over three points on which I would be glad to hear your opinion:
1) Inside the LSTM cell there is a bias term for the forget gate, `_forget_bias`, set to 1.0: is this a trainable variable included in the optimization?
2) Regarding the LSTM cell as well: there is no feedback of the cell state C into the forget, input, and output gates via a Hadamard product, as suggested by Shi et al. (see the gate equations sketched below the questions). Is there a specific reason for that?
3) In Table 1 of your publication the total number of trainable parameters is given as 3.76 million. But when I count them inside your model with

```python
varlist_parameters = [v.shape.num_elements() for v in self.all_vars]
np.sum(varlist_parameters)
```

I get 6,876,449 parameters. What am I missing?
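For reference, the Shi et al. formulation I mean in question 2 is the following (with $*$ denoting convolution and $\circ$ the Hadamard product; the $W_{c\cdot} \circ C$ terms are the state feedback I am asking about):

```latex
i_t = \sigma(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i)
f_t = \sigma(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f)
C_t = f_t \circ C_{t-1} + i_t \circ \tanh(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c)
o_t = \sigma(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o)
H_t = o_t \circ \tanh(C_t)
```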
Thank you in advance
Tobi
Hello and thank you for your fast response.
1) I cannot find where the LSTM's `_forget_bias` is added to the TF graph, which seems strange to me. The usual kernels and biases show up in `tf.trainable_variables()` once `def generator(self, inputs, reuse=False, scope='g_net')` has been built, but the forget bias does not:
```
<tf.Variable 'g_net/convLSTM/LSTM_conv/weights:0' shape=(3, 3, 256, 512) dtype=float32_ref>,
<tf.Variable 'g_net/convLSTM/LSTM_conv/biases:0' shape=(512,) dtype=float32_ref>]
```
3) I took all trainable parameters as a list from TensorFlow with `tf.trainable_variables()` and summed the parameter counts of all entries, weights and biases together. My total is far higher than yours. Could you explain specifically how you determined your count?
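For completeness, my counting boils down to this (the same as the snippet in my first post, just using `tf.trainable_variables()` directly after the graph is built):

```python
import numpy as np
import tensorflow as tf

# sum the static element counts of all trainable variables in the graph
total = np.sum([v.shape.num_elements() for v in tf.trainable_variables()])
print(total)  # 6,876,449 in my run
```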
Kind regards,
Tobi
Please refer to the source code of LSTM:
https://github.com/jiangsutx/SRN-Deblur/blob/master/util/BasicConvLSTMCell.py#L56
It uses one convolution and then splits the output into 4 parts, one of which is the forget gate. And `_forget_bias` is only a number, which is not trainable. I did not dig into the details. You may also refer to:
https://tensorlayer.readthedocs.io/en/1.7.0/_modules/tensorlayer/layers.html#BasicConvLSTMCell
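In other words, the relevant part of that cell looks roughly like this (a paraphrase of the linked code, not a verbatim copy; `conv2d` and `num_features` stand in for the repo's `_conv_linear` helper and its feature count):

```python
import tensorflow as tf  # TF 1.x, as used by the repo

# one convolution over the concatenated input and hidden state,
# producing 4 * num_features output channels
concat = conv2d(tf.concat([inputs, h], axis=3), 4 * num_features)

# split into input gate, new input, forget gate, and output gate
i, j, f, o = tf.split(concat, num_or_size_splits=4, axis=3)

# _forget_bias (default 1.0) is added as a plain Python constant, so it
# is folded into the graph and never appears in tf.trainable_variables()
forget_bias = 1.0
new_c = c * tf.sigmoid(f + forget_bias) + tf.sigmoid(i) * tf.tanh(j)
new_h = tf.tanh(new_c) * tf.sigmoid(o)
```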
I remember that for Table 1 of our paper we used 3x3 for all kernels, for fast experiments. Our final version and the released model use 5x5 kernels. These details are clarified in the corresponding paragraphs of the paper.
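To illustrate how much the kernel size alone changes the count, here is the standard per-layer formula applied to a 256 -> 512 convolution (illustrative numbers only, not a full breakdown of the released model):

```python
def conv_params(k, c_in, c_out):
    # k*k kernel weights per (input, output) channel pair,
    # plus one bias per output channel
    return k * k * c_in * c_out + c_out

print(conv_params(3, 256, 512))  # 1,180,160 with 3x3 kernels
print(conv_params(5, 256, 512))  # 3,277,312 with 5x5 kernels
```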
Sorry for the confusion.
Thank you very much for clarifying. I am currently rewriting the architecture in TF 2.2, and after adjusting the kernel sizes I obtained the same number of parameters. I must have overlooked this sentence.
Hello, can you help me and tell me how I can calculate the number of parameters?