jiangsutx/SRN-Deblur

Three questions about your Scale-Recurrent-Network architecture

tomeiss opened this issue · 6 comments

Hello @jiangsutx and @rimchang,
first of all, thanks for providing the code for your publication! Over the last couple of weeks I have worked through your proposed architecture and stumbled over three issues where I would be glad to hear your opinion:

1) Inside the LSTM cell there is a bias term for the forget gate, _forget_bias, set to 1.0. Is this a trainable variable included in the optimization?

2) Regarding the LSTM cell as well: there is no feedback of the cell state C into the forget, input, and output gates via a Hadamard product, as suggested by Shi et al. (their formulation is reproduced after this list for reference). Is there a specific reason for that?

3) In Table 1 of your publication the total number of trainable parameters is given as 3.76 million. But when I count them inside your model with varlist_parameters = [v.shape.num_elements() for v in self.all_vars]; np.sum(varlist_parameters) I get 6,876,449 parameters. What am I missing here?
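For reference, this is the peephole ConvLSTM formulation from Shi et al. (2015) that question 2 refers to, with * denoting convolution and ∘ the Hadamard product; note the C terms inside the gates:

```latex
i_t = \sigma(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i)
f_t = \sigma(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f)
C_t = f_t \circ C_{t-1} + i_t \circ \tanh(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c)
o_t = \sigma(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o)
H_t = o_t \circ \tanh(C_t)
```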

Thank you in advance
Tobi

Hello and thank you for your fast response.

1) I cannot see where the LSTM's _forget_bias is added to the TF graph, which is strange to me. The usual kernels and biases show up in tf.trainable_variables() after building def generator(self, inputs, reuse=False, scope='g_net'), but not the forget bias:
```
<tf.Variable 'g_net/convLSTM/LSTM_conv/weights:0' shape=(3, 3, 256, 512) dtype=float32_ref>,
<tf.Variable 'g_net/convLSTM/LSTM_conv/biases:0' shape=(512,) dtype=float32_ref>]
```

3) I took the list of all trainable parameters from TensorFlow via tf.trainable_variables() and summed the element counts of its entries, i.e. the weights and biases. My parameter count is far higher than yours. Could you explain specifically how you determined yours?
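A minimal sketch of this counting approach (TF 1.x; building the graph via the repository's model code is elided and only hinted at in the comment):

```python
import numpy as np
import tensorflow as tf  # TF 1.x

# ... build the graph first, e.g. by calling the model's generator(...) ...

trainable = tf.trainable_variables()  # every kernel and bias registered in the graph
param_counts = [v.shape.num_elements() for v in trainable]
print(int(np.sum(param_counts)))      # 6,876,449 for the released 5x5 model, per above
```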

Kind regards,
Tobi

Please refer to the source code of LSTM:
https://github.com/jiangsutx/SRN-Deblur/blob/master/util/BasicConvLSTMCell.py#L56

It uses one convolution and then splits the output into 4 parts, one of which is the forget gate. _forget_bias is only a number, not a trainable variable. I did not dig into the details. You may also refer to:
https://tensorlayer.readthedocs.io/en/1.7.0/_modules/tensorlayer/layers.html#BasicConvLSTMCell
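Paraphrasing the core of the linked cell (a sketch, not the verbatim source; the helper _conv_linear and the attribute names approximate the linked file):

```python
# One convolution over [inputs, h] produces all four gate pre-activations at once.
concat = _conv_linear([inputs, h], self.filter_size, 4 * self.num_features, bias=True)
i, j, f, o = tf.split(concat, 4, axis=3)  # input gate, new input, forget gate, output gate

# _forget_bias is a plain Python float (1.0 by default). It enters the graph as a
# constant added to the forget-gate pre-activation, so it never appears in
# tf.trainable_variables(); only the conv weights and biases above are trained.
new_c = c * tf.sigmoid(f + self._forget_bias) + tf.sigmoid(i) * tf.tanh(j)
new_h = tf.tanh(new_c) * tf.sigmoid(o)
```

This also bears on question 2: the gates are computed from [inputs, h] only, so there are no Hadamard (peephole) terms involving the cell state as in Shi et al.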

I remember that in Table 1 of our paper we used 3x3 kernels everywhere for fast experiments, while the final version and the released model use 5x5 kernels. These details are clarified in the corresponding paragraphs of the paper.

Sorry for the confusion.

Thank you very much for clarifying. I am currently rewriting the architecture in TF 2.2, and after adjusting the kernel sizes I obtained the same number of parameters. I must have overlooked this sentence.
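In case it helps others, a hedged TF 2.x equivalent of the count, where srn_model is a hypothetical tf.keras.Model holding the rebuilt (and already built) generator:

```python
import numpy as np
import tensorflow as tf  # TF 2.x

# srn_model: hypothetical tf.keras.Model containing the rebuilt SRN generator.
total = int(np.sum([np.prod(v.shape) for v in srn_model.trainable_variables]))
print(total)  # compare with srn_model.count_params(), which also counts non-trainable weights
```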


Hello, can you help me and tell me how I can calculate the number of parameters?
