Three questions about your Scale-Recurrent-Network architecture
tomeiss opened this issue · 6 comments
Hello @jiangsutx and @rimchang,
first of all, thanks for providing the code for your publication! Over the last couple of weeks I have worked through your proposed architecture and stumbled over three points on which I would be glad to hear your opinion:
1) Inside the LSTM cell there is a bias term for the forget gate, `_forget_bias`, set to 1.0: is this a trainable variable included in the optimization?
2) Regarding the LSTM cell as well: there is no feedback of the cell state C into the forget, input, and output gates via a Hadamard product, as suggested by Shi et al. (see the gate equations sketched below the questions). Is there a specific reason for that?
3) In Table 1 of your publication the total number of trainable parameters is given as 3.76 million. But when I count them inside your model with

```python
varlist_parameters = [v.shape.num_elements() for v in self.all_vars]
np.sum(varlist_parameters)
```

I get 6,876,449 parameters. What am I missing?
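For reference, the Shi et al. formulation I mean in question 2 is the following (with $*$ denoting convolution and $\circ$ the Hadamard product; the $W_{c\cdot} \circ C$ terms are the state feedback I am asking about):

```latex
i_t = \sigma(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i)
f_t = \sigma(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f)
C_t = f_t \circ C_{t-1} + i_t \circ \tanh(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c)
o_t = \sigma(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o)
H_t = o_t \circ \tanh(C_t)
```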
Thank you in advance
Tobi
Hello and thank you for your fast response.
1) I cannot find where the LSTM's `_forget_bias` is added to the TF graph, which seems strange to me. The usual kernels and biases show up in `tf.trainable_variables()` once `def generator(self, inputs, reuse=False, scope='g_net')` has been built, but the forget bias does not:
```
<tf.Variable 'g_net/convLSTM/LSTM_conv/weights:0' shape=(3, 3, 256, 512) dtype=float32_ref>,
<tf.Variable 'g_net/convLSTM/LSTM_conv/biases:0' shape=(512,) dtype=float32_ref>]
```
3) I took all trainable parameters as a list from TensorFlow with `tf.trainable_variables()` and summed the parameter counts of all entries, weights and biases together. My total is far higher than yours. Could you explain specifically how you determined your count?
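For completeness, my counting boils down to this (the same as the snippet in my first post, just using `tf.trainable_variables()` directly after the graph is built):

```python
import numpy as np
import tensorflow as tf

# sum the static element counts of all trainable variables in the graph
total = np.sum([v.shape.num_elements() for v in tf.trainable_variables()])
print(total)  # 6,876,449 in my run
```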
Kind regards,
Tobi
Please refer to the source code of LSTM:
https://github.com/jiangsutx/SRN-Deblur/blob/master/util/BasicConvLSTMCell.py#L56
It uses one convolution and then splits the output into 4 parts, one of which is the forget gate. And `_forget_bias` is only a number, which is not trainable. I did not dig into the details. You may also refer to:
https://tensorlayer.readthedocs.io/en/1.7.0/_modules/tensorlayer/layers.html#BasicConvLSTMCell
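In other words, the relevant part of that cell looks roughly like this (a paraphrase of the linked code, not a verbatim copy; `conv2d` and `num_features` stand in for the repo's `_conv_linear` helper and its feature count):

```python
import tensorflow as tf  # TF 1.x, as used by the repo

# one convolution over the concatenated input and hidden state,
# producing 4 * num_features output channels
concat = conv2d(tf.concat([inputs, h], axis=3), 4 * num_features)

# split into input gate, new input, forget gate, and output gate
i, j, f, o = tf.split(concat, num_or_size_splits=4, axis=3)

# _forget_bias (default 1.0) is added as a plain Python constant, so it
# is folded into the graph and never appears in tf.trainable_variables()
forget_bias = 1.0
new_c = c * tf.sigmoid(f + forget_bias) + tf.sigmoid(i) * tf.tanh(j)
new_h = tf.tanh(new_c) * tf.sigmoid(o)
```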
I remember that for Table 1 of our paper we used 3x3 for all kernels, for fast experiments. Our final version and the released model use 5x5 kernels. These details are clarified in the corresponding paragraphs of the paper.
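To illustrate how much the kernel size alone changes the count, here is the standard per-layer formula applied to a 256 -> 512 convolution (illustrative numbers only, not a full breakdown of the released model):

```python
def conv_params(k, c_in, c_out):
    # k*k kernel weights per (input, output) channel pair,
    # plus one bias per output channel
    return k * k * c_in * c_out + c_out

print(conv_params(3, 256, 512))  # 1,180,160 with 3x3 kernels
print(conv_params(5, 256, 512))  # 3,277,312 with 5x5 kernels
```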
Sorry for the confusion.
Thank you very much for clarifying. I am currently rewriting the architecture in TF 2.2, and after adjusting the kernel sizes I obtained the same number of parameters. I must have overlooked this sentence.
Hello, can you help me and tell me how I can calculate the number of parameters?