gaozhihan/PreDiff

gradient value is very small in energy constraints situation

Opened this issue · 4 comments

I tried to implement the knowledge network for the energy-constraint setting on my own task. However, I found that the gradient returned by the `get_mean_shift` function is very small, so it has almost no effect on the original mean. For example, the gradient values are:

```
tensor([[[[ 2.7554e-08]],

         [[-8.8887e-09]],

         [[ 8.4224e-09]],

         ...,

         [[ 5.1067e-12]],

         [[ 3.8011e-11]],

         [[-2.2646e-12]]],


        [[[-5.8177e-09]],

         [[ 1.4201e-09]],

         [[-2.9926e-10]],

         ...,

         [[ 7.6124e-12]],
...
         [[-4.7238e-11]],

         [[-5.8617e-11]],

         [[ 4.8769e-11]]]], device='cuda:0')
```

What magnitudes do you observe in your case?
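To make the comparison concrete, here is a small diagnostic sketch (with toy tensors, not the actual `get_mean_shift` output) that measures the gradient norm relative to the mean norm; a ratio near zero indicates the unscaled shift barely perturbs the mean:

```python
import torch

def relative_shift_norm(grad: torch.Tensor, mean: torch.Tensor) -> float:
    # Ratio of gradient norm to mean norm; values near zero mean the
    # unscaled mean shift is negligible compared with the mean itself.
    return (grad.norm() / mean.norm().clamp_min(1e-12)).item()

# Toy stand-ins with magnitudes like those reported above:
grad = torch.full((2, 16, 1, 1), 1e-8)   # hypothetical gradient values
mean = torch.full((2, 16, 1, 1), 1e-1)   # hypothetical predicted mean
print(relative_shift_norm(grad, mean))   # ~1e-7, i.e. a negligible shift
```

Any function or tensor shapes here are illustrative assumptions, not the repository's actual interface.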

Thank you for your interest in our work and for your question. In our experiments, we have observed that the norm of the gradient is typically around $10^{-4}$. This is not extremely small compared with the original mean, whose norm is around $10^{-1}$. To ensure the guidance is effective, we also apply a rather large scale factor.
We plan to release the code for experiments on N-body MNIST in the near future. Please stay tuned for updates and more details.
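As a rough illustration of the scale-factor remark above, here is a hedged sketch (not the PreDiff implementation) of amplifying the alignment gradient before shifting the predicted mean; the scale value, function name, and minus sign (gradient descent on the alignment objective) are all assumptions for the example:

```python
import torch

SCALE = 1e3  # hypothetical scale factor; would be tuned per task

def shift_mean(mean: torch.Tensor, grad: torch.Tensor,
               scale: float = SCALE) -> torch.Tensor:
    # Shift the predicted mean along the scaled gradient of the
    # knowledge-alignment objective (sign convention assumed here).
    return mean - scale * grad

mean = torch.zeros(2, 16, 1, 1)          # toy predicted mean
grad = torch.full((2, 16, 1, 1), 1e-4)   # toy gradient with norm ~1e-4
shifted = shift_mean(mean, grad)
print(shifted.abs().max().item())        # ~0.1 with these toy values
```

With a gradient of magnitude $10^{-4}$ and a scale of $10^{3}$, the resulting shift is on the order of the mean itself ($10^{-1}$), which is why the guidance remains effective.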

So your numbers are also from the energy-constraint task? @gaozhihan

Yes, they are from the experiments on N-body MNIST with alignment to energy conservation.

I see. Thanks for your kind reply. We look forward to the released energy-constraint code so we can revise ours. Thank you very much.