graphcore-research/jax-scalify

MNIST training broken with `min/max` scale propagation

Fixing min/max ops scale propagation (PR #68) had the side effect of breaking MNIST training. Early investigation shows the scale factors diverging after a couple of iterations, similar to an unstable dynamical system.
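
For intuition, here is a minimal sketch (not the actual jax-scalify implementation; all names are illustrative) of how a `max` op can propagate scales in a `(data, scale)` representation. Because the output scale is derived from the input scales, each training iteration feeds scales back into the next one, which is where the unstable-dynamical-system behaviour can come from.

```python
from typing import NamedTuple

import jax.numpy as jnp


class ScaledArray(NamedTuple):
    data: jnp.ndarray   # normalized payload
    scale: jnp.ndarray  # scalar scale factor; represented value is data * scale


def scaled_max(a: ScaledArray, b: ScaledArray) -> ScaledArray:
    # Propagate the larger input scale to the output, and rescale both
    # operands into that common scale so the represented value is unchanged.
    out_scale = jnp.maximum(a.scale, b.scale)
    out_data = jnp.maximum(a.data * (a.scale / out_scale),
                           b.data * (b.scale / out_scale))
    return ScaledArray(out_data, out_scale)
```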

TODO: additional investigation is needed to understand the dynamics of the issue.

#74 implements dynamic rescaling methods. An investigation into how to use them properly is still necessary.
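
As a hedged sketch of what a dynamic rescaling method could look like (reusing the `ScaledArray` sketch above; the function name and the max-abs statistic are illustrative choices, not necessarily the ops added in #74): the scale is re-derived from the current data statistics and the data renormalized, leaving the represented value `data * scale` unchanged.

```python
import jax.numpy as jnp


def dynamic_rescale_max(a: ScaledArray) -> ScaledArray:
    # Re-derive the scale from the data statistics (here: max absolute
    # value, clamped to avoid division by zero), then renormalize the data
    # so that data * scale is numerically unchanged.
    new_scale = jnp.maximum(jnp.max(jnp.abs(a.data)), 1e-12) * a.scale
    return ScaledArray(a.data * (a.scale / new_scale), new_scale)
```

After such a call the data is O(1) again, so its scale cannot keep compounding across iterations.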

Bug fixed in #75 and #76, with dynamic rescaling of the logits gradient.
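
For context, a hedged sketch of where such a rescale could sit in an MNIST-style loss. `predict` is a hypothetical model forward pass, and `jsa.ops.dynamic_rescale_grad` is an assumed op name (identity on the forward value, dynamic rescaling of the cotangent in the backward pass); see #74/#75/#76 for the actual ops and placement.

```python
import jax
import jax.numpy as jnp
import jax_scalify as jsa


@jsa.scalify  # trace the loss with scale propagation
def loss_fn(params, images, labels):
    logits = predict(params, images)  # hypothetical model forward pass
    # Assumed op: leaves the logits value untouched in the forward pass and
    # dynamically rescales the logits gradient in the backward pass, so its
    # scale is re-derived every step instead of drifting across iterations.
    logits = jsa.ops.dynamic_rescale_grad(logits)
    logp = jax.nn.log_softmax(logits)
    return -jnp.mean(jnp.sum(logp * labels, axis=-1))
```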