MNIST training broken with `min/max` scale propagation
Closed this issue · 2 comments
balancap commented
Fixing `min`/`max` ops scale propagation (PR #68) had the side effect of breaking MNIST training. Early investigation shows a divergence of the scale factors after a couple of iterations, similar to an unstable dynamical system.
Todo: additional investigation to understand the dynamics of the issue.
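The "unstable dynamical system" behaviour can be pictured with a toy model (hypothetical: `propagate_max`, `simulate`, and the scale values below are illustrative, not the library's actual propagation rule): if a `max`-style rule always keeps the larger of two input scales, a weight scale that feeds multiplicatively into the gradient scale and back can form a positive feedback loop.

```python
# Toy model (hypothetical) of scale-factor feedback under max-propagation.
# Assumption: min/max ops propagate the larger input scale, so a weight
# scale coupled to the gradient scale can only ratchet upward.

def propagate_max(scale_a: float, scale_b: float) -> float:
    """Toy rule: the output scale is the larger of the input scales."""
    return max(scale_a, scale_b)

def simulate(steps: int, w_scale: float = 1.0, act_scale: float = 4.0) -> list[float]:
    """Iterate a toy forward/backward pass, tracking the weight scale."""
    history = []
    for _ in range(steps):
        # Forward/backward: activation and weight scales mix multiplicatively.
        grad_scale = w_scale * act_scale
        # Update: max-propagation keeps the larger scale, so once the
        # gradient scale exceeds the weight scale, growth is geometric.
        w_scale = propagate_max(w_scale, grad_scale)
        history.append(w_scale)
    return history

print(simulate(5))  # → [4.0, 16.0, 64.0, 256.0, 1024.0]
```

In this sketch the weight scale grows by a factor of `act_scale` per step and never recovers, matching the observed divergence after a few iterations.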