Training & Testing in Higher Precision
Hi, I wonder whether it is possible to train and test at a higher precision with this codebase?
I'm interested in training with W-8, A-8, G-8, E-8 and also testing with W-8, A-8, G-8, E-8. But I realized that simply changing this line in source/Option.py to
`bitsW = 8  # bit width of weights`
doesn't work; it leads to exploding gradients. Do you have any suggestions?
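Concretely, the options I am trying to change look roughly like this (only `bitsW` appears verbatim above; the other field names are my assumption of how the corresponding options are spelled in source/Option.py):

```python
# source/Option.py (excerpt; names other than bitsW assumed for illustration)
bitsW = 8  # bit width of weights
bitsA = 8  # bit width of activations
bitsG = 8  # bit width of gradients
bitsE = 8  # bit width of errors
```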
@stevenygd
Hi, I have tried this configuration and ran into exploding gradients as well, but I have since moved to another field and now have little time to dig into it. It is a very promising idea.
The problem is that you should always set kG > kW; otherwise small changes in the accumulated weights immediately affect the forward propagation. In our experiments the gradients were mostly amplified. In the paper, the gap between 2-bit W and 8-bit G provides a buffer that smooths the noise and lets only robust weight updates through. Maybe that is why 2W-8G works fine while 8W-8G does not.
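To make the buffer argument concrete, here is a small sketch assuming the quantization step for k bits is sigma(k) = 2^(1-k), which is my reading of the step size used in the WAGE paper:

```python
def sigma(k):
    # Quantization step for a k-bit fixed-point value in [-1, 1].
    return 2.0 ** (1 - k)

# With 2-bit weights and 8-bit gradient accumulation, roughly
# sigma(2) / sigma(8) = 64 minimal gradient steps must accumulate before a
# weight crosses to the next representable value, which filters out noise.
print(sigma(2) / sigma(8))   # 64.0

# With 8-bit weights and 8-bit gradients the buffer disappears: a single
# minimal gradient step can already move the weight to the next level.
print(sigma(8) / sigma(8))   # 1.0
```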
So I suggest you begin with W-8, A-8, G-float32, E-8 and try some dynamic scaling of the gradients, for example something like the sketch below.
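One possible interpretation of that dynamic scaling, written as a hypothetical helper rather than code from this repo, is to renormalize all gradients by a single global factor before they reach the quantizer:

```python
import tensorflow as tf

def dynamically_scale(grads, target_max=1.0, eps=1e-12):
    # Rescale all gradients by one global factor so their largest magnitude
    # equals target_max; this keeps the quantizer's input range stable even
    # when raw gradient magnitudes drift between steps.
    global_max = tf.reduce_max(tf.stack([tf.reduce_max(tf.abs(g)) for g in grads]))
    scale = target_max / (global_max + eps)
    return [g * scale for g in grads]
```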
Good luck : )
Try gradient clipping, e.g. change this line to
`xmax = tf.maximum(gmax, tf.reduce_max(tf.abs(x)))`
where `gmax` is some reasonable upper limit on the gradient magnitude (estimate it from the W-2 network and experiment from there).
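In case it helps to see where that line sits, here is a rough, self-contained sketch of a G-style gradient quantizer with the floored scale folded in. This is my own simplification, not the actual quantization code in this repo, and it uses TF 1.x API names (tf.log, tf.random_uniform):

```python
import tensorflow as tf

def shift(x):
    # Round the scale to the nearest power of two.
    return 2.0 ** tf.round(tf.log(x) / tf.log(2.0))

def quantize_gradient(x, bitsG=8, gmax=1.0):
    # Floor the normalization scale at gmax so that small gradients are not
    # amplified all the way up to full scale by the per-tensor normalization.
    xmax = tf.maximum(gmax, tf.reduce_max(tf.abs(x)))
    x = x / shift(xmax)

    # Stochastic rounding to bitsG levels (simplified; the real code also
    # folds in the learning rate).
    sigma = 2.0 ** (1 - bitsG)
    noise = tf.random_uniform(tf.shape(x), -0.5, 0.5)
    x = sigma * tf.floor(x / sigma + 0.5 + noise)
    return tf.clip_by_value(x, -1 + sigma, 1 - sigma)
```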