boluoweifenda/WAGE

some questions about quantization and shift


Thank you for your amazing work. I have read it carefully but still have some questions.
(1) In Section 3.3.2, I understand that "batch normalization degenerates into a scaling layer", because "we hypothesize that batch outputs of each hidden layer approximately have zero-mean".
But one thing is still unclear: BN computes (X - \mu) / \sigma, so why can the scaling factor \alpha stand in for the variance term?
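To make question (1) concrete, here is a small numpy sketch of how I read that hypothesis: with a zero-mean batch, (X - \mu) / \sigma collapses to X / \sigma, a pure per-channel scaling, so a constant \alpha would just be standing in for 1/\sigma. This is only my own illustration, not code taken from this repo.

```python
import numpy as np

# Toy check of my reading of Section 3.3.2 (my own sketch, not repo code):
# if the batch output x of a hidden layer is roughly zero-mean, then
# (x - mu) / sigma collapses to x / sigma, i.e. a pure per-channel scaling,
# so a constant alpha would be playing the role of 1 / sigma.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=0.5, size=(1024, 8))  # zero-mean batch, 8 channels

mu = x.mean(axis=0)
sigma = x.std(axis=0)

bn_out = (x - mu) / sigma   # batch normalization (without learned gamma/beta)
scale_out = x / sigma       # pure scaling layer

print(np.max(np.abs(bn_out - scale_out)))  # small, since mu is close to 0
```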

(2) In Section 3.3.3, how do you obtain [-sqrt(2), sqrt(2)]?
I understand the purpose of shift( · ) and the middle panel of Fig. 2.
For example, in Fig. 2: max(|e|) ≈ 1e-4, so 1e-8/shift(1e-4) = 8.192e-5 and 1e-4/shift(1e-4) = 0.8192, which is why the peak is shifted. But how do you arrive at [-sqrt(2), sqrt(2)]?
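For reference, this is how I computed those numbers, assuming shift(x) = 2^round(log2(x)), which is how I read Section 3.3.3; please correct me if the code defines it differently.

```python
import numpy as np

# Reproducing the numbers in my example, assuming the shift function is
# shift(x) = 2 ** round(log2(x)) (my reading of Section 3.3.3).
def shift(x):
    return 2.0 ** np.round(np.log2(x))

e_max = 1e-4
print(shift(e_max))          # 2**-13 ~ 1.2207e-4
print(1e-4 / shift(e_max))   # 0.8192
print(1e-8 / shift(e_max))   # 8.192e-05
```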

(3) In Section 4.3 and Figure 4, how do you calculate the lower bounds for 8 bits and 16 bits?
you said "Upper boundaries are the max{|e|}" so for both case it is same. but for lower bound ,
2^(1-8) = 0.0078125 and 2^(1-16) = 3.0517578125e-5.
both lower_bound = Q(e_min, k_w) and lower_bound = Q(e_min/shift(max(|e|)), k_w) are not make sense.

For example, with e_max ≈ 1e-3 and e_min ≈ 1e-8 in Fig. 4: e_min < \sigma(16), so lower_bound = Q(e_min, k_w) cannot be right; and e_min/shift(max(|e|)) = 1.024e-5, which is also below \sigma(16),
so lower_bound = Q(e_min/shift(max(|e|)), k_w) does not make sense either.
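Here is the small script I used to get these numbers, with my own transcription of \sigma(k) = 2^(1-k) and of the quantizer Q(x, k); the helper names sigma, shift and Q are just mine and may not match the repo exactly.

```python
import numpy as np

# The numbers behind question (3), assuming sigma(k) = 2**(1 - k) and
# Q(x, k) = clip(sigma(k) * round(x / sigma(k)), -1 + sigma(k), 1 - sigma(k)),
# which is my transcription of the paper's definitions and may be off.
def sigma(k):
    return 2.0 ** (1 - k)

def shift(x):
    return 2.0 ** np.round(np.log2(x))

def Q(x, k):
    s = sigma(k)
    return np.clip(s * np.round(x / s), -1 + s, 1 - s)

print(sigma(8), sigma(16))            # 0.0078125  3.0517578125e-05

e_max, e_min = 1e-3, 1e-8
print(e_min / shift(e_max))           # 1.024e-05, still below sigma(16)
print(Q(e_min, 16))                   # rounds to 0
print(Q(e_min / shift(e_max), 16))    # also rounds to 0
```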

(4) In Eq. 11, what if g_s is large and \Delta W exceeds the range [-1+\sigma, 1-\sigma]? Do you mean this case is handled together by Eq. 12?
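To be explicit about what I am asking, the sketch below writes out my guess, namely that the clip in Eq. 12 is what absorbs an oversized \Delta W. The arrays are made-up values, and this is only my reading, not your actual update rule.

```python
import numpy as np

# My guess for question (4): if the weight update is applied as
# W <- clip(W - Delta_W, -1 + sigma, 1 - sigma), then a large g_s / Delta_W
# is simply absorbed by the clip. This is my reading of Eq. 12, not repo code.
def sigma(k):
    return 2.0 ** (1 - k)

k_w = 8
s = sigma(k_w)

W = np.array([0.9, -0.95, 0.1])
delta_W = np.array([-0.5, 0.3, 2.0])          # deliberately oversized updates

W_new = np.clip(W - delta_W, -1 + s, 1 - s)
print(W_new)  # stays inside [-1 + sigma, 1 - sigma] no matter how large Delta_W is
```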

I hope I can get your help. Thank you again!

Just to follow up on the questions above. Do you have any answers yet, @Dawn-LX?

In addition, I have some new questions:

  • The quantization of activations is not consistent between the code implementation and the theoretical derivation in the paper: the shift operation is performed in the weight quantization rather than in the activation quantization.
  • In my understanding, all the computations should be performed over data types like int8. However, the code implies that all the variables are encoded as floats, since the quantization actually outputs fractional numbers:
    tf.round(x * SCALE) / SCALE
    In my opinion, tf.round(x * SCALE) is exactly the fixed-point representation of the input with s-bit precision, where s corresponds to SCALE (see the sketch after this list).
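Here is a small numpy rendering of what I mean, with SCALE = 2^(s-1) as my assumption for an s-bit fractional grid (the repo's exact SCALE may differ):

```python
import numpy as np

# A numpy rendering of the repo's line  tf.round(x * SCALE) / SCALE.
# I am assuming SCALE = 2 ** (s - 1) for an s-bit fixed-point grid;
# the exact choice in the repo may differ.
s = 8
SCALE = 2.0 ** (s - 1)

x = np.array([0.30, -0.7183, 0.0123])

codes = np.round(x * SCALE)   # integer codes, e.g. [ 38. -92.   2.]
q = codes / SCALE             # what the graph actually carries: floats on the grid n / SCALE

print(codes)  # these integers are the int8-style fixed-point representation
print(q)      # e.g. 0.296875, -0.71875, 0.015625
```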

Hope to receive your answer, @boluoweifenda. Thanks a lot anyway.