hahnyuan/PTQ4ViT

Constrain the scaling factors of the two ranges

Shape-Kim opened this issue · 6 comments

First of all, thank you for the great work and the official code.

I have one question.

Where is the code that constrains the scaling factors for post-softmax and post-GELU activations, i.e., ∆R2 = 2^m ∆R1, where m is an unsigned integer, so that the two ranges can be processed efficiently?

Thanks again for providing the code.

PTQ4ViT uses two classes in quant_layers/: SoSMatMul and PostGeluLinear. SoSMatMul uses a variable split to quantize post-softmax values. split marks the boundary between the two ranges [0, split] and [split, 1], and is initialized as 2**m (m < 0). PostGeluLinear uses an additional scaling factor a_neg_interval for the negative range, and the positive range's candidate intervals are initialized to satisfy the constraint.
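To make the role of split concrete, here is a minimal pure-Python sketch of twin uniform quantization for a single post-softmax value. It is not the repository's SoSMatMul: the function name, the 6-bit default, and the assumption that one bit is spent on the range flag are illustrative only.

```python
def twin_uniform_quantize(p, split, bits=6):
    """Quantize one post-softmax value p in [0, 1] using two uniform ranges.

    Hypothetical helper for illustration; assumes one bit selects the range,
    leaving 2 ** (bits - 1) levels per range.
    """
    levels = 2 ** (bits - 1)
    delta_r1 = split / levels   # fine step, used for values below split
    delta_r2 = 1.0 / levels     # coarse step, used for values in [split, 1]
    delta = delta_r1 if p < split else delta_r2
    q = min(round(p / delta), levels - 1)  # clip to the largest level
    return q * delta


def scale_ratio(split):
    """Ratio delta_r2 / delta_r1, which equals 1 / split."""
    return 1.0 / split


split = 2 ** -4
small = twin_uniform_quantize(0.01, split)  # falls in the fine range
large = twin_uniform_quantize(0.5, split)   # falls in the coarse range
ratio = scale_ratio(split)                  # a power of two when split = 2**m
```

With split = 2**-4 the ratio of the two steps is 16 = 2**4, so under these assumptions choosing split as a power of two is what keeps ∆R2 = 2^m ∆R1.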

Thank you for your quick response!

For post-softmax, I can see that the split points satisfy the constraint throughout the code.

However, I am still curious about the post-GELU case. As I understand it, a_interval should equal (2^m) * a_neg_interval.

In PostGeluPTQSLQuantLinear, I can find the code that initializes the candidates for the input interval. However, these candidates do not satisfy the constraint.

To check, I ran the code to quantize ViT-S in the 6/6-bit setting. For the first GELU layer, self.a_interval and self.a_neg_interval are 0.0053116 and 0.0289, respectively. Therefore, self.a_interval / self.a_neg_interval is 34.6432, which does not satisfy the constraint.
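The check described above can be sketched as a small helper that tests whether a ratio of two scaling factors is 2^m for an unsigned integer m. The function name and the tolerance are assumptions for illustration, not part of PTQ4ViT.

```python
import math


def power_of_two_exponent(ratio, tol=1e-6):
    """Return m if ratio is (within tol) 2**m for an integer m >= 0, else None.

    Hypothetical helper; tol is compared against the distance of log2(ratio)
    from the nearest integer.
    """
    if ratio <= 0:
        return None
    m = math.log2(ratio)
    nearest = round(m)
    if nearest >= 0 and abs(m - nearest) <= tol:
        return nearest
    return None


reported = power_of_two_exponent(34.6432)  # the ratio reported in this comment
exact = power_of_two_exponent(32.0)        # a ratio that does satisfy 2**m
```

For the reported ratio the helper returns None, since log2(34.6432) is not close to an integer.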

I suspect I am misunderstanding something about the code. Could you clarify?

Thank you for your quick answer again.

A_interval_candidates should be initialized with 2 ** m * initial a_neg_interval. It looks like a mistake was introduced when merging the code for the released version; the original experiment code is rather messy.
Sorry if this has caused you trouble. Feel free to correct it on your side if it gets in your way.
I may not be able to validate the results soon, since the server with the original experiment setup is constantly occupied by other people these days, and I'm working toward another conference deadline.
Sorry once again, and thank you for pointing this out. I'll push a hotfix later.
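Until a hotfix lands, the intended initialization could look roughly like the sketch below: build the positive-range candidates as 2 ** m * a_neg_interval, then pick the candidate closest to an unconstrained interval. This is an unverified illustration, not the maintainers' fix. The function names, the exponent range, and the nearest-candidate selection rule are all assumptions, and the numbers are taken from this thread.

```python
def power_of_two_candidates(a_neg_interval, max_exponent=8):
    """Candidate positive-range intervals 2**m * a_neg_interval, m = 0..max_exponent.

    Hypothetical helper; max_exponent is an arbitrary illustrative bound.
    """
    return [a_neg_interval * (2 ** m) for m in range(max_exponent + 1)]


def snap_to_candidate(target_interval, candidates):
    """Pick the candidate closest to an unconstrained interval (assumed rule)."""
    return min(candidates, key=lambda c: abs(c - target_interval))


a_qmax = 2 ** (6 - 1)                             # assumed 6-bit setting
a_neg_interval = 0.16997124254703522 / a_qmax     # constant quoted in this thread
candidates = power_of_two_candidates(a_neg_interval)
a_interval = snap_to_candidate(0.0289, candidates)  # 0.0289 is from the report above
```

Every candidate divided by a_neg_interval is an exact power of two, so any interval chosen from this list satisfies the constraint by construction.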

Thank you for the reply again!

I hope this problem can be fixed, and I'll cross my fingers for your current work!

The implementation here seems to be hard-coded. Do you have a solution?
Have you verified the results after the modification? Could you provide reference results? Thank you!

self.a_interval.append(0.16997124254703522/self.a_qmax)

Hi @SuperVan-Young, why is the scale of the negative range set to this value? Is it the result of parameter tuning, or is it arbitrary?
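One observation that may be relevant, offered as a guess rather than a statement from the maintainers: the constant 0.16997124254703522 is numerically very close to the magnitude of the minimum of the exact GELU function, x * Φ(x). The sketch below locates that minimum with a simple ternary search; the helper names and the search interval are illustrative.

```python
import math


def gelu(x):
    """Exact GELU, x * Phi(x), using the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_minimum(lo=-2.0, hi=0.0, iterations=200):
    """Ternary search for the minimum of GELU on [lo, hi].

    Assumes GELU has a single minimum on this interval.
    """
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if gelu(m1) < gelu(m2):
            hi = m2
        else:
            lo = m1
    x_min = 0.5 * (lo + hi)
    return x_min, gelu(x_min)


x_min, min_value = gelu_minimum()
gap = abs(abs(min_value) - 0.16997124254703522)  # distance to the hard-coded constant
```

If that reading is right, dividing the constant by self.a_qmax would make the negative range cover exactly the interval that GELU outputs can reach below zero, though only the authors can confirm the intent.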