qwopqwop200/GPTQ-for-LLaMa

question about the zero_point

irasin opened this issue · 0 comments

If we set bits = 4 and sym = True

if self.sym:
    self.zero = torch.full_like(self.scale, (self.maxq + 1) / 2) # maxq = 2 ** 4 - 1 = 15

then self.zero will be a tensor whose values are all 8.
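
For concreteness, a quick check of that formula (the tensor shape here is arbitrary, just for illustration):

import torch

maxq = 2 ** 4 - 1                      # 15 for bits = 4
scale = torch.ones(4)                  # placeholder scales; shape is arbitrary
zero = torch.full_like(scale, (maxq + 1) / 2)
print(zero)                            # tensor([8., 8., 8., 8.])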

I wonder why we need zeros -= 1 here, since 8 already fits in 4 bits as 1000 in binary. What is the reason for subtracting 1 and storing 7 as 0111 instead?

zeros -= 1                               # store zero - 1 in each 4-bit field
zeros = zeros.numpy().astype(np.uint32)
# Each uint32 column holds 32 // bits zero points (8 of them for 4-bit).
qzeros = np.zeros((zeros.shape[0], zeros.shape[1] // 32 * self.bits), dtype=np.uint32)
i = 0
col = 0
while col < qzeros.shape[1]:
    if self.bits in [2, 4, 8]:
        # OR each zero point into its bit slot within the current uint32.
        for j in range(i, i + (32 // self.bits)):
            qzeros[:, col] |= zeros[:, j] << (self.bits * (j - i))
        i += 32 // self.bits
        col += 1
    else:
        raise NotImplementedError("Only 2,4,8 bits are supported.")

qzeros = qzeros.astype(np.int32)
self.qzeros = torch.from_numpy(qzeros)
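
To make sure I read the packing correctly, here is a minimal round-trip sketch for the 4-bit case. The unpacking side is my own reconstruction (I assume the dequantization path adds the 1 back), not code from this repo:

import numpy as np

bits = 4
# sym=True, 4-bit: every zero point is 8 before the subtraction.
zeros = np.full((1, 8), 8, dtype=np.uint32)

zeros -= 1                                   # 8 becomes 7, i.e. 0111
packed = np.zeros((1, 1), dtype=np.uint32)
for j in range(32 // bits):                  # pack 8 values into one uint32
    packed[:, 0] |= zeros[:, j] << (bits * j)

# Reverse the shifts and mask out one 4-bit field per value; the +1
# undoes the earlier subtraction and recovers the original zero point.
unpacked = (packed[:, 0:1] >> (bits * np.arange(32 // bits))) & ((1 << bits) - 1)
print(unpacked + 1)                          # [[8 8 8 8 8 8 8 8]]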

Any comments are welcome, thanks.