YanjingLi0202/Q-ViT

Unique Alphas - how many per layer

wanderingweights opened this issue · 3 comments

Hi again,

The checkpoints you have released don't seem to include the self.nbit parameters, so reproducing your results doesn't appear to be possible.

Let me know if I'm missing something.
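
For reference, here is roughly how I checked the released checkpoint for those entries (the file name and the key substrings I match on are placeholders, not names from the repo):

```python
# Rough, hypothetical sketch of inspecting a released checkpoint for nbit/alpha
# entries. The file name and the substrings matched below are assumptions.
import torch

ckpt = torch.load("q_vit_checkpoint.pth", map_location="cpu")  # placeholder path
state_dict = ckpt.get("model", ckpt)  # some checkpoints wrap the weights in a dict

nbit_keys = [k for k in state_dict if "nbit" in k]
alpha_keys = [k for k in state_dict if "alpha" in k]
print("keys containing 'nbit':", nbit_keys)          # comes back empty for me
print("number of keys containing 'alpha':", len(alpha_keys))
```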

Ah apologies, another issue is:

In the released code, there seems to be an issue with the number of unique alphas.

I was expecting there to be one per attention head, but it seems that there is one per input channel:

576 in the first blocks.0.attn.qkv, etc.

Does this seem correct to you?

As it stands, there are far too many alphas.
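
For concreteness, this is roughly how I counted them (the file name and the "alpha" substring match are assumptions about the checkpoint's naming):

```python
# Hypothetical sketch of counting alpha values per layer in the checkpoint.
# The file name and the "alpha" substring match are assumptions.
import torch

ckpt = torch.load("q_vit_checkpoint.pth", map_location="cpu")  # placeholder path
state_dict = ckpt.get("model", ckpt)

for name, tensor in state_dict.items():
    if "alpha" in name:
        print(name, tuple(tensor.shape), tensor.numel())
# blocks.0.attn.qkv reports far more alpha values than one per attention head
```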


"blocks.0.attn.qkv" is actually the linear layer that produces the query, key, and value. For all linear and conv2d layers, we apply channel-wise activation quantization. The head-wise quantization is used when quantizing the query, key, and value themselves (i.e., the activations after the qkv linear layer).
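
A minimal sketch of the two granularities, to make the distinction concrete (the class and attribute names are illustrative, not the actual Q-ViT code, and the straight-through estimator used in training is omitted):

```python
# Illustrative sketch of the two quantization granularities described above.
# Class and attribute names are made up; this is not the actual Q-ViT code.
import torch
import torch.nn as nn


class ChannelWiseActQuant(nn.Module):
    """One learnable alpha per channel, applied to the activations that feed
    any linear / conv2d layer such as blocks.0.attn.qkv."""

    def __init__(self, num_channels: int, nbits: int = 4):
        super().__init__()
        self.nbits = nbits  # plain Python int, not part of the state_dict
        self.alpha = nn.Parameter(torch.ones(num_channels))

    def forward(self, x):  # x: (batch, tokens, channels)
        qmax = 2 ** (self.nbits - 1) - 1
        scale = self.alpha / qmax
        return torch.clamp((x / scale).round(), -qmax - 1, qmax) * scale


class HeadWiseActQuant(nn.Module):
    """One learnable alpha per attention head, applied to the query, key, and
    value activations produced after the qkv linear layer."""

    def __init__(self, num_heads: int, nbits: int = 4):
        super().__init__()
        self.nbits = nbits
        self.alpha = nn.Parameter(torch.ones(num_heads, 1, 1))

    def forward(self, x):  # x: (batch, heads, tokens, head_dim)
        qmax = 2 ** (self.nbits - 1) - 1
        scale = self.alpha / qmax
        return torch.clamp((x / scale).round(), -qmax - 1, qmax) * scale
```

In other words, quantizers attached to weight layers carry per-channel scales, and only the quantizers applied to the split query/key/value tensors carry per-head scales, which is why a qkv layer shows many more alphas than the number of heads.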


To reproduce our results, there is no need to save the nbits in the checkpoint. You can use the "--model" option to select the bit-width of the quantized model.
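
A minimal sketch of why that works, assuming hypothetical model names and a toy layer (none of these names come from the repo): the bit-width is recreated from the model definition selected by "--model", so the checkpoint only needs the weights and alphas.

```python
# Hypothetical sketch: the bit-width is fixed by the model definition chosen
# via --model and the quantizers are rebuilt with that value before loading
# weights, so nbits never has to be stored in the checkpoint.
# The model names, mapping, and toy layer below are illustrative only.
import torch
import torch.nn as nn

NBITS_BY_MODEL = {"deit_tiny_2bit": 2, "deit_tiny_3bit": 3, "deit_tiny_4bit": 4}


class ToyQuantLinear(nn.Linear):
    """Linear layer whose bit-width is a constructor argument, not a saved tensor."""

    def __init__(self, in_features: int, out_features: int, nbits: int):
        super().__init__(in_features, out_features)
        self.nbits = nbits  # plain attribute, absent from the state_dict
        self.alpha = nn.Parameter(torch.ones(in_features))


def build_model(model_name: str) -> nn.Module:
    nbits = NBITS_BY_MODEL[model_name]  # plays the role of the --model option
    return ToyQuantLinear(192, 576, nbits)


model = build_model("deit_tiny_4bit")
print(model.nbits)                        # 4, recreated from the model name
print(sorted(model.state_dict().keys()))  # ['alpha', 'bias', 'weight'], no nbits entry
```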