kssteven418/I-BERT

Another setting for quantization


Thanks for the great work.

I-BERT uses 32-bit integers for the activations and for softmax.

However, the self-attention result cannot exceed 26 bits: 8-bit activations times 8-bit weights, accumulated over 768 channels, which adds roughly 10 more bits (8 + 8 + 10 = 26).
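(For reference, this is the back-of-the-envelope check behind the 26-bit figure; the variable names below are just my own illustration.)

```python
import math

# Worst-case accumulator width for a self-attention matmul:
# 8-bit activations times 8-bit weights, summed over 768 channels.
act_bits = 8
weight_bits = 8
channels = 768

# Each product needs act_bits + weight_bits bits; accumulating
# `channels` such products adds at most ceil(log2(channels)) extra bits.
accum_bits = act_bits + weight_bits + math.ceil(math.log2(channels))
print(accum_bits)  # 26
```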

I would like to try 16-bit precision instead (16-bit quantization for the activations together with the integer softmax and GELU algorithms).
Would 16 bits cause any problems?
If not, could you suggest a way to implement this?
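To make the question concrete, I imagine the change would look roughly like the generic symmetric quantizer below, with the bit width lowered from 32 to 16. This is only a sketch with names of my own (not I-BERT's actual QuantAct / IntSoftmax code), just to show where the 16-bit setting would enter.

```python
import torch

def symmetric_quantize(x: torch.Tensor, num_bits: int = 16):
    """Symmetric uniform quantization of `x` to `num_bits`-bit integers.

    Returns the integer tensor and the scale needed to dequantize
    (x is approximately x_int * scale).
    """
    qmax = 2 ** (num_bits - 1) - 1                  # 32767 for 16-bit
    scale = x.abs().max().clamp(min=1e-8) / qmax    # per-tensor scale
    x_int = torch.round(x / scale).clamp(-qmax - 1, qmax)
    return x_int, scale

# Example: quantize a (batch, seq, hidden) activation tensor to 16 bits.
acts = torch.randn(2, 128, 768)
acts_int, scale = symmetric_quantize(acts, num_bits=16)
```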