kssteven418/I-BERT

Why is integer-only finetuning much slower than fp32 finetuning?

renmada opened this issue · 0 comments

Compared with fp32 finetuning, running inference on the dev data during training takes about 10x longer with integer-only finetuning.
How can I do INT8 inference and achieve the speedup described in the paper?