Issues
About CUDA out of memory
#32 opened by Ai-ZL - 1
Task name references in strings are wrong
#29 opened by AmosHason - 0
Where can I find the integer-sqrt kernel?
#26 opened by Alex-Songs - 0
#25 opened by Arouniversal - 0
About scaling_factor
#24 opened by DOHA-HWANG - 0
Pre-trained weights for specific tasks
#23 opened by roymiles - 1
IBert problems of quant_model=true
#18 opened by CSuperlei - 0
Storing both float32 and int parameters
#22 opened by huu4ontocord - 1
Latency 20x with quant_mode = true
#21 opened by LiamPKU - 0
Arguments in run.py
#20 opened by ZhangYunchenY - 0
Wrong script of downloading GLUE datasets
#19 opened by Alexiazzf - 0
Another setting for quantization
#17 opened by gksruf - 0
How can we change the quantization settings?
#16 opened by kentaroy47 - 0
3rd order polynomial approximation to GeLU
#11 opened by CaoZhongZ - 0
Bugs in the code
#10 opened by hsiehjackson - 1
quantize other roberta model
#7 opened by longyueling - 10
Quantization on trained model
#6 opened by shon-otmazgin - 3
Training in mixed precision
#3 opened by bdalal - 2
Why use 22 bit quantized activations for some layer norms (except in Embeddings)?
#5 opened by bdalal - 0
Possible bug in IntSoftmax
#4 opened by bdalal - 1
Can use the CPU in the inference state?
#1 opened by luoling1993