Issues
About CUDA out of memory
#32 opened by Ai-ZL - 1
Task name references in strings are wrong
#29 opened by AmosHason - 0
Where can I find the integer-sqrt kernel?
#26 opened by Alex-Songs - 0
#25 opened by Arouniversal - 0
About scaling_factor
#24 opened by DOHA-HWANG - 0
Pre-trained weights for specific tasks
#23 opened by roymiles - 1
IBert problems of quant_model=true
#18 opened by CSuperlei - 0
Storing both float32 and int parameters
#22 opened by huu4ontocord - 1
Latency 20x with quant_mode = true
#21 opened by LiamPKU - 0
Arguments in run.py
#20 opened by ZhangYunchenY - 0
Wrong script of downloading GLUE datasets
#19 opened by Alexiazzf - 0
Another setting for quantization
#17 opened by gksruf - 0
How can we change the quantization settings?
#16 opened by kentaroy47 - 0
3rd order polynomial approximation to GeLU
#11 opened by CaoZhongZ - 0
Bugs in the code
#10 opened by hsiehjackson - 1
quantize other roberta model
#7 opened by longyueling - 10
Quantization on trained model
#6 opened by shon-otmazgin - 3
Training in mixed precision
#3 opened by bdalal - 2
Why use 22 bit quantized activations for some layer norms (except in Embeddings)?
#5 opened by bdalal - 0
Possible bug in IntSoftmax
#4 opened by bdalal - 1
Can use the CPU in the inference state?
#1 opened by luoling1993