Compress Transformers for faster inference using techniques such as Knowledge Distillation, Quantization, ONNX Conversion, and Pruning (Sparsification).
subhasisj/Model-Compression-Techniques