/Model-Compression-Techniques

Compress Transformers for faster inference using techniques like Knowledge Distillation, Quantization, ONNX Conversion and Pruning (Sparsification)

Primary LanguageJupyter Notebook

Model-Compression-Techniques

Compress Transformers for faster inference using techniques like Knowledge Distillation, Quantization, ONNX Conversion and Pruning (Sparsification)