Pinned Repositories
gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
OBC
Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".
PanzaMail
qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
Quartet
QUIK
Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference (EMNLP 2024).
qutlass
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 sparsity.
sparsegpt
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".