Pinned Repositories
gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
OBC
Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".
PanzaMail
qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
Quartet
QUIK
Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference (EMNLP 2024)
qutlass
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 sparsity
sparsegpt
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
IST Austria Distributed Algorithms and Systems Lab's Repositories
IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
IST-DASLab/sparsegpt
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
IST-DASLab/PanzaMail
IST-DASLab/Quartet
IST-DASLab/qutlass
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
IST-DASLab/Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 sparsity
IST-DASLab/QuEST
Work in progress.
IST-DASLab/MoE-Quant
Code for data-aware compression of DeepSeek models
IST-DASLab/FP-Quant
IST-DASLab/EvoPress
IST-DASLab/gptq-gguf-toolkit
DASLab support for GGUF
IST-DASLab/HALO
HALO: Hadamard-Assisted Low-Precision Optimization and Training method for fine-tuning LLMs. 🚀 The official implementation of https://arxiv.org/abs/2501.02625
IST-DASLab/gemm-fp8
High-performance FP8 GEMM kernels for SM89 and later GPUs.
IST-DASLab/MicroAdam
This repository contains code for the MicroAdam paper.
IST-DASLab/DarwinLM
Official PyTorch implementation of the paper "DarwinLM: Evolutionary Structured Pruning of Large Language Models"
IST-DASLab/torch_cgx
PyTorch distributed backend extension with compression support
IST-DASLab/peft-rosa
A fork of the PEFT library, supporting Robust Adaptation (RoSA)
IST-DASLab/gemm-int8
High-performance INT8 GEMM kernels for SM80 and later GPUs.
IST-DASLab/ISTA-DASLab-Optimizers
IST-DASLab/LDAdam
LDAdam - Adaptive Optimization from Low-Dimensional Gradient Statistics
IST-DASLab/influence_distillation
Official implementation of Influence Distillation: https://www.arxiv.org/abs/2505.19051
IST-DASLab/GridSearcher
GridSearcher simplifies running grid searches for machine learning projects in Python, emphasizing parallel execution and GPU scheduling without dependencies on SLURM or other workload managers.
IST-DASLab/marlin_artifact
IST-DASLab/ScalableMNN
Official repository for "Scalable Mechanistic Neural Networks" (ICLR 2025)
IST-DASLab/SPADE
Code for SPADE: Sparsity-Guided Debugging for Deep Neural Networks
IST-DASLab/AutoGPTQRoSA
IST-DASLab/HALO-anon
IST-DASLab/LDAdam-anonymous
IST-DASLab/llm-foundry
LLM training code for Databricks foundation models
IST-DASLab/Yolov8-Pose-Detection-on-Browser
Example of YOLOv8 pose detection (estimation) in the browser. It shows implementations powered by ONNX and TFJS, served through JavaScript without any frameworks, and demonstrates pose detection (estimation) on images as well as a live webcam feed.