Pinned Repositories
gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
marlin
FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
OBC
Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".
PanzaMail
qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
QUIK
Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference.
Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
SparseFinetuning
Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llm-foundry.
sparsegpt
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
WoodFisher
Code accompanying the NeurIPS 2020 paper "WoodFisher: Efficient Second-Order Approximation for Neural Network Compression" (Singh & Alistarh, 2020).
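Several of the pinned projects (GPTQ, Marlin, QUIK) center on 4-bit weight quantization for LLM inference. As a rough illustration of the underlying idea only, not any of these repositories' actual code, a minimal symmetric round-to-nearest INT4 quantize/dequantize round trip could be sketched as follows (GPTQ itself goes further, using second-order information to compensate rounding error):

```python
import numpy as np

def quantize_int4(w, group_size=128):
    """Symmetric round-to-nearest 4-bit quantization, per group.

    Generic sketch of grouped INT4 weight quantization; the group_size
    and scaling scheme here are illustrative assumptions, not the
    GPTQ/Marlin implementation.
    """
    groups = w.reshape(-1, group_size)
    # One scale per group, mapping the largest magnitude to the INT4 range [-8, 7]
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    # Recover an FP32 approximation of the original weights
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
# Round-to-nearest bounds the per-weight error by half a quantization step
err = np.abs(w - w_hat).max()
```

In a real kernel such as Marlin, the INT4 values would additionally be packed eight-per-int32 and dequantized on the fly inside the GEMM; this sketch only shows the numerics.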
IST Austria Distributed Algorithms and Systems Lab's Repositories
IST-DASLab/gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
IST-DASLab/sparsegpt
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
IST-DASLab/marlin
FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
IST-DASLab/qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
IST-DASLab/PanzaMail
IST-DASLab/QUIK
Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference.
IST-DASLab/OBC
Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".
IST-DASLab/Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
IST-DASLab/SparseFinetuning
Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llm-foundry.
IST-DASLab/RoSA
IST-DASLab/QIGen
Repository for CPU Kernel Generation for LLM Inference
IST-DASLab/spdy
Code for ICML 2022 paper "SPDY: Accurate Pruning with Speedup Guarantees"
IST-DASLab/peft-rosa
A fork of the PEFT library, supporting Robust Adaptation (RoSA)
IST-DASLab/sparseprop
IST-DASLab/MicroAdam
This repository contains code for the MicroAdam paper.
IST-DASLab/CrAM
Code for reproducing the results from "CrAM: A Compression-Aware Minimizer" accepted at ICLR 2023
IST-DASLab/spops
IST-DASLab/ISTA-DASLab-Optimizers
IST-DASLab/Mathador-LM
Code for the paper "Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on LLMs".
IST-DASLab/CAP
Repository for the Correlation-Aware Pruning (CAP, NeurIPS 2023) source and experimental code
IST-DASLab/EFCP
The repository contains code to reproduce the experiments from our paper "Error Feedback Can Accurately Compress Preconditioners".
IST-DASLab/pruned-vision-model-bias
Code for reproducing the paper "Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures"
IST-DASLab/TACO4NLP
Task-aware compression for various NLP tasks
IST-DASLab/KDVR
Code for the experiments in "Knowledge Distillation Performs Partial Variance Reduction" (NeurIPS 2023)
IST-DASLab/ZipLM
Code for the NeurIPS 2023 paper: "ZipLM: Inference-Aware Structured Pruning of Language Models".
IST-DASLab/AutoGPTQRoSA
IST-DASLab/FastOBQ-
GPTQ with fine-tuning
IST-DASLab/GridSearcher
GridSearcher simplifies running grid searches for machine learning projects in Python, emphasizing parallel execution and GPU scheduling without dependencies on SLURM or other workload managers.
IST-DASLab/llm-foundry
LLM training code for Databricks foundation models
IST-DASLab/SPADE
Code for "SPADE: Sparsity-Guided Debugging for Deep Neural Networks"
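Sparse-Marlin and SparseGPT both rely on the 2:4 structured sparsity pattern accelerated by NVIDIA tensor cores: in every contiguous group of four weights, at most two are nonzero. A hedged, illustrative sketch of enforcing this pattern by simple magnitude pruning (real pipelines like SparseGPT select the mask far more carefully, and the kernels operate on packed representations):

```python
import numpy as np

def prune_2_4(w):
    """Zero the two smallest-magnitude weights in every group of four.

    Illustrative magnitude-based 2:4 pruning sketch; not taken from the
    Sparse-Marlin or SparseGPT repositories.
    """
    groups = w.reshape(-1, 4)
    # Indices of the two smallest-magnitude entries per group of four
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (groups * mask).reshape(w.shape)

w = np.arange(1, 9, dtype=np.float32)   # [1, 2, 3, 4, 5, 6, 7, 8]
w_sp = prune_2_4(w)                     # keeps the 2 largest per group of 4
```

For the example above, each group of four retains its two largest entries, giving [0, 0, 3, 4, 0, 0, 7, 8]; the resulting 50% sparsity is what the 2:4-aware kernels exploit for additional speedup on top of 4-bit quantization.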