ItsAbdula's Stars
IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
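The near-ideal ~4x figure follows from memory traffic: at small batch sizes the matmul is bound by streaming the weight matrix, and INT4 weights are a quarter the size of FP16. A back-of-envelope sketch (the layer shape is an arbitrary example, not taken from the repo):

```python
# Why ~4x is the ideal small-batch speedup: a weight-bound GEMV/GEMM streams
# the whole weight matrix once per call, so the ceiling is the byte ratio.
K, N = 4096, 4096                 # hypothetical Linear layer shape
fp16_bytes = K * N * 2            # 16-bit weights: 2 bytes each
int4_bytes = K * N // 2           # 4-bit weights: 2 packed per byte
print(fp16_bytes / int4_bytes)    # -> 4.0, the ideal memory-traffic speedup
```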
xvyaward/owq
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models".
srush/GPU-Puzzles
Solve puzzles. Learn CUDA.
qwopqwop200/GPTQ-for-LLaMa
4-bit quantization of LLaMA using GPTQ
AutoGPTQ/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
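A minimal sketch of the README-style quantize-and-save flow (the model name, calibration sentence, and output directory are placeholders):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained = "facebook/opt-125m"                      # assumption: any HF causal LM
tokenizer = AutoTokenizer.from_pretrained(pretrained)

# A real run would use a few hundred calibration samples, not one sentence.
examples = [tokenizer("AutoGPTQ quantization calibration sample.", return_tensors="pt")]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(pretrained, quantize_config)
model.quantize(examples)                              # run GPTQ layer by layer
model.save_quantized("opt-125m-4bit")                 # hypothetical output dir
```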
IST-DASLab/gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers".
mit-han-lab/llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
efeslab/Nanoflow
A throughput-oriented high-performance serving framework for LLMs
pytorch/ao
PyTorch native quantization and sparsity for training and inference
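A weight-only quantization sketch in the style of the torchao README; symbol names have shifted across torchao releases, so treat the exact imports as assumptions:

```python
import torch
from torchao.quantization import quantize_, int4_weight_only

# int4 weight-only quantization expects a bfloat16 model on CUDA.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16).cuda()
quantize_(model, int4_weight_only())  # swaps Linear weights for packed int4 in place

x = torch.randn(16, 1024, dtype=torch.bfloat16, device="cuda")
y = model(x)
```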
huggingface/optimum-quanto
A PyTorch quantization backend for Optimum.
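A minimal sketch of quanto's documented quantize/freeze flow on a toy model:

```python
import torch
from optimum.quanto import quantize, freeze, qint8

model = torch.nn.Sequential(
    torch.nn.Linear(256, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
)

quantize(model, weights=qint8)  # replace weights with int8 qtensors
freeze(model)                   # drop the float originals, keep only int8

out = model(torch.randn(4, 256))
```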
mit-han-lab/smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
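The core of the paper is a per-channel scale migration that leaves the matmul unchanged: Y = (X diag(s)^-1)(diag(s) W), with s chosen per input channel to balance activation and weight outliers. A minimal sketch of that identity with stand-in tensors (alpha = 0.5 is the paper's default migration strength):

```python
import torch

T, C_in, C_out, alpha = 32, 64, 128, 0.5
X = torch.randn(T, C_in) * (torch.rand(C_in) * 10)  # activations with outlier channels
W = torch.randn(C_in, C_out)

# s_j = max|X_j|^alpha / max|W_j|^(1-alpha), one scale per input channel
s = X.abs().amax(dim=0).pow(alpha) / W.abs().amax(dim=1).pow(1 - alpha)

X_hat = X / s             # smoothed activations: X diag(s)^-1
W_hat = W * s[:, None]    # compensated weights:  diag(s) W

# The product is mathematically unchanged; only the quantization difficulty moved.
assert torch.allclose(X @ W, X_hat @ W_hat, atol=1e-3)
```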
LLMServe/DistServe
Disaggregated serving system for Large Language Models (LLMs).
facebookresearch/sapiens
High-resolution models for human tasks.
casper-hansen/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
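A minimal sketch in the style of AutoAWQ's documented flow (model path, output directory, and config values are placeholders):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-v0.1"  # assumption: any supported HF model
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)  # calibrate and pack to 4 bits
model.save_quantized("mistral-7b-awq")                # hypothetical output dir
tokenizer.save_pretrained("mistral-7b-awq")
```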
htqin/awesome-model-quantization
A curated list of papers, docs, and code about model quantization. This repo aims to collect resources for model quantization research and is continuously improved; PRs for works (papers, repositories) the list has missed are welcome.
casys-kaist/NeuPIMs
NeuPIMs Simulator
microsoft/ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
AlibabaPAI/FLASHNN
triton-lang/triton
Development repository for the Triton language and compiler
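For a feel of the language, here is the canonical vector-add kernel in the style of Triton's own tutorial (sizes and block shape are arbitrary):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements               # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```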
cuda-mode/lectures
Material for cuda-mode lectures
Kobzol/hardware-effects-gpu
Demonstration of various hardware effects on CUDA GPUs.
srush/Triton-Puzzles
Puzzles for learning Triton
microsoft/vidur
A large-scale simulation framework for LLM inference
Mozilla-Ocho/llamafile
Distribute and run LLMs with a single file.
cuda-mode/resource-stream
CUDA-related news and material links
microsoft/vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
AlibabaPAI/llumnix
Efficient and easy multi-instance LLM serving
HanGuo97/flute
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
CisMine/Guide-NVIDIA-Tools
NVIDIA tools guide
AnswerDotAI/gpu.cpp
A lightweight library for portable low-level GPU computation using WebGPU.