Qubitium
Golang, Python, Kotlin. GPTQModel maintainer and OSS contributor to SGLang, vLLM, and others. @ModelCloudAi founder
ModelCloud.ai · Earth/Epoch 2.0
Pinned Repositories
Device-SMI
Self-contained Python lib with zero dependencies that gives you unified device properties for GPU, CPU, and NPU. No more calling separate tools such as nvidia-smi or reading /proc/cpuinfo and parsing the output yourself.
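A minimal usage sketch of the idea: one device handle per hardware target, with properties already parsed instead of raw tool output. The exact attribute names (model, memory_total) are assumptions; verify against the repo README.

    # Sketch of Device-SMI-style usage; attribute names are assumptions.
    from device_smi import Device

    gpu = Device("cuda:0")   # also e.g. Device("cpu") on supported hosts
    print(gpu.model)         # device model string, no nvidia-smi parsing needed
    print(gpu.memory_total)  # total device memory, returned as a number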
GPTQModel
LLM quantization (compression) toolkit with hardware acceleration support for NVIDIA CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.
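The typical flow is load, quantize against a small calibration set, then save the packed model for accelerated inference. A sketch under the assumption that the toolkit exposes GPTQModel.load, QuantizeConfig, quantize, and save as in its README; treat the names as assumptions to check.

    # Sketch of a GPTQ 4-bit quantization pass; API names assumed from the README.
    from gptqmodel import GPTQModel, QuantizeConfig

    # A few raw text samples stand in for a real calibration set.
    calibration_dataset = ["gptqmodel is an llm quantization toolkit."]
    config = QuantizeConfig(bits=4, group_size=128)

    model = GPTQModel.load("meta-llama/Llama-3.2-1B-Instruct", config)
    model.quantize(calibration_dataset)          # runs the GPTQ calibration pass
    model.save("Llama-3.2-1B-Instruct-gptq-4bit")  # packed weights for inference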
LogBar
A unified Logger and ProgressBar utility with zero dependencies.
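The point of unifying the two is that log lines and the progress bar share one renderer, so logging mid-loop does not tear the bar. A hypothetical sketch; the shared() constructor and pb() iterable wrapper are assumptions, check the LogBar README.

    # Hypothetical LogBar usage; API names are assumptions.
    from logbar import LogBar

    log = LogBar.shared()
    for item in log.pb(range(100)):     # progress bar over any iterable
        log.info(f"processing {item}")  # log output renders cleanly above the bar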
Tokenicer
A (nicer) tokenizer you want to use for model inference and training, with all known preventable gotchas normalized or auto-fixed.
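The wrapper loads the underlying HF tokenizer and normalizes the usual foot-guns (such as a missing pad_token) at load time. A sketch; Tokenicer.load and the pass-through attribute access are assumptions to verify against the README.

    # Sketch of Tokenicer usage; method names are assumptions.
    from tokenicer import Tokenicer

    tokenizer = Tokenicer.load("Qwen/Qwen2.5-0.5B-Instruct")
    print(tokenizer.pad_token)  # auto-fixed if the base tokenizer left it unset
    ids = tokenizer("Hello world")["input_ids"]  # delegates to the HF tokenizer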
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
femtozip
list
Do you want a 9 KB cross-browser native JavaScript library that makes your plain HTML lists super flexible, searchable, sortable and filterable? Yeah! Do you also want the possibility to add, edit and remove items by dead simple templating? Hell yeah!
php-cityhash
PHP extension for Google's ultra-fast CityHash library.
sglang
SGLang is a fast serving framework for large language models and vision language models.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Qubitium's Repositories
Qubitium/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Qubitium/alpaca-lora
Instruct-tune LLaMA on consumer hardware
Qubitium/flash-attention
Fast and memory-efficient exact attention
Qubitium/flashinfer
FlashInfer: Kernel Library for LLM Serving
Qubitium/gemma_pytorch
The official PyTorch implementation of Google's Gemma models
Qubitium/lm-format-enforcer
Enforce the output format (JSON Schema, Regex etc) of a language model
Qubitium/sglang
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
Qubitium/accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
Qubitium/auto-round
SOTA Weight-only Quantization Algorithm for LLMs
Qubitium/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
Qubitium/BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
Qubitium/clod-code
rot13 version of claw code
Qubitium/datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Qubitium/duskpilot-c3-clone
Qubitium/ethos-paper
Qubitium/evalplus
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
Qubitium/GPTQ-for-LLaMa
4 bits quantization of LLaMa using GPTQ
Qubitium/GPTQ-triton
GPTQ inference Triton kernel
Qubitium/GPTQModel
Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Qubitium/hqq
Official implementation of Half-Quadratic Quantization (HQQ)
Qubitium/hyperDB
A hyper-fast local vector database for use with LLM Agents. Now accepting SAFEs at $35M cap.
Qubitium/llama.cpp
Port of Facebook's LLaMA model in C/C++
Qubitium/mav
model activation visualiser
Qubitium/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Qubitium/qlora
QLoRA: Efficient Finetuning of Quantized LLMs
Qubitium/QQQ
QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.
Qubitium/tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Qubitium/transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Qubitium/unsloth
5X faster, 60% less memory QLoRA finetuning
Qubitium/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs