HighSpeeds's Stars
facebookresearch/faiss
A library for efficient similarity search and clustering of dense vectors.
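The core operation faiss accelerates is nearest-neighbor search over dense vectors. As a point of reference only, here is a brute-force numpy sketch of inner-product similarity search (faiss itself is not used; all names here are illustrative):

```python
import numpy as np

# Brute-force dense-vector similarity search: score a query against
# every database vector and rank by inner product. Libraries like faiss
# do this (and approximate variants) at scale; this is just the idea.
rng = np.random.default_rng(0)
db = rng.standard_normal((100, 16)).astype(np.float32)  # 100 vectors, dim 16

# Query: a noisy copy of database vector 7, so it should score highly.
q = db[7] + 0.01 * rng.standard_normal(16).astype(np.float32)

scores = db @ q                  # inner-product similarity to every vector
top3 = np.argsort(-scores)[:3]   # indices of the 3 most similar vectors
```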
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
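The LoRA idea in the paper title can be sketched in a few lines of numpy (this is not loralib's API, just the low-rank-update concept with made-up dimensions):

```python
import numpy as np

# LoRA concept: keep a pretrained weight W frozen and learn only a
# low-rank update B @ A (rank r << min(d_out, d_in)).
rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init

x = rng.standard_normal(d_in)
y_frozen = W @ x
y_lora = W @ x + B @ (A @ x)  # adapted forward pass

# With B initialized to zero, the adapted output equals the frozen one,
# so training starts from the pretrained model's behavior.
```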
mistralai/mistral-inference
Official inference library for Mistral models
EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of language models.
quic/aimet
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
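For context on what such libraries automate, here is a generic post-training uniform quantization sketch in numpy (this is not AIMET's API; scale/zero-point handling is the textbook affine scheme):

```python
import numpy as np

# Affine (uniform) quantization of float weights to int8:
# w ~= (q - zero_point) * scale, with q in [-128, 127].
rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)

qmin, qmax = -128, 127
scale = (w.max() - w.min()) / (qmax - qmin)
zero_point = int(np.round(qmin - w.min() / scale))

q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.int8)
w_hat = (q.astype(np.float32) - zero_point) * scale  # dequantized weights

max_err = np.abs(w - w_hat).max()  # rounding error, on the order of scale/2
```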
cvxgrp/cvxpylayers
Differentiable convex optimization layers
kyegomez/BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
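The 1-bit weight idea behind the paper can be illustrated with a minimal numpy sketch (a simplified sign-plus-scale binarization, not the repo's actual implementation):

```python
import numpy as np

# 1-bit weights: replace W by sign(W) times a single scalar scale
# (here the mean absolute value), so each weight needs one bit of
# storage plus one shared float.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))

scale = np.abs(W).mean()
W_bin = np.sign(W) * scale   # every entry is either -scale or +scale

x = rng.standard_normal(4)
y = W_bin @ x                # matmul against the binarized weights
```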
Vahe1994/AQLM
Official PyTorch repository for "Extreme Compression of Large Language Models via Additive Quantization" (https://arxiv.org/pdf/2401.06118.pdf) and "PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression" (https://arxiv.org/abs/2405.14852)
acl-org/acl-style-files
Official style files for papers submitted to venues of the Association for Computational Linguistics
OpenGVLab/OmniQuant
[ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
IST-DASLab/sparsegpt
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
locuslab/wanda
A simple and effective LLM pruning approach.
siboehm/SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
microsoft/VPTQ
VPTQ: a flexible, extreme low-bit quantization algorithm
cvxgrp/scs
Splitting Conic Solver
Zhen-Dong/Awesome-Quantization-Papers
List of papers related to neural network quantization in recent AI conferences and journals.
Cornell-RelaxML/quip-sharp
subhadarship/kmeans_pytorch
k-means clustering using PyTorch
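The algorithm that repo implements on GPU is plain Lloyd's k-means; a CPU numpy sketch on two toy blobs (illustrative only, with a deterministic initialization so it converges in a few steps):

```python
import numpy as np

# Lloyd's k-means: alternate between assigning points to their nearest
# center and moving each center to the mean of its assigned points.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.1, (20, 2)),   # blob near (0, 0)
                 rng.normal(5, 0.1, (20, 2))])  # blob near (5, 5)

k = 2
centers = pts[[0, -1]].copy()  # deterministic init: one point per blob
for _ in range(10):
    # assignment step: nearest center for every point
    d = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # update step: recompute each center as its cluster mean
    centers = np.array([pts[labels == j].mean(axis=0) for j in range(k)])
```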
facebookresearch/dietgpu
GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.
yhhhli/APoT_Quantization
PyTorch implementation for the APoT quantization (ICLR 2020)
patrick-kidger/torchcubicspline
Interpolating natural cubic splines. Includes batching, GPU support, support for missing values, evaluating derivatives of the spline, and backpropagation.
yxli2123/LoftQ
uanu2002/JSQ
[ICML 2024] JSQ: Compressing Large Language Models by Joint Sparsification and Quantization
Cornell-RelaxML/qtip
yxli2123/LoSparse
csyhhu/L-DNQ
Code for the AAAI 2019 paper "Deep Neural Network Quantization via Layer-Wise Optimization using Limited Training Data"
quanta-fine-tuning/quanta
(NeurIPS 2024) QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation
kssteven418/SqueezeLLM-gradients
roychowdhuryresearch/pyHFO
STE/MNI HFO detection, classification and visualization
lihuang258/LoRAP
[ICML 2024]: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models