HighSpeeds's Stars
facebookresearch/faiss
A library for efficient similarity search and clustering of dense vectors.
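The core operation faiss accelerates is nearest-neighbor search over dense vectors. As a point of reference only, here is a brute-force numpy sketch of inner-product similarity search (faiss itself is not used; all names here are illustrative):

```python
import numpy as np

# Brute-force dense-vector similarity search: score a query against
# every database vector and rank by inner product. Libraries like faiss
# do this (and approximate variants) at scale; this is just the idea.
rng = np.random.default_rng(0)
db = rng.standard_normal((100, 16)).astype(np.float32)  # 100 vectors, dim 16

# Query: a noisy copy of database vector 7, so it should score highly.
q = db[7] + 0.01 * rng.standard_normal(16).astype(np.float32)

scores = db @ q                  # inner-product similarity to every vector
top3 = np.argsort(-scores)[:3]   # indices of the 3 most similar vectors
```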
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
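The LoRA idea in the paper title can be sketched in a few lines of numpy (this is not loralib's API, just the low-rank-update concept with made-up dimensions):

```python
import numpy as np

# LoRA concept: keep a pretrained weight W frozen and learn only a
# low-rank update B @ A (rank r << min(d_out, d_in)).
rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init

x = rng.standard_normal(d_in)
y_frozen = W @ x
y_lora = W @ x + B @ (A @ x)  # adapted forward pass

# With B initialized to zero, the adapted output equals the frozen one,
# so training starts from the pretrained model's behavior.
```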
mistralai/mistral-inference
Official inference library for Mistral models
EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of language models.
quic/aimet
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
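For context on what such libraries automate, here is a generic post-training uniform quantization sketch in numpy (this is not AIMET's API; scale/zero-point handling is the textbook affine scheme):

```python
import numpy as np

# Affine (uniform) quantization of float weights to int8:
# w ~= (q - zero_point) * scale, with q in [-128, 127].
rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)

qmin, qmax = -128, 127
scale = (w.max() - w.min()) / (qmax - qmin)
zero_point = int(np.round(qmin - w.min() / scale))

q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.int8)
w_hat = (q.astype(np.float32) - zero_point) * scale  # dequantized weights

max_err = np.abs(w - w_hat).max()  # rounding error, on the order of scale/2
```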
cvxgrp/cvxpylayers
Differentiable convex optimization layers
kyegomez/BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
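The 1-bit weight idea behind the paper can be illustrated with a minimal numpy sketch (a simplified sign-plus-scale binarization, not the repo's actual implementation):

```python
import numpy as np

# 1-bit weights: replace W by sign(W) times a single scalar scale
# (here the mean absolute value), so each weight needs one bit of
# storage plus one shared float.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))

scale = np.abs(W).mean()
W_bin = np.sign(W) * scale   # every entry is either -scale or +scale

x = rng.standard_normal(4)
y = W_bin @ x                # matmul against the binarized weights
```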
Vahe1994/AQLM
Official PyTorch repository for "Extreme Compression of Large Language Models via Additive Quantization" (https://arxiv.org/pdf/2401.06118.pdf) and "PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression" (https://arxiv.org/abs/2405.14852)
acl-org/acl-style-files
Official style files for papers submitted to venues of the Association for Computational Linguistics
OpenGVLab/OmniQuant
[ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
IST-DASLab/sparsegpt
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
locuslab/wanda
A simple and effective LLM pruning approach.
siboehm/SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
microsoft/VPTQ
VPTQ: a flexible, extreme low-bit quantization algorithm
cvxgrp/scs
Splitting Conic Solver
Zhen-Dong/Awesome-Quantization-Papers
List of papers related to neural network quantization in recent AI conferences and journals.
Cornell-RelaxML/quip-sharp
subhadarship/kmeans_pytorch
k-means clustering using PyTorch
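The algorithm that repo implements on GPU is plain Lloyd's k-means; a CPU numpy sketch on two toy blobs (illustrative only, with a deterministic initialization so it converges in a few steps):

```python
import numpy as np

# Lloyd's k-means: alternate between assigning points to their nearest
# center and moving each center to the mean of its assigned points.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.1, (20, 2)),   # blob near (0, 0)
                 rng.normal(5, 0.1, (20, 2))])  # blob near (5, 5)

k = 2
centers = pts[[0, -1]].copy()  # deterministic init: one point per blob
for _ in range(10):
    # assignment step: nearest center for every point
    d = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # update step: recompute each center as its cluster mean
    centers = np.array([pts[labels == j].mean(axis=0) for j in range(k)])
```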
facebookresearch/dietgpu
GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.
yhhhli/APoT_Quantization
PyTorch implementation for the APoT quantization (ICLR 2020)
patrick-kidger/torchcubicspline
Interpolating natural cubic splines. Includes batching, GPU support, support for missing values, evaluating derivatives of the spline, and backpropagation.
yxli2123/LoftQ
uanu2002/JSQ
[ICML 2024] JSQ: Compressing Large Language Models by Joint Sparsification and Quantization
Cornell-RelaxML/qtip
yxli2123/LoSparse
csyhhu/L-DNQ
Code for the AAAI 2019 paper "Deep Neural Network Quantization via Layer-Wise Optimization using Limited Training Data"
quanta-fine-tuning/quanta
(NeurIPS 2024) QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation
kssteven418/SqueezeLLM-gradients
roychowdhuryresearch/pyHFO
STE/MNI HFO detection, classification and visualization
lihuang258/LoRAP
[ICML 2024]: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models