jianyuheng's Stars
mlc-ai/mlc-llm
Universal LLM Deployment Engine with ML Compilation
Lightning-AI/litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
NVIDIA/TensorRT-LLM
TensorRT-LLM provides an easy-to-use Python API for defining Large Language Models (LLMs) and building TensorRT engines that contain state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also includes components to create Python and C++ runtimes that execute those TensorRT engines.
axolotl-ai-cloud/axolotl
Go ahead and axolotl questions
microsoft/LMOps
General technology for enabling AI capabilities with LLMs and MLLMs
turboderp/exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
FasterDecoding/Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
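As a hedged illustration of the core idea (extra lightweight heads drafting several future tokens from one hidden state, later verified by the base model), a minimal PyTorch sketch; the class name, layer shapes, and head count are assumptions, not the repo's actual architecture:

```python
import torch.nn as nn

# Minimal sketch of Medusa-style decoding heads: head k predicts the
# token k steps ahead from the same last hidden state, so several
# candidate tokens are drafted in a single forward pass.
class MedusaHeads(nn.Module):
    def __init__(self, hidden_size, vocab_size, num_heads=4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, hidden_size),
                nn.SiLU(),
                nn.Linear(hidden_size, vocab_size),
            )
            for _ in range(num_heads)
        )

    def forward(self, hidden):  # hidden: (batch, hidden_size)
        # One logit tensor per lookahead position; the base model then
        # verifies the drafted candidates and keeps the accepted prefix.
        return [head(hidden) for head in self.heads]
```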
horseee/Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
Vahe1994/AQLM
Official PyTorch repository for "Extreme Compression of Large Language Models via Additive Quantization" (https://arxiv.org/pdf/2401.06118.pdf) and "PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression" (https://arxiv.org/abs/2405.14852)
OpenGVLab/OmniQuant
[ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
mobiusml/hqq
Official implementation of Half-Quadratic Quantization (HQQ)
SqueezeAILab/SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
princeton-nlp/LLM-Shearing
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
segmind/distill-sd
Segmind Distilled diffusion
mit-han-lab/qserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
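The W4A8KV4 label denotes 4-bit weights, 8-bit activations, and a 4-bit KV cache. As a hedged sketch of the weight side only, assuming plain per-channel symmetric round-to-nearest; QServe's actual progressive quantization algorithm and fused GPU kernels are more involved:

```python
import numpy as np

# Illustrative int4 weight quantization (symmetric, round-to-nearest,
# per output channel). Shows only what the "W4" in W4A8KV4 means
# numerically; this is not QServe's method.
def quantize_w4(weight):             # weight: (out_features, in_features)
    scale = np.abs(weight).max(axis=1, keepdims=True) / 7.0  # int4 range [-8, 7]
    scale[scale == 0] = 1.0          # guard all-zero rows
    q = np.clip(np.round(weight / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_w4(q, scale):
    return q.astype(np.float32) * scale
```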
hao-ai-lab/Consistency_LLM
[ICML 2024] CLLMs: Consistency Large Language Models
spcl/QuaRot
Code for the NeurIPS 2024 paper: QuaRot, end-to-end 4-bit inference for large language models.
Cornell-RelaxML/QuIP
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
Nota-NetsPresso/BK-SDM
A Compressed Stable Diffusion for Efficient Text-to-Image Generation [ECCV'24]
OpenGVLab/EfficientQAT
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
hemingkx/Spec-Bench
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
MFaceTech/InstantID
taprosoft/llm_finetuning
Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes)
jaymody/speculative-sampling
Simple implementation of Speculative Sampling in NumPy for GPT-2.
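In the same NumPy spirit, a hedged sketch of the accept/resample rule at the heart of speculative sampling; the function name and the draft/target probability inputs are assumptions, not this repo's actual API:

```python
import numpy as np

# Accept/resample rule for one drafted token (Leviathan et al., 2023).
# `draft_probs` / `target_probs` are full-vocab distributions from
# hypothetical draft and target models.
def speculative_step(token, draft_probs, target_probs, rng):
    # Accept the drafted token with probability min(1, p(token)/q(token)).
    if rng.uniform() < min(1.0, target_probs[token] / draft_probs[token]):
        return token, True
    # On rejection, resample from the normalized residual max(p - q, 0),
    # which keeps the overall output distribution exactly p.
    residual = np.maximum(target_probs - draft_probs, 0.0)
    residual /= residual.sum()
    return rng.choice(residual.size, p=residual), False
```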
ruiqixu37/distill_diffusion
Implementation of the CVPR 2023 Award Candidate "On Distillation of Guided Diffusion Models"
LiqunMa/FBI-LLM
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
Qualcomm-AI-research/lr-qat
Skhaki18/optin-transformer-pruning
[ICLR 2024] The Need for Speed: Pruning Transformers with One Recipe
MFaceTech/AIGC-SD-Acceleration
onliwad101/FlexRound_LRQ
FlexRound (ICML 2023) & LRQ