Pinned Repositories
TFMQ-DM
[CVPR 2024 Highlight] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".
MiniCPM
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
ai-by-hand-excel
auto-round
SOTA weight-only quantization algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".
Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
bilivideos
EfficientDM
[ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models"
llmc
This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference