Pinned Repositories
GeDe
hyx1999.github.io
ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Lors
SAM-Decoding
Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton
ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
cutlass
CUDA Templates for Linear Algebra Subroutines
QuaRot
Code for the NeurIPS 2024 paper QuaRot: end-to-end 4-bit inference of large language models.
triton
Development repository for the Triton language and compiler
AQLM
Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf and PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression https://arxiv.org/abs/2405.14852
hyx1999's Repositories
hyx1999/SAM-Decoding
Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton
hyx1999/GeDe
hyx1999/hyx1999.github.io
hyx1999/ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
hyx1999/Lors