NetEase-FuXi's Stars
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
huggingface/text-generation-inference
Large Language Model (LLM) text generation inference
bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
opennaslab/kubespider
A global resource download orchestration system; build your own home download center.
IST-DASLab/gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
NetEase-FuXi/EET
Easy and Efficient Transformer: a scalable inference solution for large NLP models
NetEase-FuXi/EETQ
Easy and Efficient Quantization for Transformers