A high-throughput and memory-efficient inference and serving engine for LLMs
A simple and effective LLM pruning approach.