xinyual's Stars
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
casper-hansen/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
databrickslabs/dolly
Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
alexa/bort
Repository for the paper "Optimal Subarchitecture Extraction for BERT"