xinyual

OpenSearch, Machine Learning

Amazon OpenSearch, Shanghai

xinyual's Stars

vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python | Stars: 27k | Forks: 4k
casper-hansen/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
Language: Python | Stars: 1.6k | Forks: 196
databrickslabs/dolly
Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
Language: Python | Stars: 10.8k | Forks: 1.2k
alexa/bort
Repository for the paper "Optimal Subarchitecture Extraction for BERT"
Language: Python | Stars: 470 | Forks: 39