/hydragen-attention

An implementation of the core attention algorithm in the paper "Hydragen: High-Throughput LLM Inference with Shared Prefixes".
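As a rough illustration of the idea behind that algorithm (not this repository's actual code), attention over a shared prefix and over each sequence's own suffix can be computed separately and then merged using the log-sum-exp of each part's scores. The sketch below is pure Python for a single head with one query per sequence. All function names are hypothetical. It loops over sequences for clarity, whereas the paper's throughput gain comes from batching the prefix step across sequences into a single matrix multiplication.

```python
import math


def attention_with_lse(q, keys, values):
    """Softmax attention for one query vector.

    Returns the attention output and the log-sum-exp (LSE) of the scaled scores.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]
    return out, m + math.log(total)


def combine(out_a, lse_a, out_b, lse_b):
    """Merge two partial attention results computed over disjoint key sets."""
    m = max(lse_a, lse_b)
    wa = math.exp(lse_a - m)
    wb = math.exp(lse_b - m)
    return [(wa * a + wb * b) / (wa + wb) for a, b in zip(out_a, out_b)]


def shared_prefix_attention(queries, prefix_k, prefix_v, suffix_ks, suffix_vs):
    """Attention for sequences that share one prefix KV cache.

    queries:   one query vector per sequence
    prefix_k:  keys of the shared prefix (stored once)
    prefix_v:  values of the shared prefix (stored once)
    suffix_ks: per-sequence lists of suffix keys
    suffix_vs: per-sequence lists of suffix values
    """
    outputs = []
    for q, sk, sv in zip(queries, suffix_ks, suffix_vs):
        # Prefix part: in a batched implementation this runs once for all queries.
        p_out, p_lse = attention_with_lse(q, prefix_k, prefix_v)
        # Suffix part: unique to each sequence.
        s_out, s_lse = attention_with_lse(q, sk, sv)
        outputs.append(combine(p_out, p_lse, s_out, s_lse))
    return outputs
```

The merged result should match ordinary attention over the concatenated prefix and suffix keys, which is a convenient check when experimenting with the decomposition.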

Primary Language: Python
