initzhang/hydragen-attention
An implementation of the core attention algorithm in the paper "Hydragen: High-Throughput LLM Inference with Shared Prefixes".
Python