fkodom/grouped-query-attention-pytorch
(Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints" (https://arxiv.org/pdf/2305.13245.pdf)
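For orientation, here is a minimal sketch of the core GQA computation the repo implements: several query heads share each key/value head, so the key/value tensors are expanded to match the query heads before standard scaled-dot-product attention. The function name and tensor shapes below are illustrative, not the repo's actual API.

```python
import torch

def grouped_query_attention(q, k, v):
    """Toy GQA: q has more heads than k/v; each kv head serves a group of q heads.

    Shapes: q (batch, n_q_heads, seq, dim); k, v (batch, n_kv_heads, seq, dim).
    """
    n_q_heads, n_kv_heads = q.shape[1], k.shape[1]
    assert n_q_heads % n_kv_heads == 0, "query heads must divide evenly into kv heads"
    group_size = n_q_heads // n_kv_heads
    # Repeat each kv head so it is shared by `group_size` query heads.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    # Standard scaled dot-product attention over the expanded heads.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return scores.softmax(dim=-1) @ v

# 8 query heads grouped over 2 kv heads (group size 4).
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
out = grouped_query_attention(q, k, v)  # shape (1, 8, 16, 64)
```

With `n_kv_heads == n_q_heads` this reduces to standard multi-head attention, and with `n_kv_heads == 1` it reduces to multi-query attention; GQA interpolates between the two.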
Python · MIT license
Issues
- function param support attn_mask ? (#7, opened by BucherLi, 1 comment)
- gqa model runtime > no gqa model runtime (#4, opened by Adonai02, 2 comments)
- Query / key mixing correct? (#5, opened by ozppupbg, 2 comments)