fkodom/grouped-query-attention-pytorch
(Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints" (https://arxiv.org/pdf/2305.13245.pdf)
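For orientation, here is a minimal sketch of the core GQA computation the repo implements: several query heads share each key/value head, so the key/value tensors are expanded to match the query heads before standard scaled-dot-product attention. The function name and tensor shapes below are illustrative, not the repo's actual API.

```python
import torch

def grouped_query_attention(q, k, v):
    """Toy GQA: q has more heads than k/v; each kv head serves a group of q heads.

    Shapes: q (batch, n_q_heads, seq, dim); k, v (batch, n_kv_heads, seq, dim).
    """
    n_q_heads, n_kv_heads = q.shape[1], k.shape[1]
    assert n_q_heads % n_kv_heads == 0, "query heads must divide evenly into kv heads"
    group_size = n_q_heads // n_kv_heads
    # Repeat each kv head so it is shared by `group_size` query heads.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    # Standard scaled dot-product attention over the expanded heads.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return scores.softmax(dim=-1) @ v

# 8 query heads grouped over 2 kv heads (group size 4).
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
out = grouped_query_attention(q, k, v)  # shape (1, 8, 16, 64)
```

With `n_kv_heads == n_q_heads` this reduces to standard multi-head attention, and with `n_kv_heads == 1` it reduces to multi-query attention; GQA interpolates between the two.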
Python · MIT license
Issues
- function param support attn_mask ? (#7, opened by BucherLi, 1 comment)
- gqa model runtime > no gqa model runtime (#4, opened by Adonai02, 2 comments)
- Query / key mixing correct? (#5, opened by ozppupbg, 2 comments)