certik/fastGPT

Implement parallelization over heads

certik opened this issue · 0 comments

Currently the attention over heads runs in serial:

! Perform attention over each head

We should try parallelizing this loop over heads and measure whether it gives any speedup.
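One possible approach, as a rough sketch only: since each head reads and writes its own slice of the embedding dimension, the loop iterations are independent and could be wrapped in an OpenMP `parallel do`. The program below is a standalone toy, not the fastGPT code: `head_attention`, the array names, and the sizes are all hypothetical, and the causal mask is omitted for brevity. It runs the head loop with and without the directive and checks that the results agree.

```fortran
program heads_parallel_sketch
! Toy example: all names and sizes here are assumptions, not fastGPT's.
implicit none
integer, parameter :: sp = kind(0.0)
integer, parameter :: n_seq = 16, n_embd = 64, n_head = 4
integer, parameter :: d_head = n_embd / n_head
real(sp) :: q(n_embd,n_seq), k(n_embd,n_seq), v(n_embd,n_seq)
real(sp) :: y_serial(n_embd,n_seq), y_par(n_embd,n_seq)
integer :: l, lo, hi

call random_number(q)
call random_number(k)
call random_number(v)

! Serial reference: one head after another
do l = 1, n_head
    lo = (l-1)*d_head + 1
    hi = l*d_head
    y_serial(lo:hi,:) = head_attention(q(lo:hi,:), k(lo:hi,:), v(lo:hi,:))
end do

! Parallel version: each head writes a disjoint slice of y_par, so the
! iterations do not conflict. lo and hi must be private to each thread.
!$omp parallel do private(lo, hi)
do l = 1, n_head
    lo = (l-1)*d_head + 1
    hi = l*d_head
    y_par(lo:hi,:) = head_attention(q(lo:hi,:), k(lo:hi,:), v(lo:hi,:))
end do
!$omp end parallel do

if (maxval(abs(y_par - y_serial)) > 1e-6_sp) error stop "results differ"
print *, "max abs difference:", maxval(abs(y_par - y_serial))

contains

    pure function head_attention(q, k, v) result(y)
    ! Scaled dot-product attention for one head (no causal mask).
    ! q, k, v have shape (d_head, n_seq); columns are positions.
    real(sp), intent(in) :: q(:,:), k(:,:), v(:,:)
    real(sp) :: y(size(v,1), size(v,2))
    real(sp) :: a(size(k,2), size(q,2))
    integer :: i
    ! a(j,i) = dot(k(:,j), q(:,i)) / sqrt(d_head)
    a = matmul(transpose(k), q) / sqrt(real(size(q,1), sp))
    ! Softmax over keys j, separately for each query i
    do i = 1, size(a,2)
        a(:,i) = exp(a(:,i) - maxval(a(:,i)))
        a(:,i) = a(:,i) / sum(a(:,i))
    end do
    ! y(:,i) = sum over j of v(:,j) * a(j,i)
    y = matmul(v, a)
    end function

end program
```

With gfortran this would be built with `-fopenmp`; without that flag the `!$omp` lines are treated as ordinary comments and both loops run serially. Whether this helps in practice likely depends on the number of heads, the sequence length, and whether `matmul` is already backed by a threaded BLAS, so it needs benchmarking.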