Implement parallelization over heads

certik opened this issue 2 years ago · 0 comments

certik commented 2 years ago

Currently the attention over heads runs in serial:

fastGPT/gpt2.f90, line 101 in 01eb84b:

! Perform attention over each head

We should try to parallelize it and see if we get any speedups.
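
As a starting point for the experiment, here is a minimal, self-contained sketch of one way to do it: an OpenMP `parallel do` over the head index. It is not the code from gpt2.f90. The program, the array layout `(d_head, n_seq, n_head)`, and the helper `head_attention` are all hypothetical and only illustrate the pattern. It assumes each head reads its own slices of q, k, v and writes its own slice of the output, so the loop iterations are independent.

```fortran
program heads_parallel
! Toy sketch: causal attention over independent heads, serial vs. OpenMP.
! All names and sizes are hypothetical, not taken from fastGPT.
implicit none
integer, parameter :: sp = kind(1.0)
integer, parameter :: d_head = 2, n_seq = 4, n_head = 3
real(sp) :: q(d_head,n_seq,n_head), k(d_head,n_seq,n_head), v(d_head,n_seq,n_head)
real(sp) :: y_serial(d_head,n_seq,n_head), y_par(d_head,n_seq,n_head)
integer :: l

call random_number(q)
call random_number(k)
call random_number(v)

! Serial reference: one head after another
do l = 1, n_head
    y_serial(:,:,l) = head_attention(q(:,:,l), k(:,:,l), v(:,:,l))
end do

! Same loop with the heads distributed over threads. Each iteration
! writes only y_par(:,:,l), so no synchronization is needed.
! Without OpenMP enabled, the directives are plain comments.
!$omp parallel do
do l = 1, n_head
    y_par(:,:,l) = head_attention(q(:,:,l), k(:,:,l), v(:,:,l))
end do
!$omp end parallel do

print *, "max abs difference:", maxval(abs(y_par - y_serial))
if (maxval(abs(y_par - y_serial)) > 1e-6_sp) error stop "results differ"

contains

    pure function head_attention(q, k, v) result(y)
    ! Causal softmax attention for one head; inputs are (d_head, n_seq).
    real(sp), intent(in) :: q(:,:), k(:,:), v(:,:)
    real(sp) :: y(size(v,1), size(q,2))
    real(sp) :: s(size(k,2), size(q,2))
    integer :: j
    ! s(i,j) = scaled dot product of key i with query j
    s = matmul(transpose(k), q) / sqrt(real(size(q,1), sp))
    do j = 1, size(s,2)
        ! Query j attends only to keys 1..j; softmax over that range
        s(1:j,j) = exp(s(1:j,j) - maxval(s(1:j,j)))
        s(1:j,j) = s(1:j,j) / sum(s(1:j,j))
        s(j+1:,j) = 0
    end do
    y = matmul(v, s)
    end function

end program
```

With gfortran this can be built with `gfortran -fopenmp heads_parallel.f90`, and the thread count set through the `OMP_NUM_THREADS` environment variable. Whether this gives a speedup in fastGPT is exactly what would need to be measured: if the matrix products inside each head are already multithreaded, or the per-head work is small, the outer parallel loop may not help.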