Paper and code
Closed this issue · 2 comments
king-menin commented
Sparse Multi-Head Attention (https://arxiv.org/abs/1904.10509) — is it implemented in deepspeedsparseselfattention.py or in attention.py?
ptillet commented
I think both work; they just have different APIs. The attention.py file has a torch-like interface, while deepspeedsparseselfattention.py was contributed by Microsoft for compatibility with the DeepSpeed library. I suspect the latter will be deprecated once it gets merged into the DeepSpeed repo. Hope this answers your question!
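For readers unfamiliar with the idea from the linked paper, block-sparse attention computes the usual scaled dot-product attention but only over a fixed set of blocks of the attention matrix. The sketch below is a dense PyTorch reference of that concept, not the actual API of either file discussed above; the function name, the `layout` argument, and the block size are assumptions chosen for illustration.

```python
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, layout, block=16):
    """Dense reference implementation of block-sparse attention.

    q, k, v: tensors of shape (batch, heads, seq, dim)
    layout:  boolean tensor of shape (heads, seq//block, seq//block);
             True marks an attention block that is kept.
    """
    b, h, s, d = q.shape
    # Expand the per-block layout into a full (heads, seq, seq) mask.
    mask = layout.repeat_interleave(block, dim=1).repeat_interleave(block, dim=2)
    # Standard scaled dot-product scores.
    scores = (q @ k.transpose(-2, -1)) / d ** 0.5
    # Masked-out blocks get -inf so softmax assigns them zero weight.
    scores = scores.masked_fill(~mask.unsqueeze(0), float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Example: a block-diagonal layout (every query attends only to its own block).
b, h, s, d, blk = 2, 4, 64, 32, 16
q, k, v = (torch.randn(b, h, s, d) for _ in range(3))
nb = s // blk
layout = torch.eye(nb, dtype=torch.bool).expand(h, nb, nb)
out = block_sparse_attention(q, k, v, layout, block=blk)
```

A real kernel (like the ones in this repo) never materializes the masked blocks at all, which is where the memory and speed savings come from; this dense version only mirrors the math.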
king-menin commented
ty!