ptillet/torch-blocksparse

Paper and code

Closed this issue · 2 comments

Regarding Sparse MultiHead Attention (https://arxiv.org/abs/1904.10509): is it implemented in deepspeedsparseselfattention.py or in attention.py?

I think both work; they just have different APIs. attention.py exposes a torch-like interface, while deepspeedsparseselfattention.py was contributed by Microsoft for compatibility with the DeepSpeed library. I suspect the latter will be deprecated once it gets merged into the DeepSpeed repo. Hope this answers your question!
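For readers unsure what "torch-like interface" means here, below is a minimal sketch using the standard `torch.nn.MultiheadAttention` as the reference point. The assumption (not verified against this repo) is that the module in attention.py follows the same call pattern, with extra arguments for configuring the block-sparse layout; the dimension values are arbitrary.

```python
# Sketch of the torch-like multi-head attention call pattern that attention.py
# is said to follow. Uses the dense nn.MultiheadAttention purely as a reference;
# the sparse module is assumed (not verified here) to mirror this usage, plus
# additional sparsity-configuration arguments.
import torch
import torch.nn as nn

embed_dim, num_heads, seq_len, batch = 256, 8, 128, 2

# Dense reference module with the classic (seq_len, batch, embed_dim) input layout.
attn = nn.MultiheadAttention(embed_dim, num_heads)

x = torch.randn(seq_len, batch, embed_dim)
out, weights = attn(x, x, x)   # self-attention: query, key, value are the same tensor
print(out.shape)               # torch.Size([128, 2, 256])
```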

ty!