Raincleared-Song/sparse_gpu_operator

operator_23 slower than torch counterpart at low sparsity

Closed this issue · 2 comments

@Raincleared-Song
Hi, I ran run_test_23.py on an A100 GPU and found that torch_launch_ffn_fuse_23 is slower than the vanilla torch counterpart at low sparsity (lower than 30%).

Is that reasonable? What are the possible causes?
Thanks!

This is expected. Compared with the PyTorch APIs, our operators are tailored for sparse vector-matrix multiplication and are not optimized for dense operations. Concretely, we do not use some of the cuBLAS acceleration techniques that PyTorch adopts for dense tensor operations, so we may fall behind PyTorch when the tensors are relatively dense.
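A minimal timing sketch of the dense baseline, in case it helps to see where the crossover comes from: cuBLAS-backed `torch.matmul` does the same amount of work no matter how many zeros the activation contains, so its time is flat across sparsity levels, while a sparse kernel's time shrinks as sparsity grows. The shapes, dtype, and the `bench` helper below are illustrative assumptions, and the sparse-kernel call is left as a placeholder since its actual signature is defined in run_test_23.py.

```python
import torch

def bench(fn, *args, warmup=10, iters=100):
    """Time a CUDA call with events, after a warmup phase. Returns ms per call."""
    for _ in range(warmup):
        fn(*args)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# Illustrative FFN-like shapes; adjust to match run_test_23.py.
hidden, inter = 4096, 11008
w = torch.randn(inter, hidden, device="cuda", dtype=torch.float16)
x = torch.randn(hidden, device="cuda", dtype=torch.float16)

for sparsity in (0.1, 0.3, 0.5, 0.7, 0.9):
    # Zero out a fraction of the activation. The dense matmul below still
    # multiplies every element, so its latency should barely change.
    mask = (torch.rand(hidden, device="cuda") > sparsity).half()
    xs = x * mask
    dense_ms = bench(torch.matmul, w, xs)
    print(f"sparsity={sparsity:.1f}  dense matmul: {dense_ms:.3f} ms")
    # Compare against the sparse kernel here, e.g. bench(torch_launch_ffn_fuse_23, ...),
    # using the exact call pattern from run_test_23.py.
```

Plotting both curves over sparsity should show the sparse operator overtaking the dense baseline only above some threshold (around 30% in your measurement).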

@Raincleared-Song Got it, thank you~