operator_23 slower than torch counterpart at low sparsity
Closed this issue · 2 comments
Jimskns commented
@Raincleared-Song
Hi, I ran run_test_23.py on an A100 GPU and found that torch_launch_ffn_fuse_23 is slower than the vanilla torch version at low sparsity (below 30%).
Is that reasonable? What are the possible causes?
Thanks!
Raincleared-Song commented
This is a normal situation. Compared with PyTorch APIs, our operators are tailored for sparse vector-matrix multiplication and are not good at dense operations. Concretely, we do not use some of the acceleration techniques from cuBLAS that PyTorch adopts for dense tensor operations, so we may fall behind PyTorch on relatively dense tensors.
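The trade-off described above can be illustrated with a minimal NumPy sketch (purely hypothetical, not the repository's CUDA kernels): a "sparse" matvec that gathers only the non-zero rows of the weight matrix does less arithmetic when the activation is mostly zero, but pays an indexing overhead that a dense, BLAS-backed product (the analogue of cuBLAS in PyTorch) does not, so the dense path wins once the input becomes dense enough.

```python
import numpy as np

def dense_matvec(x, W):
    # Full dense product; NumPy delegates this to optimized BLAS,
    # analogous to PyTorch calling cuBLAS on GPU.
    return x @ W

def sparse_matvec(x, W):
    # Multiply using only the rows of W where x is non-zero.
    # Cheap at high sparsity; the nonzero/gather overhead makes it
    # lose to BLAS when most entries of x are non-zero.
    idx = np.nonzero(x)[0]
    return x[idx] @ W[idx]

rng = np.random.default_rng(0)
d_in, d_out = 1024, 1024
W = rng.standard_normal((d_in, d_out)).astype(np.float32)

for sparsity in (0.9, 0.3):  # fraction of zeros in the activation
    x = rng.standard_normal(d_in).astype(np.float32)
    x[rng.random(d_in) < sparsity] = 0.0
    # Both paths must agree numerically regardless of sparsity.
    assert np.allclose(dense_matvec(x, W), sparse_matvec(x, W), atol=1e-4)
```

Timing the two functions at different sparsity levels shows the crossover: below some sparsity threshold, the dense route is simply faster.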
Jimskns commented
@Raincleared-Song Got it, thank you~