operator_23 slower than torch counterpart at low sparsity
Closed this issue · 2 comments
Jimskns commented
@Raincleared-Song
Hi, I ran run_test_23.py on an A100 GPU and found that torch_launch_ffn_fuse_23 is slower than the vanilla torch version at low sparsity (below 30%).
Is that reasonable? What are the possible causes?
Thanks!
Raincleared-Song commented
This is a normal situation. Compared with PyTorch APIs, our operators are tailored for sparse vector-matrix multiplication and are not good at dense operations. Concretely, we do not use some of the acceleration techniques from cuBLAS that PyTorch adopts for dense tensor operations, so we may fall behind PyTorch on relatively dense tensors.
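The trade-off described above can be illustrated with a minimal NumPy sketch (purely hypothetical, not the repository's CUDA kernels): a "sparse" matvec that gathers only the non-zero rows of the weight matrix does less arithmetic when the activation is mostly zero, but pays an indexing overhead that a dense, BLAS-backed product (the analogue of cuBLAS in PyTorch) does not, so the dense path wins once the input becomes dense enough.

```python
import numpy as np

def dense_matvec(x, W):
    # Full dense product; NumPy delegates this to optimized BLAS,
    # analogous to PyTorch calling cuBLAS on GPU.
    return x @ W

def sparse_matvec(x, W):
    # Multiply using only the rows of W where x is non-zero.
    # Cheap at high sparsity; the nonzero/gather overhead makes it
    # lose to BLAS when most entries of x are non-zero.
    idx = np.nonzero(x)[0]
    return x[idx] @ W[idx]

rng = np.random.default_rng(0)
d_in, d_out = 1024, 1024
W = rng.standard_normal((d_in, d_out)).astype(np.float32)

for sparsity in (0.9, 0.3):  # fraction of zeros in the activation
    x = rng.standard_normal(d_in).astype(np.float32)
    x[rng.random(d_in) < sparsity] = 0.0
    # Both paths must agree numerically regardless of sparsity.
    assert np.allclose(dense_matvec(x, W), sparse_matvec(x, W), atol=1e-4)
```

Timing the two functions at different sparsity levels shows the crossover: below some sparsity threshold, the dense route is simply faster.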
Jimskns commented
@Raincleared-Song Got it, thank you~