pku-liang/AMOS

Performance difference with simple_mode enabled?


Hi all, could you explain the difference between auto-tensorize and auto-tensorize-v4? In my amos-gemm benchmark runs, the performance of the two strategies is very close.

| M | K | N | amos-1000-step-fp16-simple (ms) | amos-1000-step-fp16 (ms) |
|---:|---:|---:|---:|---:|
| 2 | 2 | 2 | Failed to Run | Failed to Run |
| 4 | 4 | 4 | Failed to Run | Failed to Run |
| 8 | 8 | 8 | Failed to Run | Failed to Run |
| 16 | 16 | 16 | 0.004545906 | 0.003936828 |
| 32 | 32 | 32 | 0.004610093 | 0.004310548 |
| 64 | 64 | 64 | 0.004638971 | 0.004614832 |
| 128 | 128 | 128 | 0.005128772 | 0.005059945 |
| 256 | 256 | 256 | 0.006975747 | 0.007367229 |
| 512 | 512 | 512 | 0.018055338 | 0.016287096 |
| 1024 | 1024 | 1024 | 0.066839093 | 0.071785023 |
| 2048 | 2048 | 2048 | 0.382059749 | 0.336489417 |
| 4096 | 4096 | 4096 | 2.00519422 | 2.252330443 |
| 8192 | 8192 | 8192 | 21.62599663 | 18.10944683 |
| 16384 | 16384 | 16384 | 111.4660256 | 132.6751751 |
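To put a number on "very close", a small script along these lines could compute the per-size ratio between the two columns and the achieved throughput. This is only a sketch and is not part of AMOS: the names `RESULTS`, `gemm_tflops`, `speedup`, and `summarize` are hypothetical, and the throughput figure assumes the standard 2·M·N·K floating-point operation count for a GEMM.

```python
import math

# Latencies (ms) copied from the benchmark table; None marks "Failed to Run".
# Tuple layout (hypothetical): (M, K, N, simple_ms, default_ms)
RESULTS = [
    (2, 2, 2, None, None),
    (4, 4, 4, None, None),
    (8, 8, 8, None, None),
    (16, 16, 16, 0.004545906, 0.003936828),
    (32, 32, 32, 0.004610093, 0.004310548),
    (64, 64, 64, 0.004638971, 0.004614832),
    (128, 128, 128, 0.005128772, 0.005059945),
    (256, 256, 256, 0.006975747, 0.007367229),
    (512, 512, 512, 0.018055338, 0.016287096),
    (1024, 1024, 1024, 0.066839093, 0.071785023),
    (2048, 2048, 2048, 0.382059749, 0.336489417),
    (4096, 4096, 4096, 2.00519422, 2.252330443),
    (8192, 8192, 8192, 21.62599663, 18.10944683),
    (16384, 16384, 16384, 111.4660256, 132.6751751),
]


def gemm_tflops(m, k, n, ms):
    """Achieved TFLOPS, assuming 2*M*N*K flops for an MxK by KxN GEMM."""
    return 2 * m * n * k / (ms * 1e-3) / 1e12


def speedup(simple_ms, default_ms):
    """Speedup of the simple-mode run; > 1 means simple mode was faster."""
    return default_ms / simple_ms


def summarize(results):
    """Return (M, K, N, speedup, simple TFLOPS, default TFLOPS), skipping failed runs."""
    rows = []
    for m, k, n, s, d in results:
        if s is None or d is None:
            continue
        rows.append((m, k, n, speedup(s, d),
                     gemm_tflops(m, k, n, s), gemm_tflops(m, k, n, d)))
    return rows


if __name__ == "__main__":
    ratios = []
    for m, k, n, sp, t_s, t_d in summarize(RESULTS):
        ratios.append(sp)
        print(f"{m:>5} {k:>5} {n:>5}  speedup={sp:.3f}  "
              f"simple={t_s:.2f} TFLOPS  default={t_d:.2f} TFLOPS")
    # Geometric mean is the usual way to average ratios across problem sizes.
    geomean = math.prod(ratios) ** (1 / len(ratios))
    print(f"geometric-mean speedup of simple mode: {geomean:.3f}")
```

Reading the ratios column by column shows that neither mode wins consistently: simple mode is faster at some sizes (e.g. 1024, 4096, 16384) and slower at others (e.g. 16, 2048, 8192), which is why a single averaged figure is more informative than any one row.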