Performance difference with simple_mode enabled?
LeiWang1999 commented
Hi all, could you kindly explain the difference between auto-tensorize and auto-tensorize-v4? From what I observed when benchmarking amos-gemm, the performance of these two strategies is quite similar:
M | K | N | amos-1000-step-fp16-simple(ms) | amos-1000-step-fp16(ms) |
---|---|---|---|---|
2 | 2 | 2 | Failed to Run | Failed to Run |
4 | 4 | 4 | Failed to Run | Failed to Run |
8 | 8 | 8 | Failed to Run | Failed to Run |
16 | 16 | 16 | 0.004545906 | 0.003936828 |
32 | 32 | 32 | 0.004610093 | 0.004310548 |
64 | 64 | 64 | 0.004638971 | 0.004614832 |
128 | 128 | 128 | 0.005128772 | 0.005059945 |
256 | 256 | 256 | 0.006975747 | 0.007367229 |
512 | 512 | 512 | 0.018055338 | 0.016287096 |
1024 | 1024 | 1024 | 0.066839093 | 0.071785023 |
2048 | 2048 | 2048 | 0.382059749 | 0.336489417 |
4096 | 4096 | 4096 | 2.00519422 | 2.252330443 |
8192 | 8192 | 8192 | 21.62599663 | 18.10944683 |
16384 | 16384 | 16384 | 111.4660256 | 132.6751751 |
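
To put a number on "quite similar", here is a minimal sketch that computes the relative gap between the two columns for each size that ran. The latencies are copied from the table above; the names `latencies_ms` and `relative_gap` are just illustrative and are not part of AMOS.

```python
# Latencies in ms copied from the table above:
# size (M = K = N) -> (fp16-simple, fp16 without simple)
latencies_ms = {
    16: (0.004545906, 0.003936828),
    32: (0.004610093, 0.004310548),
    64: (0.004638971, 0.004614832),
    128: (0.005128772, 0.005059945),
    256: (0.006975747, 0.007367229),
    512: (0.018055338, 0.016287096),
    1024: (0.066839093, 0.071785023),
    2048: (0.382059749, 0.336489417),
    4096: (2.00519422, 2.252330443),
    8192: (21.62599663, 18.10944683),
    16384: (111.4660256, 132.6751751),
}


def relative_gap(simple_ms, non_simple_ms):
    """Signed gap of the simple run relative to the non-simple run, in percent.

    Positive means the simple run was slower, negative means it was faster.
    """
    return (simple_ms - non_simple_ms) / non_simple_ms * 100.0


gaps = {n: relative_gap(s, d) for n, (s, d) in latencies_ms.items()}

for n, gap in gaps.items():
    print(f"{n:>6}: {gap:+.1f}%")
```

On these numbers the gap stays below 20% in either direction and its sign flips from one size to the next, so neither column is consistently faster.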