Performance difference with simple_mode enabled?

Question

Opened this issue 2 years ago · 0 comments

LeiWang1999 commented 2 years ago

Hi all, could you kindly explain the difference between auto-tensorize and auto-tensorize-v4? In my amos-gemm benchmarks, the performance of these two strategies is quite similar:

| M | K | N | amos-1000-step-fp16-simple (ms) | amos-1000-step-fp16 (ms) |
|---|---|---|---|---|
| 2 | 2 | 2 | Failed to Run | Failed to Run |
| 4 | 4 | 4 | Failed to Run | Failed to Run |
| 8 | 8 | 8 | Failed to Run | Failed to Run |
| 16 | 16 | 16 | 0.004545906 | 0.003936828 |
| 32 | 32 | 32 | 0.004610093 | 0.004310548 |
| 64 | 64 | 64 | 0.004638971 | 0.004614832 |
| 128 | 128 | 128 | 0.005128772 | 0.005059945 |
| 256 | 256 | 256 | 0.006975747 | 0.007367229 |
| 512 | 512 | 512 | 0.018055338 | 0.016287096 |
| 1024 | 1024 | 1024 | 0.066839093 | 0.071785023 |
| 2048 | 2048 | 2048 | 0.382059749 | 0.336489417 |
| 4096 | 4096 | 4096 | 2.00519422 | 2.252330443 |
| 8192 | 8192 | 8192 | 21.62599663 | 18.10944683 |
| 16384 | 16384 | 16384 | 111.4660256 | 132.6751751 |
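
To put a number on "quite similar", here is a minimal sketch that computes the per-size ratio of the two columns and their geometric mean. The timings are copied from the table above (the sizes that failed to run are left out); the helper name `ratio_summary` is made up for illustration and is not part of AMOS.

```python
import math

# (size, simple_ms, non_simple_ms) copied from the table above; M = K = N = size.
# Sizes 2, 4 and 8 are omitted because both configurations failed to run.
rows = [
    (16, 0.004545906, 0.003936828),
    (32, 0.004610093, 0.004310548),
    (64, 0.004638971, 0.004614832),
    (128, 0.005128772, 0.005059945),
    (256, 0.006975747, 0.007367229),
    (512, 0.018055338, 0.016287096),
    (1024, 0.066839093, 0.071785023),
    (2048, 0.382059749, 0.336489417),
    (4096, 2.00519422, 2.252330443),
    (8192, 21.62599663, 18.10944683),
    (16384, 111.4660256, 132.6751751),
]


def ratio_summary(rows):
    """Return per-size simple/non-simple time ratios and their geometric mean."""
    ratios = [(size, simple / full) for size, simple, full in rows]
    geomean = math.exp(sum(math.log(r) for _, r in ratios) / len(ratios))
    return ratios, geomean


ratios, geomean = ratio_summary(rows)
for size, r in ratios:
    # A ratio below 1 means the simple run took less time at this size.
    faster = "simple" if r < 1 else "non-simple"
    print(f"{size:>6}: simple / non-simple = {r:.3f} ({faster} faster)")
print(f"geometric mean of ratios: {geomean:.3f}")
```

Neither column is consistently faster across sizes, so a ratio view like this may make it easier to judge whether the differences are more than run-to-run noise.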