Number of threads | Time |
---|---|
0 | 3773.15 |
2 | 1912.93 |
4 | 1050.26 |
8 | 758.06 |

Number of threads | Time |
---|---|
0 | 2649.02 |
0+AVX | 213.42 |
2 | 1395.93 |
4 | 770.47 |
8 | 695.93 |
8+AVX | 48.42 |
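
To make the scaling easier to read, the times in the table above can be converted into speedups over the `0` row. This is only a sketch: it assumes the `0` row is the unthreaded baseline and that lower times are better, neither of which the table states explicitly, and `times` and `speedups` are names made up for this illustration. With these numbers, 8 threads comes out at roughly 3.8x and 8+AVX at roughly 54.7x over the baseline.

```python
# Times copied from the table above; units are whatever the table reports.
times = {
    "0": 2649.02,
    "0+AVX": 213.42,
    "2": 1395.93,
    "4": 770.47,
    "8": 695.93,
    "8+AVX": 48.42,
}


def speedups(times, baseline_key="0"):
    """Return baseline_time / time for every configuration.

    Assumption: the row named by baseline_key is the reference run.
    """
    base = times[baseline_key]
    return {config: base / t for config, t in times.items()}


scaling = speedups(times)
for config, factor in scaling.items():
    print(f"{config:>6}: {factor:6.2f}x")
```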

All experiments were run under the same conditions (target = 'llvm', bn = 32, etc.).

Method | NumPy Time | TVM Time |
---|---|---|
baseline | 0.008193 | 1.987586 |
blocking | 0.008379 | 0.218961 |
vectorize | 0.008022 | 0.237825 |
loop permute | 0.008263 | 0.103705 |
packing | 0.008152 | 0.104551 |
write cache | 0.008472 | 0.099767 |
parallel | 0.008281 | 0.032557 |
auto-tvm | 0.007933 | 0.007097 |
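
The comparison in the table above can be summarized with two ratios per method: TVM time relative to NumPy time, and speedup relative to the TVM baseline. The following is a minimal sketch that only redoes the arithmetic on the reported numbers; `rows` and `summarize` are hypothetical names introduced for this illustration, and it assumes both columns use the same time unit.

```python
# (method, NumPy time, TVM time), copied from the table above.
rows = [
    ("baseline", 0.008193, 1.987586),
    ("blocking", 0.008379, 0.218961),
    ("vectorize", 0.008022, 0.237825),
    ("loop permute", 0.008263, 0.103705),
    ("packing", 0.008152, 0.104551),
    ("write cache", 0.008472, 0.099767),
    ("parallel", 0.008281, 0.032557),
    ("auto-tvm", 0.007933, 0.007097),
]


def summarize(rows):
    """Map each method to (TVM / NumPy ratio, speedup over the first row's TVM time)."""
    baseline_tvm = rows[0][2]
    return {
        name: (tvm_time / numpy_time, baseline_tvm / tvm_time)
        for name, numpy_time, tvm_time in rows
    }


summary = summarize(rows)
for name, (ratio, speedup) in summary.items():
    print(f"{name:<12} TVM/NumPy = {ratio:8.2f}   speedup vs. baseline = {speedup:7.2f}x")
```

With these numbers, auto-tvm is the only row where the TVM/NumPy ratio drops below 1 (about 0.89), and its TVM time is about 280x faster than the TVM baseline.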