Question: Interpretation of cuda results
Closed this issue · 6 comments
Tristan, I'm going to officially do a PR, but wanted to run this preview by you and get your take on how to interpret the results. This is from a machine with a Core i9 and an RTX 4090. Does it look right? What do these results mean (specifically in the cuda column)? Thanks!
Average benchmark:
Operation | cpu | cuda |
---|---|---|
Argmax | 9.36 | 0.06 |
BCE | 26.26 | 25.90 |
Concat | 30.27 | 0.06 |
Conv1d | 61.74 | 0.17 |
Conv2d | 26.07 | 0.09 |
LeakyReLU | 6.11 | 0.04 |
Linear | 103.52 | 0.07 |
MatMul | 81.08 | 0.05 |
PReLU | 4.86 | 0.05 |
ReLU | 10.29 | 0.06 |
SeLU | 7.34 | 0.06 |
Sigmoid | 7.88 | 0.04 |
Softmax | 21.22 | 0.04 |
Softplus | 7.25 | 0.04 |
Sort | 61.68 | 0.11 |
Sum | 13.22 | 0.06 |
SumAll | 8.50 | 0.06 |
Hi @alexziskind1, thanks for providing your results!
The results look OK to me. The GPU with the cuda device is extremely fast compared to the CPU, which explains the large difference between the two columns. I will update the benchmark tomorrow to include the cuda/cpu speedup, so that we can better understand the difference between CPU and GPU performance.
Your results are very interesting, thanks! I'll also provide Tesla V100 benchmarks very soon.
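For reference, a cuda/cpu speedup of the kind mentioned above can be estimated directly from the averages posted in this thread. The snippet below is only a sketch: the `results` dictionary and the `speedup` helper are illustrative names, not part of the benchmark's code, and the numbers are copied from the table above.

```python
# Average runtimes (cpu, cuda) copied from the table posted above.
results = {
    "Argmax": (9.36, 0.06),
    "BCE": (26.26, 25.90),
    "Concat": (30.27, 0.06),
    "Conv1d": (61.74, 0.17),
    "Conv2d": (26.07, 0.09),
    "LeakyReLU": (6.11, 0.04),
    "Linear": (103.52, 0.07),
    "MatMul": (81.08, 0.05),
    "PReLU": (4.86, 0.05),
    "ReLU": (10.29, 0.06),
    "SeLU": (7.34, 0.06),
    "Sigmoid": (7.88, 0.04),
    "Softmax": (21.22, 0.04),
    "Softplus": (7.25, 0.04),
    "Sort": (61.68, 0.11),
    "Sum": (13.22, 0.06),
    "SumAll": (8.50, 0.06),
}


def speedup(cpu_time, cuda_time):
    """Hypothetical helper: how many times faster cuda is than cpu."""
    return cpu_time / cuda_time


speedups = {op: speedup(cpu, cuda) for op, (cpu, cuda) in results.items()}

# Print operations from largest to smallest speedup.
for op, ratio in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{op:10s} {ratio:10.1f}x")
```

In this table BCE is the one operation where the two columns are nearly equal (a ratio close to 1), so it is the row to look at first when sanity-checking the cuda numbers.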
I also have an M3, M3 Pro, M3 Max, M1 Pro, and M2 Max I can add :)
What I'd like to know, please, is what these numbers mean. I see that the cuda numbers are significantly different, but what are they? Thanks!
Each value is the average runtime, in milliseconds, over all the examples run for that operation.
You can check the detailed benchmark here with all the detailed examples.
So basically it takes 0.07 ms on average to run the Linear operations on your cuda GPU, compared to about 103.5 ms on CPU.
It would be amazing if you could add your results for these chips to the benchmarks!
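To make "average runtime in milliseconds over all examples" concrete, here is a minimal sketch of how such a number could be measured. It is an assumption about the general approach, not the benchmark's actual code: `average_runtime_ms` and its `sync` parameter are hypothetical names. On asynchronous devices such as CUDA, a synchronization call is typically needed before reading the clock, otherwise only the kernel launch is timed; the optional `sync` hook stands in for that.

```python
import time


def average_runtime_ms(fn, examples, repeats=10, sync=None):
    """Mean wall-clock time in ms of fn(*args) over all examples and repeats.

    `sync` is a hypothetical hook: an optional callable that blocks until
    asynchronous device work has finished (only needed for devices like CUDA).
    """
    total_seconds = 0.0
    runs = 0
    for args in examples:
        for _ in range(repeats):
            start = time.perf_counter()
            fn(*args)
            if sync is not None:
                sync()  # wait for pending work before stopping the clock
            total_seconds += time.perf_counter() - start
            runs += 1
    return 1000.0 * total_seconds / runs


# Stand-in "operation": Python's built-in sum on lists of three sizes.
examples = [(list(range(n)),) for n in (1_000, 10_000, 100_000)]
avg_ms = average_runtime_ms(sum, examples)
print(f"Sum: {avg_ms:.2f} ms")
```

A single number per operation therefore mixes small and large inputs, which is why the detailed per-example benchmark is the better place to look when one row seems off.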