TristanBilot/mlx-benchmark

Question: Interpretation of cuda results

Closed this issue · 6 comments

Tristan, I'm going to officially do a PR, but wanted to run this preview by you and get your take on how to interpret the results. This is from a machine with Core i9 and RTX4090. Does it look right? What do these results mean (specifically in the cuda column)? Thanks!

Average benchmark:

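| Operation | cpu    | cuda  |
|-----------|--------|-------|
| Argmax    | 9.36   | 0.06  |
| BCE       | 26.26  | 25.90 |
| Concat    | 30.27  | 0.06  |
| Conv1d    | 61.74  | 0.17  |
| Conv2d    | 26.07  | 0.09  |
| LeakyReLU | 6.11   | 0.04  |
| Linear    | 103.52 | 0.07  |
| MatMul    | 81.08  | 0.05  |
| PReLU     | 4.86   | 0.05  |
| ReLU      | 10.29  | 0.06  |
| SeLU      | 7.34   | 0.06  |
| Sigmoid   | 7.88   | 0.04  |
| Softmax   | 21.22  | 0.04  |
| Softplus  | 7.25   | 0.04  |
| Sort      | 61.68  | 0.11  |
| Sum       | 13.22  | 0.06  |
| SumAll    | 8.50   | 0.06  |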

Hi @alexziskind1, thanks for providing your results!
The results look OK to me. The GPU (cuda device) is extremely fast compared to the CPU, which explains the large difference between the two columns. Tomorrow I will update the benchmark to include the cuda/cpu speedup, so that we can better understand the difference between CPU and GPU performance.
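As a rough illustration of the speedup mentioned above, here is a minimal sketch that divides the cpu time by the cuda time for a few rows of the table. The `speedup` helper and the cpu/cuda ratio are assumptions made for illustration; the benchmark itself may define or report speedup differently.

```python
# Average runtimes in milliseconds, copied from the table in this thread.
results_ms = {
    "Linear": {"cpu": 103.52, "cuda": 0.07},
    "MatMul": {"cpu": 81.08, "cuda": 0.05},
    "BCE": {"cpu": 26.26, "cuda": 25.90},
}


def speedup(cpu_ms: float, cuda_ms: float) -> float:
    """Hypothetical helper: how many times faster the cuda run is than the cpu run."""
    return cpu_ms / cuda_ms


speedups = {op: speedup(t["cpu"], t["cuda"]) for op, t in results_ms.items()}

for op, s in speedups.items():
    print(f"{op:<8} {s:8.1f}x")
```

With this definition, a value near 1 (as for BCE) means the two devices took about the same time, while a large value means cuda was much faster.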

Your results are very interesting, thanks! I'll also provide Tesla V100 benchmarks very soon.

I also have an M3, M3 Pro, M3 Max, M1 Pro, and M2 Max I can add :)

What I'd like to know, please, is what these numbers mean. I see that the cuda numbers are significantly different, but what are they? Thanks!

So the idea is that each value is the average runtime, in milliseconds, over all the examples run for that operation.

You can check the detailed benchmark here, which lists all the individual examples.

So basically it takes 0.07 ms on average to run the Linear operations on your cuda GPU, compared to 103.52 ms on CPU.
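To make "average runtime in milliseconds over all examples" concrete, here is a minimal sketch using only the Python standard library. It is not the benchmark's actual code: `average_runtime_ms`, the `repeats` parameter, and the input sizes are all hypothetical, and the built-in `sum` stands in for an operation such as Sum.

```python
import time
from statistics import mean


def average_runtime_ms(fn, examples, repeats=10):
    """Average wall-clock time of fn(example), in ms, over all examples and repeats."""
    timings = []
    for example in examples:
        for _ in range(repeats):
            start = time.perf_counter()
            fn(example)
            # perf_counter returns seconds, so convert to milliseconds.
            timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)


# Hypothetical inputs of different sizes, playing the role of the "examples".
examples = [list(range(n)) for n in (1_000, 10_000, 100_000)]
avg_ms = average_runtime_ms(sum, examples)
print(f"Sum: {avg_ms:.2f} ms")
```

Each number in the table would then correspond to one such average, computed once per operation and per device.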

It would be amazing if you could add your results for these chips to the benchmarks!