Question: Interpretation of cuda results

Closed this issue 9 months ago · 6 comments

Tristan, I'm going to officially do a PR, but wanted to run this preview by you and get your take on how to interpret the results. This is from a machine with Core i9 and RTX4090. Does it look right? What do these results mean (specifically in the cuda column)? Thanks!

Average benchmark:

Operation	cpu	cuda
Argmax	9.36	0.06
BCE	26.26	25.90
Concat	30.27	0.06
Conv1d	61.74	0.17
Conv2d	26.07	0.09
LeakyReLU	6.11	0.04
Linear	103.52	0.07
MatMul	81.08	0.05
PReLU	4.86	0.05
ReLU	10.29	0.06
SeLU	7.34	0.06
Sigmoid	7.88	0.04
Softmax	21.22	0.04
Softplus	7.25	0.04
Sort	61.68	0.11
Sum	13.22	0.06
SumAll	8.50	0.06

Answer 1 · 2024-01-06T01:54:05.000Z

hi @alexziskind1, thanks for providing your results!
The results look OK to me. The cuda device is extremely fast compared to the CPU, which explains the large difference between the two columns. Tomorrow I will update the benchmark to include the cuda/cpu speedup, so that we can better understand the difference between CPU and GPU performance.

Your results are very interesting, thanks! I'll also provide Tesla V100 benchmarks very soon

Answer 2 · 2024-01-06T02:37:25.000Z

I also have an M3, M3 Pro, M3 Max, M1 Pro, and M2 Max that I can add :)

Answer 3 · 2024-01-06T02:38:02.000Z

What I'd like to know, please, is what these numbers mean. I see that the cuda numbers are significantly different, but what are they? Thanks!

Answer 4 · 2024-01-06T09:25:09.000Z

So the idea is that each value is the average runtime, in milliseconds, over all the examples run for that operation.

You can check the detailed benchmark here with all the detailed examples.
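To make the "average runtime in milliseconds over all examples" idea concrete, here is a minimal sketch. It is not the benchmark's actual code: `average_runtime_ms`, the `repeats` parameter, and the toy `examples` list are hypothetical names chosen for illustration, and it only times plain Python on the CPU.

```python
import time
from statistics import mean


def average_runtime_ms(fn, examples, repeats=10):
    """Average wall-clock time of fn, in milliseconds, over all examples.

    Hypothetical helper for illustration only. On a GPU backend the device
    would also have to be synchronized before reading the clock; this
    CPU-only sketch does not deal with that.
    """
    timings = []
    for args in examples:
        for _ in range(repeats):
            start = time.perf_counter()
            fn(*args)
            # perf_counter returns seconds, so convert to milliseconds.
            timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)


# Toy stand-in for one "operation" (sum) with two "examples" (input sizes).
examples = [(range(1_000),), (range(10_000),)]
print(f"Sum: {average_runtime_ms(sum, examples):.2f} ms")
```

One number per operation and device is then what ends up in a table like the one in the question.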

Answer 5 · 2024-01-06T09:26:51.000Z

So basically it takes 0.07 ms on average to run the Linear operations on your cuda GPU, compared to about 103.5 ms on CPU.
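The same comparison can be written as a speedup ratio (cpu time divided by cuda time). The sketch below uses three rows copied from the table in the question; the `results_ms` dictionary and the `speedup` helper are illustrative names, not part of the benchmark.

```python
# Average runtimes in milliseconds, copied from the table in the question.
results_ms = {
    "Linear": {"cpu": 103.52, "cuda": 0.07},
    "MatMul": {"cpu": 81.08, "cuda": 0.05},
    "BCE": {"cpu": 26.26, "cuda": 25.90},
}


def speedup(row):
    """How many times faster cuda is than cpu for one operation."""
    return row["cpu"] / row["cuda"]


speedups = {op: speedup(row) for op, row in results_ms.items()}

for op, ratio in speedups.items():
    print(f"{op:<8} cpu/cuda speedup: {ratio:8.1f}x")
```

With these rows, Linear and MatMul come out more than a thousand times faster on cuda, while BCE is essentially the same on both devices.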

Answer 6 · 2024-01-06T09:27:52.000Z

It would be amazing if you could add your results for these chips to the benchmarks!!