Performance optimization on RTX3060

Question

xiaopeige opened this issue 4 years ago · 5 comments

I saw that you tested this project on a GTX 1070 with CUDA 10.1. My current environment is an RTX 3060 with CUDA 11.0. Is there room for optimization in the registration module, for example by modifying parameters or making fuller use of the GPU? Thank you!

Answer 1 · 2021-03-24T05:59:26.000Z

Thanks!
There are no architecture-specific optimizations in this project.
If you have any suggestions, please let me know.

Answer 2 · 2021-03-26T01:56:44.000Z

Thanks!
So does the performance not depend on the graphics card? For example, would the GTX 1070 and the RTX 3060 take the same amount of time?

Answer 3 · 2021-03-26T02:08:00.000Z

The performance is different.
My point was that the code is not optimized for any particular architecture.
For example, I don't use the Tensor Cores that have been available since the Volta generation.
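
To make the architecture point concrete: the WMMA API requires compute capability 7.0 (Volta) or higher. The GTX 1070 reports 6.1 and the RTX 3060 reports 8.6. Below is a minimal sketch of a runtime check. The helper names `supports_wmma` and `report_devices` are hypothetical and not part of this project, and the check only says whether the WMMA API is usable, not how much Tensor Core hardware a given card has.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: the WMMA API needs compute capability 7.0 or higher.
bool supports_wmma(int major) { return major >= 7; }

// Hypothetical helper: prints each visible device and returns the device count
// (0 if the CUDA runtime reports an error).
int report_devices() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess) continue;
        std::printf("%d: %s, compute capability %d.%d, WMMA %s\n",
                    i, prop.name, prop.major, prop.minor,
                    supports_wmma(prop.major) ? "available" : "not available");
    }
    return count;
}
```

Calling `report_devices()` from `main` lists the installed GPUs. A code path that uses Tensor Cores could be guarded by `supports_wmma(prop.major)`, with the existing implementation kept as the fallback.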

Answer 4 · 2021-03-26T02:12:49.000Z

Thanks!

I see that you are using the Thrust library. So if Tensor Cores were used, the performance of newer graphics cards could be fully exploited, right?

Answer 5 · 2021-03-26T02:19:14.000Z

That's correct.
WMMA can be used to improve the performance of matrix multiplication.
https://developer.nvidia.com/blog/programming-tensor-cores-cuda-9/
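
For reference, here is a minimal sketch of what a WMMA matrix multiplication looks like, in the style of the blog post above. It is not code from this project: the kernel name `wmma_tile_16x16`, the host helper `wmma_matches_cpu`, and the test data are all assumptions made for illustration. One warp multiplies a single 16x16 half-precision tile and accumulates in float, and the host compares the result with a plain CPU loop.

```cuda
#include <cmath>
#include <vector>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <mma.h>

using namespace nvcuda;

// One warp computes C = A * B for a single 16x16 tile.
// A is row-major half, B is column-major half, C is row-major float.
__global__ void wmma_tile_16x16(const half* a, const half* b, float* c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);               // start from C = 0
    wmma::load_matrix_sync(a_frag, a, 16);           // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // C += A * B
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}

// Hypothetical host helper: runs the kernel and checks it against a CPU loop.
// Returns false on any CUDA error (for example, on a GPU without WMMA support).
bool wmma_matches_cpu() {
    const int N = 16;
    std::vector<float> fa(N * N), fb(N * N), ref(N * N, 0.0f), out(N * N, 0.0f);
    std::vector<half> ha(N * N), hb(N * N);

    // Small integer inputs are exactly representable in half precision.
    for (int i = 0; i < N; ++i)
        for (int k = 0; k < N; ++k) {
            fa[i * N + k] = static_cast<float>((i + k) % 3);  // A[i][k], row-major
            fb[i * N + k] = static_cast<float>((i + k) % 5);  // B[k][i] at index i*N+k, column-major
            ha[i * N + k] = __float2half(fa[i * N + k]);
            hb[i * N + k] = __float2half(fb[i * N + k]);
        }

    // CPU reference: ref[i][j] = sum_k A[i][k] * B[k][j], with B[k][j] at fb[j*N + k].
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k)
                ref[i * N + j] += fa[i * N + k] * fb[j * N + k];

    half *da = nullptr, *db = nullptr;
    float* dc = nullptr;
    bool ok = cudaMalloc(&da, N * N * sizeof(half)) == cudaSuccess &&
              cudaMalloc(&db, N * N * sizeof(half)) == cudaSuccess &&
              cudaMalloc(&dc, N * N * sizeof(float)) == cudaSuccess;
    if (ok) {
        cudaMemcpy(da, ha.data(), N * N * sizeof(half), cudaMemcpyHostToDevice);
        cudaMemcpy(db, hb.data(), N * N * sizeof(half), cudaMemcpyHostToDevice);
        wmma_tile_16x16<<<1, 32>>>(da, db, dc);  // exactly one warp
        ok = cudaGetLastError() == cudaSuccess &&
             cudaDeviceSynchronize() == cudaSuccess &&
             cudaMemcpy(out.data(), dc, N * N * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    if (!ok) return false;

    for (int i = 0; i < N * N; ++i)
        if (std::fabs(out[i] - ref[i]) > 1e-3f) return false;
    return true;
}
```

This needs to be compiled for compute capability 7.0 or higher, for example with `nvcc -arch=sm_70` or a newer target that matches the GPU. It will not run on a GTX 1070. A real kernel would tile a larger matrix across many warps. Whether that speeds up the registration module depends on how much of its run time is spent in dense matrix products that tolerate half-precision inputs.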