A CUDA operator implementation of conv2d using direct computation.
For a detailed walkthrough, see: https://zhuanlan.zhihu.com/p/613538649
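
For orientation, the computation that a direct conv2d performs can be sketched as a plain CPU reference in C++. This is not the kernel from main.cu; it is a minimal sketch that could serve as a correctness check for a GPU result. The function names `conv_out_dim` and `conv2d_direct_ref`, the NCHW layout, and the stride and padding defaults are all assumptions made for illustration, since this README does not state them.

```cpp
#include <cstddef>
#include <vector>

// Output extent along one axis. stride = 1 and pad = 0 are assumed defaults;
// the README does not say which values the benchmark uses.
inline int conv_out_dim(int in, int kernel, int stride = 1, int pad = 0) {
    return (in + 2 * pad - kernel) / stride + 1;
}

// Direct conv2d (cross-correlation, no kernel flip) on NCHW data.
// input:  [N, inC, inH, inW]
// weight: [outC, inC, kH, kW]
// return: [N, outC, outH, outW]
std::vector<float> conv2d_direct_ref(const std::vector<float>& input,
                                     const std::vector<float>& weight,
                                     int N, int inC, int inH, int inW,
                                     int outC, int kH, int kW,
                                     int stride = 1, int pad = 0) {
    const int outH = conv_out_dim(inH, kH, stride, pad);
    const int outW = conv_out_dim(inW, kW, stride, pad);
    std::vector<float> output(
        static_cast<std::size_t>(N) * outC * outH * outW, 0.0f);

    for (int n = 0; n < N; ++n)
    for (int oc = 0; oc < outC; ++oc)
    for (int oh = 0; oh < outH; ++oh)
    for (int ow = 0; ow < outW; ++ow) {
        float acc = 0.0f;
        // Accumulate over every input channel and every kernel tap.
        for (int ic = 0; ic < inC; ++ic)
        for (int kh = 0; kh < kH; ++kh)
        for (int kw = 0; kw < kW; ++kw) {
            const int ih = oh * stride - pad + kh;
            const int iw = ow * stride - pad + kw;
            // Positions outside the input count as zero padding.
            if (ih < 0 || ih >= inH || iw < 0 || iw >= inW) continue;
            const std::size_t in_idx =
                ((static_cast<std::size_t>(n) * inC + ic) * inH + ih) * inW + iw;
            const std::size_t w_idx =
                ((static_cast<std::size_t>(oc) * inC + ic) * kH + kh) * kW + kw;
            acc += input[in_idx] * weight[w_idx];
        }
        const std::size_t out_idx =
            ((static_cast<std::size_t>(n) * outC + oc) * outH + oh) * outW + ow;
        output[out_idx] = acc;
    }
    return output;
}
```

A GPU version of the same idea would typically assign one output element (one `n, oc, oh, ow` tuple) to each thread and keep the three inner loops inside the thread.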
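Requires CUDA and cuDNN to be installed on the local machine.
Build: nvcc main.cu -o test -lcudnn
If cuDNN cannot be found, specify its directories manually with -I and -L (for example, -I/path/to/cudnn/include -L/path/to/cudnn/lib).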
Generate an nsys report: nsys profile --stats=true -o report_conv ./test
Profile with ncu: sudo LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/cuda-11.7/lib64 /path/to/cuda-11.7/bin/ncu test

N (batch size) | inC | inH | inW | outC | outH | outW | kernelH | kernelW | cudnn | v1_conv | speedup |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 3 | 768 | 512 | 3 | | | 3 | 3 | 0.106353 | 0.119194 | 0.892268067 |
1 | 3 | 840 | 1200 | 3 | | | 3 | 3 | 0.29952 | 0.282706 | 1.059475215 |
1 | 3 | 960 | 1440 | 3 | | | 3 | 3 | 0.407859 | 0.406088 | 1.004361124 |
1 | 3 | 1200 | 1680 | 3 | | | 3 | 3 | 0.593111 | 0.547789 | 1.082736236 |
1 | 3 | 1440 | 1920 | 3 | | | 3 | 3 | 0.7454 | 0.745237 | 1.000218722 |
1 | 3 | 1920 | 2400 | 3 | | | 3 | 3 | 1.349059 | 1.243177 | 1.085170495 |
1 | 3 | 768 | 512 | 3 | | | 6 | 6 | 0.256932 | 0.160492 | 1.600902226 |
1 | 3 | 840 | 1200 | 3 | | | 6 | 6 | 0.667402 | 0.378142 | 1.764950733 |
1 | 3 | 960 | 1440 | 3 | | | 6 | 6 | 0.808141 | 0.448737 | 1.800923481 |
1 | 3 | 1200 | 1680 | 3 | | | 6 | 6 | 1.326254 | 0.742543 | 1.786097236 |
1 | 3 | 1440 | 1920 | 3 | | | 6 | 6 | 1.771069 | 0.970639 | 1.824642323 |
1 | 3 | 1920 | 2400 | 3 | | | 6 | 6 | 2.904648 | 1.582961 | 1.834946028 |
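
The speedup values in the table are consistent with the cudnn time divided by the v1_conv time, so a value above 1 means v1_conv was faster. A minimal sketch of that arithmetic follows; the helper name `speedup` is hypothetical, and the time unit does not matter because it cancels in the ratio.

```cpp
// Ratio of baseline time to candidate time.
// Values above 1 mean the candidate (here v1_conv) ran faster than the baseline (cudnn).
inline double speedup(double baseline_time, double candidate_time) {
    return baseline_time / candidate_time;
}
```

For the first row, `speedup(0.106353, 0.119194)` gives about 0.892, and for the last row, `speedup(2.904648, 1.582961)` gives about 1.835.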