Liu-xiandong/How_to_optimize_in_GPU
This is a series of GPU optimization topics that explains in detail how to optimize CUDA kernels. I introduce several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm, among others. The performance of these kernels is at or near the theoretical limit.
Cuda · Apache-2.0
Issues
- 6
sgemm and cublas produce different results
#13 opened by 952480831 - 0
The answer in reduce v0 is wrong in my 4060Ti 16G
#17 opened by wplf - 0
make fails with an error for reduce
#16 opened by shifang99 - 0
- 0
arbitrary input size N support for reduce_v7
#14 opened by hygxy - 3
question on reduce
#12 opened by eric1hello - 4
Was something changed incorrectly in the sgemm program? The performance is not very good.
#11 opened by hermosayhl - 2
Register data prefetch during the last small iteration
#10 opened by hwchen2017 - 3
Results of cublas and sgemm_v3.cu differ
#9 opened by chaoming0625 - 4
severe performance degradation
#8 opened by XG-zheng - 1
Reduce Makefile typo
#7 opened by XiaoSong9905 - 1
- 1
- 1
Reduction optimization technique 3
#2 opened by MGLiXu - 2
reduce_v7 has a problem
#1 opened by zhzq123