Liu-xiandong/How_to_optimize_in_GPU
This is a series on GPU optimization that explains in detail how to optimize CUDA kernels. I will introduce several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
Cuda · Apache-2.0
Stargazers
- 3outeille (HuggingFace)
- AlexwellChen (Chalmer University of Technology)
- alg-leon (China)
- amadeuzou
- dingwoai
- DTennant (Shanghai)
- feifeizuo (Beijing)
- fighterhit
- fly51fly (PRIS)
- ForrestPi (Xi’an Jiaotong University)
- ikillery
- Jackyan1999 (BJ,CHINA)
- Karbo123 (Shenzhen, China)
- L1aoXingyu (Beijing, China)
- lartpang (DUT)
- lileilai (guangzhou)
- lishicheng1996
- Lmy0217 (MUSIC Lab@SZU)
- lu-ym (SH)
- LwangStat
- lzhnb (Gorilla-Lab)
- nlp4whp
- s5u13b (@AlibabaPAI)
- starrkk
- Suke0
- thelastlin (UCAS)
- tkandi (Zhejiang University)
- tpoisonooo (pjlab.org.cn)
- tszFung-gz (GuangZhou)
- yanzixu
- YuxiangJohn (JD.com)
- Zhiwei35 (Intel)
- zhsky2017
- zixuanwe (Shanghai, China)
- zx33
- zzilch (Shenzhen University)