Hello, in this repository I will be coding a neural network (architecture TBD) from scratch with only C and CUDA.
This is primarily for educational purposes, but hopefully it can achieve some cool results and push toward the cutting edge of DL library speeds through high-quality kernels.
Goals:
- Write a kernel that beats the state of the art for some DL task
- Build a NN from scratch with CUDA
- Understand the CUDA landscape in modern DL and fully understand important/recent research