This is an educational repository where I will attempt to explain, through hands-on implementations, experiments, and intuition:
- Activation functions, normalization, and weight initialization: why do they work?
- Why different techniques allow machines to 'learn' patterns better or worse
- Miscellaneous techniques for designing deep learning architectures
I will work through various papers, trying to get a feel for why certain techniques were developed and to trace them back to their original motivations.