In the summer semester of 2018, I took a 5-credit-point seminar on "Optimization in Deep Learning" at the University of Applied Sciences Wedel (Germany). The resulting paper, graded 3.7 (GPA), can be found in this repository.
I decided to focus on optimizers (e.g. SGD, Momentum, RMSProp, Adam), Batch Normalization, and feature optimization (i.e. PCA) as the most fundamental techniques in Deep Learning. The paper gives the reader a general overview of these techniques, including many self-generated examples and explanatory graphs, but also covers the most important mathematical details, because that is the only way to reach true understanding.
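As a small taste of the optimizer update rules discussed in the paper, here is a minimal NumPy sketch of the four step rules. This is illustrative code, not taken from the paper; the function signatures and default hyperparameters are my own assumptions:

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # Plain SGD: step against the gradient, scaled by the learning rate.
    return w - lr * grad

def momentum_step(w, grad, v, lr=0.01, beta=0.9):
    # Momentum: accumulate an exponentially decaying sum of past gradients.
    v = beta * v + grad
    return w - lr * v, v

def rmsprop_step(w, grad, s, lr=0.001, beta=0.9, eps=1e-8):
    # RMSProp: scale each step by a running average of squared gradients.
    s = beta * s + (1 - beta) * grad**2
    return w - lr * grad / (np.sqrt(s) + eps), s

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: combine momentum (m) with RMSProp-style scaling (v),
    # plus bias correction for the zero-initialized moment estimates.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)  # t is the 1-based step count
    v_hat = v / (1 - beta2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```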