GoogLeNet implemented with PyTorch
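- Borrows the Network in Network idea: a small MLP applied at every spatial position extracts features, instead of a plain conv filter
- A conv filter on its own is a linear operation on its input patch, while an MLP stacks linear layers with non-linear activations in between
- So an MLP can represent more complex local features than a single conv filter
- An MLP applied per position across the channels is the same as a 1x1 conv followed by an activation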
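- A 1x1 conv can reduce the number of feature-map channels
- Fewer channels means less computation in the layers that follow
- Parallel branches with different kernel sizes (1x1, 3x3, 5x5) let the network see different fields of view (receptive fields) at the same layer
- A 1x1 conv placed before the 3x3 and 5x5 convs cuts their computation by reducing the channel count first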
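
A rough way to see the saving from the 1x1 reduction is to count multiply-accumulates (MACs) for a 5x5 branch with and without it. This is only a sketch: the helper `conv_macs` is hypothetical, and the sizes (28x28 map, 192 input channels, reduced to 16, then 32 output channels) are assumed from the inception 3a row of the GoogLeNet paper's architecture table, not from this repo. Bias terms and activations are ignored.

```python
def conv_macs(h, w, c_in, c_out, k):
    """MACs for a k x k conv, stride 1, 'same' padding (bias ignored)."""
    return h * w * c_in * c_out * k * k

# Assumed sizes (inception 3a, 5x5 branch); not taken from this repo.
H = W = 28
C_IN, C_REDUCED, C_OUT = 192, 16, 32

# 5x5 conv applied directly to all 192 input channels.
direct = conv_macs(H, W, C_IN, C_OUT, 5)

# 1x1 conv down to 16 channels first, then the 5x5 conv.
reduced = conv_macs(H, W, C_IN, C_REDUCED, 1) + conv_macs(H, W, C_REDUCED, C_OUT, 5)

print(f"direct 5x5:      {direct:,} MACs")
print(f"1x1 then 5x5:    {reduced:,} MACs")
print(f"reduction ratio: {direct / reduced:.1f}x")
```

Under these assumed sizes the reduced branch needs roughly a tenth of the multiply-accumulates of the direct 5x5 conv.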
- Auxiliary classifiers are added to address the vanishing gradient problem
- They are attached to the outputs of Inception4a and Inception4d
- They are only used during training, not at inference. Be careful about this when implementing the code!
- Total loss = main_loss + 0.3 * (aux1_loss + aux2_loss)
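
The training-only behaviour and the weighted loss could look roughly like the sketch below. It is not the model from this repo: `TinyNetWithAux` is a hypothetical toy network whose three stages stand in for the real Inception blocks, and `total_loss` is an illustrative helper. The parts meant to carry over are the `self.training` check in `forward` and the 0.3 weight on the two auxiliary losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNetWithAux(nn.Module):
    """Toy stand-in for GoogLeNet: three conv stages and two auxiliary heads."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU())
        self.stage3 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.aux1 = nn.Linear(8, num_classes)   # plays the role of the head on Inception4a
        self.aux2 = nn.Linear(16, num_classes)  # plays the role of the head on Inception4d
        self.fc = nn.Linear(32, num_classes)

    @staticmethod
    def _pool(x):
        # Global average pool to (batch, channels).
        return F.adaptive_avg_pool2d(x, 1).flatten(1)

    def forward(self, x):
        x = self.stage1(x)
        # Auxiliary heads are evaluated only in training mode.
        aux1 = self.aux1(self._pool(x)) if self.training else None
        x = self.stage2(x)
        aux2 = self.aux2(self._pool(x)) if self.training else None
        x = self.stage3(x)
        main = self.fc(self._pool(x))
        if self.training:
            return main, aux1, aux2
        return main


def total_loss(main, aux1, aux2, target, aux_weight=0.3):
    """main_loss + 0.3 * (aux1_loss + aux2_loss), using cross-entropy."""
    main_loss = F.cross_entropy(main, target)
    aux_loss = F.cross_entropy(aux1, target) + F.cross_entropy(aux2, target)
    return main_loss + aux_weight * aux_loss


model = TinyNetWithAux()
x = torch.randn(4, 3, 32, 32)       # random stand-in batch
y = torch.randint(0, 10, (4,))      # random stand-in labels

model.train()                       # training: forward returns three outputs
loss = total_loss(*model(x), y)
loss.backward()

model.eval()                        # inference: forward returns main logits only
with torch.no_grad():
    logits = model(x)
```

Calling `model.eval()` before inference matters here: it flips `self.training` to `False`, so the auxiliary heads are skipped and `forward` returns a single tensor.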