Visualization of networks without FC layers
gudovskiy opened this issue · 3 comments
Hi, it seems that your work can only visualize networks that have FC layers between the last conv layer and the softmax. The original CAM paper, by contrast, learned the "importance" weights (in effect, the added FC layer) separately using SVM, assuming average pooling at the end. Could you comment on how to use Grad-CAM to visualize popular networks without FC layers, such as MobileNet?
Grad-CAM is applicable to any network in which whatever follows the convolutional layers is differentiable. For networks like VGG/AlexNet, FC layers follow the convolutional layers. For networks like ResNets/Inception/MobileNets, there is just a Global Average Pooling (GAP) layer followed by a single FC layer. In such networks, the learned weights of the last FC layer are, up to a constant factor, the alphas (importance weights) that you would obtain if you were to do Grad-CAM (see the Appendix, page 11 of the Grad-CAM paper: https://arxiv.org/pdf/1610.02391.pdf for the proof). By the same criterion, Grad-CAM is applicable even to networks that have an RNN (captioning) or take additional inputs (VQA).
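For reference, the GAP + FC case can be worked through in a few lines. This is only a sketch of the argument, with notation assumed here rather than quoted from the paper: $A^k_{ij}$ is the activation of feature map $k$ at position $(i, j)$, $Z$ is the number of spatial positions, $w^c_k$ is the FC weight connecting pooled map $k$ to class $c$, and $Y^c$ is the pre-softmax score.

$$
Y^c = \sum_k w^c_k \, \frac{1}{Z} \sum_{i} \sum_{j} A^k_{ij}
\quad\Longrightarrow\quad
\frac{\partial Y^c}{\partial A^k_{ij}} = \frac{w^c_k}{Z}
$$

$$
\alpha^c_k = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial Y^c}{\partial A^k_{ij}} = \frac{w^c_k}{Z}
$$

So the Grad-CAM weights equal the FC weights up to the constant factor $1/Z$, which does not change the heatmap once it is normalized for display.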
The trick is to remove the softmax layer, set the gradient at the last layer's output to a one-hot encoding of the target class, and backpropagate to the output of the last convolutional layer. Averaging these gradients over the spatial dimensions gives you the alphas. You can then take the alpha-weighted sum of the forward feature maps and clamp the negative values to zero (ReLU) to obtain the Grad-CAM visualization.
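The steps above can be sketched in plain Python. This is an illustrative sketch, not the code from the Grad-CAM repository: it estimates the gradient by central finite differences instead of backpropagation so that it runs without any framework, and the names `grad_cam`, `gap_fc_head`, `maps`, and `weights` are hypothetical. Selecting a single pre-softmax score plays the role of the one-hot gradient at the output.

```python
import copy


def grad_cam(feature_maps, head, target_class, eps=1e-4):
    """Grad-CAM on K feature maps of size H x W (nested lists of floats).

    head is a hypothetical callable that maps the feature maps to
    pre-softmax class scores. Gradients are approximated numerically
    here; a real implementation would use backprop.
    """
    maps = copy.deepcopy(feature_maps)  # avoid mutating the caller's data
    K, H, W = len(maps), len(maps[0]), len(maps[0][0])
    Z = H * W

    def score(m):
        # Picking one score is equivalent to a one-hot output gradient.
        return head(m)[target_class]

    alphas = []
    for k in range(K):
        total = 0.0
        for i in range(H):
            for j in range(W):
                original = maps[k][i][j]
                maps[k][i][j] = original + eps
                up = score(maps)
                maps[k][i][j] = original - eps
                down = score(maps)
                maps[k][i][j] = original
                total += (up - down) / (2 * eps)
        alphas.append(total / Z)  # spatial average of the gradient

    # Alpha-weighted sum of the feature maps, negatives clamped to zero.
    cam = [[max(0.0, sum(alphas[k] * maps[k][i][j] for k in range(K)))
            for j in range(W)] for i in range(H)]
    return alphas, cam


def gap_fc_head(weights):
    """Toy head: global average pooling followed by one FC layer.

    weights[c][k] connects pooled feature map k to class c.
    """
    def head(maps):
        pooled = [sum(sum(row) for row in m) / (len(m) * len(m[0]))
                  for m in maps]
        return [sum(w * p for w, p in zip(class_w, pooled))
                for class_w in weights]
    return head


maps = [[[1.0, 2.0], [3.0, 4.0]],
        [[4.0, 0.0], [0.0, 0.0]]]
weights = [[0.5, -1.0], [1.0, 2.0]]
alphas, cam = grad_cam(maps, gap_fc_head(weights), target_class=0)
```

With this toy GAP + FC head, the alphas for class 0 come out as the FC weights divided by the number of spatial positions. Any other differentiable head could be passed in place of `gap_fc_head`.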
@ramprs, in some networks like NiN or SqueezeNet, the output of the pooling layer goes directly to the softmax. In MobileNet, some people replace the FC layer with a conv layer to save weight memory. Do you think Grad-CAM would work without an FC layer in such architectures?
Yes, it should work.
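To illustrate why, here is a small sketch of the case where pooling feeds the softmax directly. It assumes the last conv layer has one feature map per class and that the class score is the global average of that map; the function name `grad_cam_gap_only` and the toy values are hypothetical. Under that assumption the gradient can be written down by hand, so no autodiff is needed.

```python
def grad_cam_gap_only(feature_maps, target_class):
    """Grad-CAM when score_c = mean(feature_maps[c]), with no FC layer.

    feature_maps is a list of K maps, each an H x W nested list.
    """
    H, W = len(feature_maps[0]), len(feature_maps[0][0])
    Z = H * W
    # d score_c / d A^k_ij is 1/Z when k == c and 0 otherwise, so the
    # spatially averaged gradient (alpha_k) takes the same values.
    alphas = [1.0 / Z if k == target_class else 0.0
              for k in range(len(feature_maps))]
    # Weighted sum of the maps, negatives clamped to zero.
    cam = [[max(0.0, sum(a * fm[i][j] for a, fm in zip(alphas, feature_maps)))
            for j in range(W)] for i in range(H)]
    return alphas, cam


toy_maps = [[[1.0, -2.0], [3.0, 0.0]],
            [[5.0, 5.0], [5.0, 5.0]]]
toy_alphas, toy_cam = grad_cam_gap_only(toy_maps, target_class=0)
```

In this setting only the target class's own feature map gets a nonzero weight, so the Grad-CAM map is just that map scaled by 1/Z with negatives clamped to zero.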