About the feature matrix that is the input to the last fc layer
Sorry to bother you, I have a question about the fgvc part. Why do the normalized feature_matrix and feature_matrix_hat need to be multiplied by 100 before the fc layer?
Hi, thanks for your interest in our work. We use this trick following the implementation of WS-DAN. It helps improve the final performance while introducing no significant extra computation. There is some discussion of this detail (GuYuc/WS-DAN.PyTorch#1) in the WS-DAN repo.
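For readers wondering what the scaling changes numerically, here is a minimal sketch in plain Python. It is not the repository's code: the helper names, the toy weights, and the use of a softmax over the fc outputs are all assumptions made for illustration. It only shows one plausible effect of the trick: after L2 normalization the features have unit norm, so the fc outputs stay small and the softmax is close to flat, while multiplying by 100 widens the logit range.

```python
import math

def l2_normalize(v):
    # Scale the vector to unit L2 norm.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def fc(features, weights):
    # Bias-free linear layer: one dot product per output class.
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def softmax(logits):
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy values (hypothetical, not taken from the repository).
feature_matrix = l2_normalize([3.0, 1.0, -2.0, 0.5])
weights = [
    [0.5, 0.1, -0.3, 0.0],
    [-0.2, 0.4, 0.1, 0.3],
    [0.1, -0.1, 0.2, -0.4],
]

# Without scaling: unit-norm input keeps the logits small.
probs_unscaled = softmax(fc(feature_matrix, weights))

# With the factor of 100 applied before the fc layer, as in the question.
probs_scaled = softmax(fc([100.0 * x for x in feature_matrix], weights))

print("unscaled:", probs_unscaled)
print("scaled:  ", probs_scaled)
```

The predicted class is the same in both cases; only the sharpness of the distribution changes, which in turn changes the size of the gradients a cross-entropy loss would produce.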
Thanks for your reply. I have another question about the paper. If we want to quantify the effect of the attention map and use it to optimize the learning process, why do we need a counterfactual attention rather than no attention? I'm a little confused about the necessity of the counterfactual attention.
Interesting question! Since we are considering the attention-based models here, "no attention" actually can also be regarded as a type of counterfactual attention. It is the "uniform attention" compared in Table 4. We see the uniform counterfactual attention can also achieve comparable performance with the "random attention" used in most of our experiments.
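To make the comparison concrete, here is a small sketch in plain Python of how an attention effect could be measured against either kind of counterfactual. Everything in it is an assumption for illustration (the function names, the toy features, the linear classifier, and the way the random attention is sampled); it is not the paper's implementation. The point is only that "no attention" is the special case where the counterfactual attention is uniform, so both variants fit the same formula: prediction with the learned attention minus prediction with the counterfactual attention.

```python
import random

def normalize(attn):
    # Make the attention weights sum to 1.
    total = sum(attn)
    return [a / total for a in attn]

def attend(features, attn):
    # Attention-weighted pooling over spatial locations.
    dim = len(features[0])
    return [sum(a * f[d] for a, f in zip(attn, features)) for d in range(dim)]

def classify(pooled, weights):
    # Bias-free linear classifier on the pooled feature.
    return [sum(w * x for w, x in zip(row, pooled)) for row in weights]

def attention_effect(features, attn, counterfactual_attn, weights):
    # Effect = prediction with the given attention
    #        - prediction with the counterfactual attention.
    factual = classify(attend(features, attn), weights)
    counterfactual = classify(attend(features, counterfactual_attn), weights)
    return [y - y_cf for y, y_cf in zip(factual, counterfactual)]

rng = random.Random(0)

# Four locations with two-dimensional features (toy values).
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
weights = [[1.0, -1.0], [-1.0, 1.0]]

learned_attn = normalize([0.7, 0.1, 0.1, 0.1])
uniform_attn = normalize([1.0] * len(features))             # "no attention"
random_attn = normalize([rng.random() for _ in features])   # random counterfactual

effect_vs_uniform = attention_effect(features, learned_attn, uniform_attn, weights)
effect_vs_random = attention_effect(features, learned_attn, random_attn, weights)
effect_vs_self = attention_effect(features, learned_attn, learned_attn, weights)

print("effect vs uniform:", effect_vs_uniform)
print("effect vs random: ", effect_vs_random)
```

With uniform attention the pooling reduces to a plain average over locations, which is why it can be read as the "no attention" baseline. Comparing the learned attention with itself gives a zero effect, as expected.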
Thank you a lot! I just started my work on fine-grained classification and I think your work is very interesting. Maybe I will try this method on my dataset later. Thank you again!