Trusted-AI/adversarial-robustness-toolbox

Auto PGD not working with DLR loss for binary classification

neeleshverma opened this issue · 1 comment

Describe the bug
I am trying to run an Auto PGD attack on a binary classification network. The network's last layer has 2 output neurons (a single neuron would have been enough, but many attacks expect >= 2 output neurons). The attack works with loss_type=cross_entropy but fails with loss_type=difference_logits_ratio. Looking at the code, I found that this line (line 301 in auto_projected_gradient_descent.py) is causing the trouble: z_3 = y_pred[:, i_y_pred_arg[:, -3]]. The -3 index means there must be more than 2 classes. Does Auto PGD with DLR not work for binary classification, then?

To Reproduce
Steps to reproduce the behavior:
Train a simple binary classifier (2 output neurons in the last layer) and run Auto PGD on it with loss_type=difference_logits_ratio.

Expected behavior
Since loss_type=cross_entropy worked, I expected loss_type=difference_logits_ratio to work too.

System information:

  • OS - Ubuntu 20.04.4 LTS
  • Python version - 3.9.18
  • ART version or commit number - 1.17.1
  • Framework - PyTorch

I think I found the issue: DLR loss needs at least 3 classes.
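For anyone landing here later, the following is a minimal NumPy sketch of the untargeted DLR loss as I understand its definition, DLR = -(z_y - max_{i != y} z_i) / (z_1 - z_3), where z_1 and z_3 are the largest and third-largest logits. It is my own illustration, not ART's implementation; the function name dlr_loss and its arguments are made up for this example. It shows where a 2-output model breaks: there is no third-largest logit to put in the denominator.

```python
import numpy as np


def dlr_loss(logits, labels):
    """Per-sample untargeted DLR loss (illustrative sketch, not ART's code).

    logits: float array of shape (n_samples, n_classes)
    labels: int array of shape (n_samples,) holding the true class indices
    """
    rows = np.arange(logits.shape[0])
    order = np.argsort(logits, axis=1)  # class indices, ascending by logit

    z_1 = logits[rows, order[:, -1]]  # largest logit
    z_2 = logits[rows, order[:, -2]]  # second-largest logit
    # Third-largest logit: raises IndexError when n_classes < 3.
    z_3 = logits[rows, order[:, -3]]

    z_y = logits[rows, labels]  # logit of the true class
    # Largest logit among the classes other than the true one.
    z_other = np.where(order[:, -1] == labels, z_2, z_1)

    return -(z_y - z_other) / (z_1 - z_3)


# Three classes: the loss is well defined.
three_class = dlr_loss(np.array([[3.0, 1.0, 0.0]]), np.array([0]))

# Two classes: the lookup of the third-largest logit fails.
try:
    dlr_loss(np.array([[2.0, -1.0]]), np.array([0]))
    binary_failed = False
except IndexError:
    binary_failed = True
```

So for a binary problem, loss_type=cross_entropy looks like the option to use, since the DLR denominator cannot be formed from only 2 logits.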