Trusted-AI/adversarial-robustness-toolbox

Some question about computing the adversarial saliency map in JSMA attack

HIT1180300227 opened this issue · 3 comments

Hi,

When using the JSMA method, I found that the implementation of the adversarial saliency map in this toolbox is slightly different from the original paper:

In this toolbox, the corresponding implementation in saliency_map.py looks like this:

    def _saliency_map(self, x: np.ndarray, target: Union[np.ndarray, int], search_space: np.ndarray) -> np.ndarray:
        grads = self.estimator.class_gradient(x, label=target)
        grads = np.reshape(grads, (-1, self._nb_features))

        # Remove gradients for already used features
        used_features = 1 - search_space
        coeff = 2 * int(self.theta > 0) - 1
        grads[used_features == 1] = -np.inf * coeff

        if self.theta > 0:
            ind = np.argpartition(grads, -2, axis=1)[:, -2:]
        else:  # pragma: no cover
            ind = np.argpartition(-grads, -2, axis=1)[:, -2:]

        return ind

I notice that `ind` is selected directly from `grads`, i.e. from the gradients of the target class alone.
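To make that concrete, here is a tiny standalone sketch (toy, made-up gradient values, not ART code) of what that selection does: it simply picks the two features with the largest target-class gradients, with no term for the other classes:

    import numpy as np

    # Toy gradients of the target class w.r.t. 5 features for a single sample,
    # i.e. the shape grads has after the reshape above (values are made up).
    grads = np.array([[0.1, -0.3, 0.7, 0.2, 0.5]])

    # theta > 0: indices of the two largest target-class gradients
    ind = np.argpartition(grads, -2, axis=1)[:, -2:]
    print(ind)  # [[4 2]] or [[2 4]] -- features 2 and 4, order not guaranteed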

But in the original paper, the adversarial saliency map is computed like this:

    S(X, t)[i] = 0                                             if ∂F_t(X)/∂X_i < 0 or Σ_{j≠t} ∂F_j(X)/∂X_i > 0
    S(X, t)[i] = (∂F_t(X)/∂X_i) * | Σ_{j≠t} ∂F_j(X)/∂X_i |     otherwise

or with the heuristic equation over feature pairs, like this:

    (p1, p2) = argmax_{(p1, p2)} ( Σ_{i ∈ {p1, p2}} ∂F_t(X)/∂X_i ) * | Σ_{i ∈ {p1, p2}} Σ_{j≠t} ∂F_j(X)/∂X_i |
               subject to  Σ_{i ∈ {p1, p2}} ∂F_t(X)/∂X_i > 0  and  Σ_{i ∈ {p1, p2}} Σ_{j≠t} ∂F_j(X)/∂X_i < 0
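In NumPy terms, my understanding of the paper's per-feature saliency map would look roughly like this (my own sketch, not ART code; I assume `jacobian` holds the gradients of all classes for one sample):

    import numpy as np

    def saliency_map_paper(jacobian: np.ndarray, target: int) -> np.ndarray:
        """Per-feature saliency map S(X, t) from the paper (feature-increasing case).

        jacobian: shape (nb_classes, nb_features), jacobian[j, i] = dF_j(X)/dX_i
        (my own name and shape assumption, not the toolbox API).
        """
        target_grad = jacobian[target]                       # dF_t(X)/dX_i
        other_grad_sum = jacobian.sum(axis=0) - target_grad  # sum_{j != t} dF_j(X)/dX_i

        saliency = target_grad * np.abs(other_grad_sum)
        # Reject features that decrease the target class or increase the other classes
        saliency[(target_grad < 0) | (other_grad_sum > 0)] = 0.0
        return saliency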

I'm confused about this difference.

Hi @HIT1180300227 I think this implementation of JSMA neglects the additional terms involving the gradients of the classes other than the target class. Have you been able to use the attack successfully?

Hi @beat-buesser ,

I use the JSMA method in the IDS (intrusion detection system) field. Specifically, I run the targeted JSMA attack on statistical feature vectors as follows:

    from art.attacks.evasion import SaliencyMapMethod
    from art.estimators.classification import KerasClassifier

    art_classifier = KerasClassifier(model=model, use_logits=False)
    attack = SaliencyMapMethod(classifier=art_classifier, theta=theta, gamma=gamma, batch_size=1, verbose=True)

    # x_test holds the original statistical feature vectors
    targeted_x_test_jsma = attack.generate(x=x_test, y=numpy_targets)

Before the JSMA attack, I get 90% classification accuracy. After applying the attack, the classification accuracy drops to 20%.
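For reference, I measure the accuracy roughly like this (a sketch; `y_test` is my name for the clean one-hot test labels, not shown above):

    import numpy as np

    # y_test: clean one-hot test labels (hypothetical name, same encoding as numpy_targets)
    clean_pred = np.argmax(model.predict(x_test), axis=1)
    adv_pred = np.argmax(model.predict(targeted_x_test_jsma), axis=1)
    true_labels = np.argmax(y_test, axis=1)

    print("clean accuracy:", np.mean(clean_pred == true_labels))        # ~0.90
    print("adversarial accuracy:", np.mean(adv_pred == true_labels))    # ~0.20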

It seems that although the implementation of this attack is not fully consistent with the original paper, it can still successfully fool the classification model.

Why does the JSMA attack still work?

Hi @HIT1180300227 I think it still works because the main component of the gradients is the same, i.e. the direction in which the target class's logit value increases. The paper is more precise in that it adds terms to make sure the logits of the other classes are not increasing along this direction. It looks like for many applications these additional terms are small or negligible, but including them would make the implementation more complicated.
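As a rough illustration (toy, hand-picked numbers, not ART code), the target-class-only selection and a paper-style selection that includes the other-class term often pick the same features when the off-target gradients move opposite to the target gradient:

    import numpy as np

    # Toy Jacobian for one sample: 3 classes x 5 features, jacobian[j, i] = dF_j(X)/dX_i
    jacobian = np.array([
        [ 0.10, -0.30,  0.70,  0.20,  0.50],   # target class t = 0
        [-0.05,  0.20, -0.40, -0.10, -0.30],
        [-0.02,  0.10, -0.25, -0.05, -0.15],
    ])
    target_grad = jacobian[0]
    other_sum = jacobian[1:].sum(axis=0)

    # ART-style: top-2 features by target-class gradient only
    art_pick = np.argpartition(target_grad, -2)[-2:]

    # Paper-style: weight the target gradient by |sum of other-class gradients|
    score = target_grad * np.abs(other_sum)
    score[(target_grad < 0) | (other_sum > 0)] = -np.inf
    paper_pick = np.argpartition(score, -2)[-2:]

    print(sorted(art_pick), sorted(paper_pick))  # both pick features [2, 4] here

Of course this is just one hand-picked example; when the off-target gradients do not sum in the opposite direction, the two selection rules can diverge.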