MadryLab/mnist_challenge

tf.Variable() apparently cannot be replaced with tf.get_variable()

Closed this issue · 8 comments

I replaced tf.Variable() in the functions _weight_variable and _bias_variable with tf.get_variable(). With that change, training no longer produces a network that is robust to the CW attack. In contrast, the unchanged source code does train a robust network. I am really confused about why this happens. The code below is the only part I changed. Please help.

@staticmethod
def _weight_variable(shape, name):
    initial = tf.initializers.truncated_normal(stddev=0.1)
    return tf.get_variable(shape=shape, name=name, initializer=initial)

@staticmethod
def _bias_variable(shape, name):
    initial = tf.constant(0.1, shape=shape)
    return tf.get_variable(name=name, initializer=initial)

dtsip commented

This is weird. Looks like an issue with tensorflow though, so I don't see how we can help here.

On further thought, I wonder: is adversarial training with PGD not necessarily robust to the CW attack (I mean the original CW attack, not the PGD attack with the CW loss function from your paper)? Is the robustness of a PGD-based adversarially trained network to the CW attack just an accident of particular initialization settings?
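
To make the distinction in the question concrete, here is a minimal NumPy sketch of a CW-style margin loss of the kind that can be plugged into PGD in place of cross-entropy. The function name, the `kappa` parameter, and its default value are assumptions for illustration, not taken from this thread. The original CW attack instead minimizes the perturbation size plus a weighted term of this kind, which is a different optimization procedure.

```python
import numpy as np

def cw_margin_loss(logits, labels, kappa=50.0):
    """CW-style margin loss per example (hypothetical helper).

    An attacker maximizes this value, which pushes the correct-class
    logit below the largest wrong-class logit.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    rows = np.arange(logits.shape[0])

    correct = logits[rows, labels]

    # Mask out the correct class so the max runs over wrong classes only.
    masked = logits.copy()
    masked[rows, labels] = -np.inf
    best_wrong = masked.max(axis=1)

    # Zero once the wrong class leads by at least kappa, negative otherwise.
    return -np.maximum(correct - best_wrong + kappa, 0.0)
```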

dtsip commented

The goal of PGD training is to solve the min-max problem stated in our paper. If it is successful, no attack within the threat model will degrade the accuracy of our model (be it standard CW or any other variant). In fact, we have found that PGD training leads to models that can be provably robust (https://arxiv.org/abs/1809.03008).
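
For reference, the min-max (saddle point) problem mentioned here is usually written as follows, where the symbols are the standard ones: model parameters $\theta$, data distribution $\mathcal{D}$, loss $L$, and allowed perturbation set $\mathcal{S}$. For the MNIST challenge, $\mathcal{S}$ is, as far as I recall, the $\ell_\infty$ ball of radius $\epsilon = 0.3$.

```latex
\min_{\theta} \; \rho(\theta),
\qquad
\rho(\theta) = \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[ \max_{\delta \in \mathcal{S}} L(\theta,\, x + \delta,\, y) \Big],
\qquad
\mathcal{S} = \{\, \delta : \|\delta\|_\infty \le \epsilon \,\}
```

The inner maximization is the attack and the outer minimization is training. The claim about robustness to any attack applies only to perturbations $\delta \in \mathcal{S}$.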

dtsip commented

PGD does solve the min-max problem well enough. As I mentioned before, PGD trained networks have been shown to be provably robust to every adversarial attack within the threat model (https://arxiv.org/abs/1809.03008).
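
As an illustration of how PGD approximates the inner maximization, here is a minimal NumPy sketch of an L-infinity PGD attack on a binary logistic-regression model. The model, the function names, and the default values of `eps`, `step`, and `n_steps` are assumptions for illustration. This is not the repository's TensorFlow implementation.

```python
import numpy as np

def logistic_loss(w, b, x, y):
    """Binary logistic loss for a label y in {-1, +1}."""
    return np.logaddexp(0.0, -y * (np.dot(w, x) + b))

def loss_grad_x(w, b, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    margin = y * (np.dot(w, x) + b)
    return -y * w / (1.0 + np.exp(margin))

def pgd_linf(w, b, x, y, eps=0.3, step=0.01, n_steps=40, rng=None):
    """Projected gradient ascent on the loss inside an L-inf ball around x."""
    rng = np.random.default_rng(0) if rng is None else rng

    # Random start inside the eps-ball, kept in the valid pixel range.
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)

    for _ in range(n_steps):
        # Ascent step along the sign of the input gradient.
        x_adv = x_adv + step * np.sign(loss_grad_x(w, b, x_adv, y))
        # Project back onto the eps-ball, then onto [0, 1].
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv
```

Adversarial training then minimizes the loss on `x_adv` rather than on `x` at each training step.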

dtsip commented

Moreover, the secret network in our challenge is robust to the standard CW attack. And I wouldn't call this an accident.