This repository demonstrates a newly observed phenomenon in deep learning:
Adding an easy example to the training set tends to cause the retrained model to become less confident on it.

In particular, this implies:
- There exists a concrete phenomenon within finite-width neural network learning dynamics which cannot be recapitulated by NTK learning. (Related work: Allen-Zhu & Li (2020).)
- Influence functions cannot even approximate leave-one-out retraining, let alone leave-many-out: because the Hessian is positive semidefinite, the influence-function estimate of self-influence is non-negative by construction, so it can never predict this effect (see the sketch below). (Related work: Bae et al. (2022).)
- Neural network learning does not minimize loss. (This is new.)
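To make the influence-function point concrete, here is a minimal standalone illustration (NumPy; not code from this repo): the first-order influence-function prediction for how much an example's own loss increases when it is removed is the quadratic form g^T H^{-1} g, where g is the example's loss gradient and H the Hessian. If H is positive semidefinite, this is non-negative, i.e., influence functions say inclusion always (weakly) helps the example, so they cannot reproduce the negative self-influence measured below.

```python
# Standalone illustration (not from this repo): influence functions with a PSD
# Hessian always assign non-negative self-influence, since g^T H^{-1} g >= 0.
import numpy as np

rng = np.random.default_rng(0)
for _ in range(5):
    A = rng.normal(size=(10, 10))
    H = A @ A.T + 1e-3 * np.eye(10)        # a random positive-definite "Hessian"
    g = rng.normal(size=10)                # an example's loss gradient
    self_influence = g @ np.linalg.solve(H, g)
    print(f"{self_influence:.3f}")         # always >= 0
```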
To demonstrate the phenomenon, run:

```
python trainer_airbench.py
python viz.py
```

This will produce an output like the following. (Here the margin is the correct-class margin of the model's output logits, and the self-influence column is the mean margin with the example in the training set minus the mean margin without it.)
```
Showing examples whose estimated self-influence is statistically-significantly different from zero (p < 0.01):

Example index     margin            self-influence   p-value
                with      without
Random examples:
    0           4.741     4.074        +0.667        0.0000
    2           6.307     6.175        +0.132        0.0030
    7           6.084     6.189        -0.105        0.0046
    8           5.464     5.292        +0.172        0.0005
    9           4.542     4.425        +0.117        0.0057
   13           4.243     3.747        +0.496        0.0000
   16           2.999     2.591        +0.408        0.0000
   17           1.637     0.545        +1.092        0.0000
   18           3.691     2.819        +0.872        0.0000
   19           4.132     3.315        +0.817        0.0000
Average: +0.467
Easy examples:
45114          10.635    10.881        -0.246        0.0000
47798           8.829     9.291        -0.463        0.0000
43746           8.596     9.138        -0.542        0.0000
47082          11.145    11.484        -0.339        0.0000
44095          10.255    10.477        -0.222        0.0005
49524          12.359    12.645        -0.286        0.0001
41014          13.380    13.582        -0.202        0.0004
47731          12.156    12.376        -0.220        0.0005
49015          11.290    11.456        -0.167        0.0005
49690          10.756    10.962        -0.207        0.0003
46836          11.296    11.517        -0.221        0.0000
43189           7.493     7.676        -0.183        0.0020
49320          10.248    10.452        -0.204        0.0004
Average: -0.269
```
The script `trainer_airbench.py` (or `trainer_madry.py`, if you want to use a different training configuration) performs the following steps.
- Trains 1000 models on the CIFAR-10 training set (as usual).
- Trains 1000 models on the CIFAR-10 training set minus 40 specific examples (the same number of training steps; the models just never see those 40 examples).
- Saves the logit outputs of all of the models, on all of the training examples, to disk (i.e., two tensors of shape `(1000, 50000, 10)`).

All of the runs of training use otherwise identical hyperparameters.
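For a sense of what the analysis then looks like, here is a minimal sketch in the spirit of `viz.py` (the file names, the margin definition as correct logit minus largest other logit, and the choice of Welch's t-test are my assumptions, not necessarily this repo's exact code): load the two logit tensors, compute each ablated example's correct-class margin under both conditions, and test whether the mean margins differ across the two populations of runs.

```python
# Hypothetical sketch of the analysis; file names and test choice are assumptions.
import torch
import torchvision
from scipy import stats

logits_with = torch.load("logits_with.pt")        # (1000, 50000, 10): trained on all examples
logits_without = torch.load("logits_without.pt")  # (1000, 50000, 10): 40 examples held out
labels = torchvision.datasets.CIFAR10(root="/tmp/cifar", train=True, download=True).targets

def margins(logits, idx, label):
    """Correct-class margin: correct logit minus the largest other logit."""
    ex = logits[:, idx, :].clone()          # (n_runs, 10)
    correct = ex[:, label].clone()
    ex[:, label] = float("-inf")
    return correct - ex.max(dim=1).values   # (n_runs,)

ablated_indices = list(range(20))  # the random half; the 20 easy indices work the same way
for idx in ablated_indices:
    m_with = margins(logits_with, idx, labels[idx])
    m_without = margins(logits_without, idx, labels[idx])
    # Welch's t-test: do the mean margins differ between the two populations of runs?
    _, p = stats.ttest_ind(m_with.numpy(), m_without.numpy(), equal_var=False)
    if p < 0.01:
        mw, mo = m_with.mean().item(), m_without.mean().item()
        print(f"{idx:6d}  {mw:8.3f}  {mo:8.3f}  {mw - mo:+8.3f}  {p:8.4f}")
```

With on the order of 1000 runs per condition, even mean-margin differences of a couple of tenths (as in the output above) comfortably reach p < 0.01.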
The 40 specific examples are hardcoded. I chose their indices so that:
- The first 20 indices are just `[0, ..., 19]`, which amounts to random examples (since CIFAR-10 is shuffled).
- The last 20 indices are chosen to be easy examples which are known to have negative self-influence for the `airbench` trainer.
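For illustration, a hypothetical reconstruction of that hardcoded list might look like the following (the easy indices shown are just the 13 that reach significance in the `airbench` output above; the actual list has 20):

```python
# Hypothetical reconstruction of the hardcoded ablation set (illustrative only).
RANDOM_INDICES = list(range(20))  # CIFAR-10's train set is shuffled, so 0..19 are random
EASY_INDICES = [45114, 47798, 43746, 47082, 44095, 49524, 41014,   # known negative
                47731, 49015, 49690, 46836, 43189, 49320]          # self-influence
ABLATED_INDICES = RANDOM_INDICES + EASY_INDICES  # (7 further easy indices omitted here)
```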
The point of the experiment is to determine the impact of removing these 40 examples on their own margins.
The result is as follows:
- For the random 20 examples, removing them tends to reduce their confidence (this is quite intuitive).
- For the easy 20 examples, removing them tends to increase their confidence (this is the new phenomenon).
Or equivalently, we can say that adding easy examples to the training set decreases the trained model's confidence on them (which is what's shown in the script output).
Why hasn't this been observed before? Presumably because doing 1,000 trainings to get statistical significance on the differences is a pain in the butt.
What happens if we try with the Madry trainer instead (which is quite different, within the space of CIFAR-10 trainings)?
Here's the output I got from doing so with 4000 total runs.
```
Showing examples whose estimated self-influence is statistically-significantly different from zero (p < 0.01):

Example index     margin            self-influence   p-value
                with      without
Random examples:
    0           6.310     5.088        +1.222        0.0000
    2           9.396     9.088        +0.308        0.0000
    8           8.463     8.235        +0.228        0.0000
   11           7.629     7.475        +0.153        0.0016
   13           5.706     4.521        +1.185        0.0000
   16           4.918     4.543        +0.375        0.0000
   17           1.805    -0.078        +1.882        0.0000
   18           5.068     3.743        +1.325        0.0000
   19           5.511     4.359        +1.152        0.0000
Average: +0.870
Easy examples:
47798          20.914    21.178        -0.263        0.0003
43746          21.084    21.401        -0.317        0.0000
47731          19.915    20.149        -0.234        0.0005
Average: -0.271
```
Looks like the same thing, so the phenomenon is not specific to `airbench` trainings.
And here's another run with the Madry trainer, this time with cutout augmentation turned off:
```
Computing the correct-class margins for each of the 40 examples which were ablated
Showing examples whose estimated self-influence is statistically-significantly different from zero (p < 0.01):

Example index     margin            self-influence   p-value
                with      without
Random examples:
    0           6.472     5.260        +1.212        0.0000
    2           9.438     9.116        +0.322        0.0000
    5           8.801     8.660        +0.141        0.0026
    8           8.452     8.276        +0.176        0.0032
   13           5.368     4.225        +1.143        0.0000
   16           4.388     3.756        +0.632        0.0000
   17           2.149    -0.177        +2.326        0.0000
   18           5.260     3.965        +1.295        0.0000
   19           5.499     4.256        +1.243        0.0000
Average: +0.943
Easy examples:
47798          21.239    21.559        -0.320        0.0002
47082          18.983    19.311        -0.328        0.0004
44095          17.718    18.063        -0.345        0.0000
41014          26.110    26.358        -0.248        0.0006
47731          20.465    20.699        -0.234        0.0037
Average: -0.295
```
Again the same thing.