Warning while loading framework and some other questions about output
jiansuozhe opened this issue · 17 comments
Hi there,
I downloaded the latest version of Counterfit last week and installed all the required modules, but I still have some problems. When I executed the command `load art`, I got a warning:
load art
The type of the provided estimator is not yet support for automated setting of logits difference loss. Therefore, this attack is defaulting to attacking the loss provided by the model in the provided estimator.
[+] art successfully loaded with defaults (no config file provided)
which did not exist before. When I executed the 'run' command I only got the adversarial input:
run
[-] Running attack HopSkipJump with id 12a70390 on creditfraud)
[-] Preparing attack...
[-] Running attack...
┌─────────┬──────────────┬──────────────────────────┐
│ Success │ Elapsed time │ Total Queries │
├─────────┼──────────────┼──────────────────────────┤
│ 1/1 │ 4.3 │ 24550 (5740.6 query/sec) │
└─────────┴──────────────┴──────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ Adversarial Input                                                       │
├─────────────────────────────────────────────────────────────────────────┤
│ [4462.00 -2.30 1.76 -0.36 2.33 -0.82 -0.07 0.56 -0.40 -0.24 -1.53 2.03 │
│ -6.56 0.17 -1.47 -0.70 -2.28 -4.78 -2.62 -1.34 -0.43 -0.30 -0.93 0.17  │
│ -0.09 -0.15 -0.54 0.04 -0.15 239.93]                                   │
└─────────────────────────────────────────────────────────────────────────┘
[+] Attack completed 12a70390 (HopSkipJump)
There are some differences in the scan summary as well (no 'queries'):
Additionally, could you please explain the meaning of some values in the scan summary and the running output? What does 'successes' mean? When can we say an attack is a 'success' or a 'failure'? I guess that 'best score' means the success percentage of samples in an attack; is that correct?
In the running output, I guess the sample index means the number of samples in the attack, right? What do label and attack label mean? What do % Eucl. dist. and Elapsed Time [sec] mean? I think 'queries' means the number of attacks on the target; is that correct? I saw a list of decimal numbers in the adversarial input value. Are they just random numbers, or do they have some structure? Where do they come from? Do they relate to my attack type?
Thank you very much for your help and patience.
Hi @jiansuozhe, thanks for your questions.
In v1.0, art is loaded dynamically, and this warning comes from art itself: "The type of the provided estimator is not yet support for automated setting of logits difference loss. Therefore, this attack is defaulting to attacking the loss provided by the model in the provided estimator."
`scan` and `run` are two separate commands with two mostly distinct purposes. `scan` is for "fuzzing", whereas `run` is for more manual testing. With `scan` you will get more metrics that are useful for baselining, versus `run`, where you only really care about the output that you generate.
Defining what counts as a success or failure depends on the attack and framework you use. For example, in `art.py`, [check_success](https://github.com/Azure/counterfit/blob/5e385b0a6cf80e90bea76507ec1cdef2c85cea2b/counterfit/frameworks/art/art.py#L480) uses the Adversarial Robustness Toolbox function to define "success" for both evasion attacks and extraction attacks. In `augly.py`, we just make our own function. It's really up to you how you want to define success; the CFAttack object you pass to `reporting.py` will have everything you need.
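As a concrete illustration (this is not Counterfit's actual implementation, just a sketch), a minimal "success" check for an evasion attack could simply compare the predicted label before and after the attack:

```python
import numpy as np

def check_success(before_probs, after_probs):
    """Hypothetical success check for an evasion attack: the attack
    'succeeds' for a sample when the predicted label changes between
    the original and the adversarial input."""
    before_labels = np.argmax(np.asarray(before_probs), axis=1)
    after_labels = np.argmax(np.asarray(after_probs), axis=1)
    # One boolean per sample: True where the label flipped
    return (before_labels != after_labels).tolist()

# First sample flips from class 0 to class 1; second stays class 1
print(check_success([[0.9, 0.1], [0.2, 0.8]],
                    [[0.3, 0.7], [0.1, 0.9]]))  # [True, False]
```

How strict you make this (label flip, confidence threshold, targeted class reached) is exactly the kind of decision that is up to you.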
- `best_score` is the attack that generates the highest confidence when switching a label, and does so with the fewest queries.
- `sample_index` is the index of the sample from your `target.X`. During `target.load()`, samples are loaded as a list into `self.X`. CF will reference this list to get samples for an attack.
- `label` is the initial label for a sample, before any modifications have been made.
- `attack_label` is the final label after an attack has completed.
- `Eucl dist.` is the % change from the original input to the final output.
- `Elapsed time` is how long the attack took in seconds.
- `Queries` is the number of queries it took to complete the attack (lower is better).
- `Adversarial Input` is the final output from an attack. Here it is a bunch of numbers because it's the creditfraud model; if you did the satellite demo, it would be an image. They are effectively a modified input sample. All samples get loaded when you interact with a target (`creditfraud.py`).
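To make the distance metric concrete, the percent change between an original sample and its adversarial counterpart can be computed as the Euclidean distance of the perturbation relative to the original's norm. A rough sketch (the exact formula Counterfit's reporting uses may differ):

```python
import numpy as np

def percent_eucl_dist(original, adversarial):
    """Euclidean distance between an original and an adversarial
    sample, expressed as a percentage of the original's norm."""
    original = np.asarray(original, dtype=float)
    adversarial = np.asarray(adversarial, dtype=float)
    return 100.0 * np.linalg.norm(adversarial - original) / np.linalg.norm(original)

# Original has norm 5; the perturbation has norm 1, i.e. a 20% change
print(percent_eucl_dist([3, 4, 0], [3, 4, 1]))  # 20.0
```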
Hopefully this is helpful!
Thank you very much for your help @moohax
Please don't hesitate to ask more questions!
Hello @moohax,
Could you please explain why I only got the adversarial input in my running output? Is it because of the estimator?
run
[-] Running attack HopSkipJump with id 12a70390 on creditfraud)
[-] Preparing attack...
[-] Running attack...
┌─────────┬──────────────┬──────────────────────────┐
│ Success │ Elapsed time │ Total Queries │
├─────────┼──────────────┼──────────────────────────┤
│ 1/1 │ 4.3 │ 24550 (5740.6 query/sec) │
└─────────┴──────────────┴──────────────────────────┘
Additionally, could you please tell me how we can extract essential information from the adversarial input? For instance, finding the useful numbers among a bunch of numbers to evaluate the model? @moohax
@jiansuozhe That's what `run` provides. Compare the input and output with `predict -a` and `predict -i <sample_index>`. You can trace into `reporting` to customize these reports.
In terms of "evaluating a model": Counterfit is largely designed as a red team tool, and the traditional sort of "robustness" testing is not necessarily a feature that we put front and center. For this type of reporting, I would dig into what `art` proper has and add those functions or elements to `art.py` in `post_attack_processing`.
Hello @moohax,
Thank you for your reply. You mean that Counterfit is developed to test the protection level of a system and find its weaknesses, right? Could you please tell me how to realize this function? For example, if I run an attack on a target and get the running output, I should be able to get some useful information from the output, is that correct? Can I learn, for instance, where the weaknesses in my AI system are, or how to improve my AI algorithm to make it safer, from my output? Or can I only get feedback like "my system is not safe when facing an evasion attack"? Thank you.
The useful information is the output. If there is some metric or some output you would like to see, please let us know.
You could collect all of the outputs (Adversarial Inputs) and use them in an adversarial retraining scheme, but Counterfit has no official retraining mechanism built in. It could be done by adding training code to your target, calling `train()` in the target's `load()` function, then reloading the target. Again, that is unofficial and your mileage may vary.
To explore targets and completed attacks, drop into an IPython shell from the `counterfit>` terminal and import `CFState`:
ipy
>> from counterfit.core.state import CFState
>> CFState.state().targets
>> CFState.state().active_target.attacks
Hello @moohax,
I found that when running HopSkipJump I can get the output, but when running other attacks I only got errors. For instance, when running BoundaryAttack I got "Result too large", and when running BasicIterativeMethod I got "no attribute 'predict wrapper'". Additionally, the most frequent problem is "object of type 'NoneType' has no len()". Is it because the input data (.npz) and the model file (.pkl) you provided can only be used with HopSkipJump? Or do I need to switch my target? Thank you.
@moohax, additionally, I have never created a new input data file before. Could you please give me some tips on how to create an input file (.npz)? Thank you.
Each attack has varying requirements. The Boundary Attack likely ran successfully, but the output may have been too big for some auxiliary process. You can trace through `counterfit.frameworks.art.post_attack_processing()` or `check_success()` in that same module.
A helpful debugging trick is to put `from IPython import embed; embed()` into some function you want to examine during runtime. It will drop you into an IPython terminal that allows you to explore the current state. A more advanced alternative is `from IPython.core.debugger import set_trace; set_trace()`, which gives you pdb-style debugging.
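For illustration, here is where such a breakpoint could go inside a hypothetical target's `predict` (the embed/set_trace lines are shown commented out so the sketch runs unattended):

```python
class MyTarget:
    """Hypothetical target used only to show breakpoint placement."""

    def predict(self, x):
        # Drop into an interactive IPython shell right here to
        # inspect x, self, and anything else in scope:
        # from IPython import embed; embed()
        #
        # Or, for pdb-style stepping instead:
        # from IPython.core.debugger import set_trace; set_trace()
        return [[0.5, 0.5] for _ in x]  # dummy scores, one list per sample
```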
You may need to switch the target. An attack can be either open-box (you have the model file) or closed-box (you only have access to inference). Hop Skip Jump is a closed-box attack and the Basic Iterative Method is an open-box attack. The implication is that the backend framework, the Adversarial Robustness Toolbox, requires an estimator/classifier that inherits from `CLASSIFIER_LOSS_GRADIENTS_TYPE`.
Counterfit passes everything back to the framework to be built and run. The targets provided are for demo purposes, and we artificially force a particular ART loading process via a `target_classifier` attribute attached to the target for testing purposes.
For example, `digits_keras` vs `digits_blackbox`.
If you provide no `target_classifier`, Counterfit will assume you are using a closed-box attack. As all of the demo targets have model files, you can use open-box (whitebox) attacks against them. However, depending on the attack, you may need to provide additional items. We tend to focus on closed-box (blackbox) attacks.
Additionally, the most frequent problem is "object of type 'NoneType' has no len()"
I run into this too. It's a bug that occurs when the attack fails to run: because the attack fails, `results` never gets set, and then `successes` can't be properly calculated and reported.
additionally, I have never created a new input data file before. Could you please give me some tips on how to create an input file (.npz)? Thank you.
This is a numpy zip file and is not explicitly a requirement for targets.
`self.X` should be a list of lists, where each entry is a sample you want an attack to perturb. Whether you keep your data in a text file, a database, or a single image in the target folder, Counterfit only cares that `self.X` is a list of lists. Counterfit uses `get_samples` when preparing an attack. This function just pulls a sample; whether or not it will work against the target is handled in your `predict` function.
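If you do want to use an .npz file, a minimal sketch of creating one and loading it into `self.X` during a hypothetical `target.load()` could look like this (the file name and the `X` key are arbitrary choices, not Counterfit requirements):

```python
import os
import tempfile

import numpy as np

# --- creating the .npz file (done once, offline) ---
samples = np.array([[0.1, 0.2, 0.3],
                    [0.4, 0.5, 0.6]])
path = os.path.join(tempfile.gettempdir(), "my_samples.npz")
np.savez(path, X=samples)  # "X" is an arbitrary key name

# --- inside a hypothetical target.load() ---
data = np.load(path)
X = data["X"].tolist()  # Counterfit just wants a list of lists

print(X)  # [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
```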
Similarly, in `predict`, `x` is also a list of lists, so you can handle multiple samples, or a batch of samples. For example, if you have a simple REST endpoint that does not take a batch of inputs, your predict would look like...

```python
def predict(self, x):
    results = []
    for sample in x:
        # send each sample to the endpoint one at a time
        results.append(send_to_endpoint(sample))  # hypothetical helper
    return results
```
If you are working with a local model, or an API that can accept a batch, the predict would look like...

```python
def predict(self, x):
    # send the whole batch to the endpoint at once
    return send_to_endpoint(x)  # hypothetical helper
```
You will also return a list of lists from `predict`, where each inner list is the output from the target model for that particular sample.
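To make that shape contract concrete, here is a toy target whose "model" is a made-up scoring rule (a real target would call its model or API instead), returning one score vector per input sample:

```python
class ToyTarget:
    """Hypothetical target illustrating predict's list-of-lists contract."""

    def predict(self, x):
        outputs = []
        for sample in x:
            s = sum(sample)
            score = s / (1.0 + s)  # stand-in for a real model call
            outputs.append([1.0 - score, score])  # one score vector per sample
        return outputs

probs = ToyTarget().predict([[1.0, 2.0], [0.0, 0.0]])
print(probs)  # [[0.25, 0.75], [1.0, 0.0]]
```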
Hello @moohax,
I loaded the art framework and interacted with the satellite target, but I found that almost all of the attacks did not work. The two most frequent problems are the following:
[-] Preparing attack...
[-] Running attack...
[!] Failed to run ee34d02c (ZooAttack): 'BlackBoxClassifier' object has no attribute 'channels_first'
Even when I interacted with the target digits_blackbox the problem still occurred. Could you please tell me where the BlackBoxClassifier object is and how I can fix it? Or should I not run the attacks like this? Thank you.
The second problem is the following:
Could you please tell me how to fix it? Thank you.
Hello @moohax,
I found that if I do not use the latest version of Counterfit, my attacks actually work. When I interacted with the tutorial target, all the attacks worked. When I interacted with the satelliteimages target, only the pixel attack and the threshold attack did not run properly. When running the pixel attack, the problem was similar to before (no attribute XXX); when running the threshold attack, the problem is as follows:
I do not know if I can fix it by changing the settings of my system.
Now that I would like to use Counterfit to develop a system for testing the security of AI models that classify images, could you please tell me whether I just need to consult the tutorial and satelliteimages targets to design my own target? Thank you.
Nice work! (I know it doesn't seem like it.)
`Failed to draw an adversarial image...` is the Adversarial Robustness Toolbox saying the attack failed. My recommendation is to use Hop Skip Jump, as it is effectively Boundary Attack 2.0. In a not-too-distant update (internal at the moment) you will be able to optimize attack parameters.
The memory allocation error looks like a bug; it seems as though the target is trying to process ALL images in the dataset as a single sample. Double-check that `self.X` is a list of lists after you load. It's often helpful to set a breakpoint after `self.X` gets loaded and explore the data to make sure it's what you expect. Same advice for your `predict` function.
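A quick sanity check along those lines, as a sketch, could assert the per-sample structure right after loading:

```python
def check_samples(X):
    """Sanity-check that X is a non-empty sequence of per-sample
    sequences, rather than one giant flattened blob of the dataset."""
    assert hasattr(X, "__len__"), "X should be list-like"
    assert len(X) > 0, "X should not be empty"
    first = X[0]
    assert hasattr(first, "__len__"), "each entry should itself be a sample"
    return len(X), len(first)

# Three samples of two features each
print(check_samples([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]))  # (3, 2)
```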
Hello @moohax,
I downloaded the latest version of Counterfit and found that I could not load from config.json. Could you please tell me how to deal with this problem? Thank you.
This is just a warning. You can provide a config that limits the available attacks, or provides defaults; otherwise Counterfit will just dynamically load all attacks.
Each respective framework implementation can be found under the folder named after the framework, `art.py` for example. This will give you insight into how it all gets loaded.