Something surprising happened
Hi,
I read your paper yesterday and found it impressive, and I'm planning to replicate some of the experiments. However, when I used task arithmetic to merge 6 models based on ViT-L-14 and then evaluated the attack, I observed something surprising: using the zeroshot.pt model directly as the adversarial model resulted in an attack success rate (ASR) exceeding 20%. Do you have any idea why this happens? Thank you very much.
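For context, standard task arithmetic merges models as θ_merged = θ_pre + λ · Σ_i (θ_i − θ_pre), where θ_pre is the pre-trained (zero-shot) checkpoint and each θ_i − θ_pre is a task vector. Assuming the adversarial model enters the merge like any other task model, using zeroshot.pt as that model contributes an all-zero task vector, so the merged weights are identical to merging only the remaining five models. Any attack success would then have to come from the trigger itself transferring to the merged model. The sketch below is illustrative only: parameters are stored as hypothetical flat dicts of float lists (`Params`), and `lam` is a hypothetical scaling coefficient; a real run would operate on PyTorch state dicts loaded from the checkpoints.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flattened weights.
Params = Dict[str, List[float]]


def task_vector(finetuned: Params, pretrained: Params) -> Params:
    """tau_i = theta_i - theta_pre, computed per parameter."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def task_arithmetic_merge(
    pretrained: Params, finetuned_models: List[Params], lam: float
) -> Params:
    """theta_merged = theta_pre + lam * sum_i tau_i."""
    merged = {name: list(values) for name, values in pretrained.items()}
    for model in finetuned_models:
        tau = task_vector(model, pretrained)
        for name, delta in tau.items():
            merged[name] = [m + lam * d for m, d in zip(merged[name], delta)]
    return merged


# Toy example: "zeroshot" plays the role of the pre-trained checkpoint.
zeroshot = {"w": [0.0, 1.0], "b": [0.5]}
task_a = {"w": [0.2, 1.1], "b": [0.4]}
task_b = {"w": [-0.1, 0.9], "b": [0.7]}

with_zeroshot = task_arithmetic_merge(zeroshot, [task_a, task_b, zeroshot], lam=0.3)
without_zeroshot = task_arithmetic_merge(zeroshot, [task_a, task_b], lam=0.3)

# The pre-trained model's task vector is all zeros, so including it changes nothing.
print(with_zeroshot == without_zeroshot)
```

Running it prints `True`, which is why a zero-shot "adversarial model" cannot inject anything into the merged weights under plain task arithmetic.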
Hi,
Thanks for your interest! If I understand correctly, you used the pre-trained CLIP-ViT-L/14 to obtain a universal trigger (universal adversarial patch) and applied it directly to the merged model. This can happen because a universal trigger has some transferability (in the off-task scenario, we use shadow class construction and ADA to improve its generality). However, an ASR of 20% is far from sufficient, and you can further adopt our two-stage attack mechanism to increase the attack success rate. Let me know if you have any further questions!
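To make the evaluation concrete: a universal trigger is a single patch pasted onto every test image, and the ASR is then the fraction of patched inputs that the merged model assigns to the attacker's target class. The sketch below is a hypothetical, pure-Python illustration, not the authors' evaluation code. `apply_patch` and `attack_success_rate` are illustrative names, images are single-channel lists of lists, and excluding samples whose true label is already the target class is an assumed (common but not universal) convention.

```python
from typing import List, Sequence

# Hypothetical representation: a single-channel H x W image.
Image = List[List[float]]


def apply_patch(image: Image, patch: Image, top: int, left: int) -> Image:
    """Return a copy of `image` with `patch` pasted at (top, left)."""
    patched = [row[:] for row in image]  # copy so the clean image is untouched
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            patched[top + i][left + j] = value
    return patched


def attack_success_rate(
    predictions: Sequence[int], labels: Sequence[int], target: int
) -> float:
    """Fraction of patched inputs classified as `target`.

    Samples whose true label already equals `target` are excluded
    (an assumed convention).
    """
    pairs = [(p, y) for p, y in zip(predictions, labels) if y != target]
    if not pairs:
        return 0.0
    hits = sum(1 for p, _ in pairs if p == target)
    return hits / len(pairs)
```

For example, `attack_success_rate([3, 3, 1, 0, 3], [0, 1, 2, 3, 3], target=3)` drops the two samples already labeled 3 and counts 2 hits out of the remaining 3. Under this definition, a 20% ASR means only about one in five non-target patched images is flipped to the target class.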