[Question] about perturbation- and gradient-based methods
UdonDa opened this issue
Hi, @daemon !
Thanks for sharing this exciting work!
I read your paper, and I have a question about how to run the experiments with the perturbation- and gradient-based methods in Section 2.2, Diffusion Attentive Attribution Maps.
You mentioned that gradient methods require a back-prop through all T timesteps.
I agree that this is a problem. However, in the following sentence, you wrote that even minor perturbations cause the model to generate different images.
How did you confirm this? I'd like to try both perturbation- and gradient-based methods.
Would you tell me how I can verify them?
Best regards.
Sure, so for the perturbation-based analysis, I added a small amount of Gaussian noise to each word embedding, then compared the differences in the output image. The key difficulty was that the resulting image would often change excessively, at least compared to DAAM. Further, the approach was computationally burdensome, since I needed to perturb each word in the sentence multiple times. For a prompt of, say, 12 words, this would've roughly amounted to ~50 inference passes.
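If it helps, a minimal sketch of that procedure with `diffusers` might look like the following. The model ID, prompt, noise scale `sigma`, and `n_samples` here are illustrative assumptions, not the exact values from my experiments:

```python
import numpy as np
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4"  # assumed checkpoint, for illustration
).to("cuda")

prompt = "a photo of a dog chasing a ball"  # illustrative prompt
tok = pipe.tokenizer(
    prompt, padding="max_length",
    max_length=pipe.tokenizer.model_max_length, return_tensors="pt"
).input_ids.to("cuda")
embeds = pipe.text_encoder(tok).last_hidden_state  # (1, 77, 768)

# Baseline image with a fixed seed so the initial latents match across runs.
base = pipe(prompt_embeds=embeds,
            generator=torch.Generator("cuda").manual_seed(0)).images[0]
base = np.asarray(base, dtype=np.float32)

sigma, n_samples = 0.01, 4  # assumed noise scale and perturbations per word
n_real = len(pipe.tokenizer(prompt).input_ids)  # real tokens, incl. BOS/EOS

scores = {}
for i in range(1, n_real - 1):  # perturb each word token; skip BOS/EOS
    diffs = []
    for _ in range(n_samples):
        noisy = embeds.clone()
        noisy[0, i] += sigma * torch.randn_like(noisy[0, i])
        img = pipe(prompt_embeds=noisy,
                   generator=torch.Generator("cuda").manual_seed(0)).images[0]
        diffs.append(np.abs(np.asarray(img, dtype=np.float32) - base).mean())
    scores[i] = float(np.mean(diffs))  # per-token perturbation sensitivity
```

Fixing the seed keeps the initial latents identical, so any change in the output image is attributable to the embedding perturbation alone; the inner loop over `n_samples` perturbations per word is what drives the pass count to roughly 50 for a 12-word prompt.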
I tried a few things for the gradient-based approach. First, I backpropagated from each pixel individually, which was evidently infeasible. Second, I backpropagated from patches of pixels (e.g., 16x16), which, while more feasible, yielded terribly noisy results. Note that for any of this to be possible, I had to block the gradient from flowing backward beyond one timestep.
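A sketch of the patch-based gradient version, under the same assumptions as above (the patch location and timestep are arbitrary here, and the patch is taken in latent space rather than decoded pixel space):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4"  # assumed checkpoint, for illustration
).to("cuda")

prompt = "a photo of a dog chasing a ball"
tok = pipe.tokenizer(
    prompt, padding="max_length",
    max_length=pipe.tokenizer.model_max_length, return_tensors="pt"
).input_ids.to("cuda")

# Make the embeddings a leaf tensor so gradients accumulate on them.
embeds = pipe.text_encoder(tok).last_hidden_state.detach().requires_grad_(True)

# Detached latents: the gradient cannot flow backward beyond this one step.
latents = torch.randn(1, 4, 64, 64, device="cuda")
t = torch.tensor([500], device="cuda")  # arbitrary mid-trajectory timestep

noise_pred = pipe.unet(latents, t, encoder_hidden_states=embeds).sample

# Backprop from a single 16x16 patch of the noise prediction.
noise_pred[0, :, 16:32, 16:32].sum().backward()

# Per-token saliency: gradient magnitude on each word embedding.
saliency = embeds.grad.abs().sum(dim=-1).squeeze(0)  # shape: (77,)
```

This is what blocking the gradient means in practice: the latents enter the step as constants, so only the UNet's dependence on the word embeddings at that single timestep is differentiated.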
Please reopen if needed.