Huage001/AdaAttN

Mean-variance-norm and Instance Norm

sonnguyen129 opened this issue · 10 comments

Hi @Huage001
I read the paper and found that the mean-variance norm (i.e., channel-wise mean-variance normalization) works much like instance norm. Can you explain why you use the mean-variance-norm function instead of instance norm?
Thank you so much.

Hello,
Actually, there is little difference between them. One difference is that mean-variance norm uses an unbiased variance estimate while instance norm uses the biased one. You can also try instance norm, but I don't think it would have a substantial effect on the final outputs.
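In case it helps, here is a minimal PyTorch sketch of the two variants; the function names and the `eps` value are illustrative and not taken from the released code:

```python
import torch
import torch.nn.functional as F

def mean_variance_norm(feat, eps=1e-5):
    # Channel-wise normalization over the spatial positions of each sample.
    # torch.var uses the unbiased estimator by default (divides by HW - 1).
    n, c = feat.size(0), feat.size(1)
    flat = feat.view(n, c, -1)
    mean = flat.mean(dim=2).view(n, c, 1, 1)
    std = (flat.var(dim=2) + eps).sqrt().view(n, c, 1, 1)
    return (feat - mean) / std

def instance_norm(feat, eps=1e-5):
    # F.instance_norm uses the biased variance estimator (divides by HW).
    return F.instance_norm(feat, eps=eps)

x = torch.randn(2, 64, 32, 32)
print((mean_variance_norm(x) - instance_norm(x)).abs().max())  # tiny difference
```

The two estimators only differ by a factor of HW / (HW - 1) on the variance, so for typical feature-map sizes the outputs are nearly identical, which matches the comment above that the choice has little effect.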

I have one more question. At test time, is the output size 256 x 256? Can the models produce other sizes?

In our experiments, the default output size is 512 x 512. Other sizes are also fine, but it is better to set the size to a multiple of 16 to avoid problems with the down-sampling and up-sampling operations.
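If you want an arbitrary target resolution, one simple option (an illustrative helper, not part of the released code) is to round each side to the nearest multiple of 16 before inference:

```python
def round_to_multiple(x, base=16):
    # Round a side length to the nearest multiple of `base`, never below `base`.
    return max(base, int(round(x / base)) * base)

print(round_to_multiple(500), round_to_multiple(770))  # 496 768
```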

Hi @Huage001
Thank you for your reply. When I read the MUNIT paper, the authors said that IN removes important style information.
[screenshot from the MUNIT paper]

But in the AdaAttN paper, the authors apply Norm to the style features in the AdaAttN module.
[screenshot of the AdaAttN module equations]

This confuses me. Could you please explain?

Since IN removes the style information, we can compute content-wise similarity between the content and style images after IN. This similarity is used to aggregate the style feature F_s, as shown in the third row of the above figure. The aggregated style feature itself is not processed with IN.
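To make this concrete, here is a minimal sketch of that aggregation, reusing the `mean_variance_norm` helper from the snippet above. It omits the learned embedding convolutions and the multi-layer features of the full AdaAttN module, so take it as an illustration of the idea rather than the actual implementation:

```python
import torch
import torch.nn.functional as F

def adaattn_sketch(fc, fs, eps=1e-5):
    # fc: content feature (N, C, Hc, Wc); fs: style feature (N, C, Hs, Ws).
    n, c, h, w = fc.size()
    q = mean_variance_norm(fc).view(n, c, -1).permute(0, 2, 1)  # (N, Hc*Wc, C), normalized content
    k = mean_variance_norm(fs).view(n, c, -1)                   # (N, C, Hs*Ws), normalized style
    v = fs.view(n, c, -1).permute(0, 2, 1)                      # (N, Hs*Ws, C), un-normalized style

    attn = F.softmax(torch.bmm(q, k), dim=-1)   # content-style similarity, computed after normalization
    mean = torch.bmm(attn, v)                   # attention-weighted mean of style values
    var = torch.bmm(attn, v ** 2) - mean ** 2   # attention-weighted variance of style values
    std = (var.clamp(min=0.0) + eps).sqrt()

    mean = mean.permute(0, 2, 1).reshape(n, c, h, w)
    std = std.permute(0, 2, 1).reshape(n, c, h, w)
    # Per-position adaptive normalization: only the content feature is normalized;
    # the aggregated statistics (mean, std) come from the raw style values.
    return std * mean_variance_norm(fc) + mean
```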

Hi @Huage001
Thank you for your reply. What does "adaptive" in adaptive attention normalization mean? Can any model that can be written with formulas like AdaAttN's be called adaptive? Is SANet adaptive? I didn't see the authors mention it at all.

The name AdaAttN actually follows AdaIN. "Adaptive" describes the normalization operation, whose parameters are dynamically (adaptively) dependent on the style feature. From this perspective, we could also call SANet, and indeed all current attention-based methods, "adaptive".
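For comparison, AdaIN is "adaptive" in the same sense but with global rather than per-position statistics: the scale and shift applied to the normalized content come from the style feature. A minimal sketch, again reusing `mean_variance_norm` from the first snippet:

```python
def adain_sketch(fc, fs, eps=1e-5):
    # Channel-wise statistics of the style feature become the (adaptive)
    # affine parameters applied to the normalized content feature.
    n, c = fs.size(0), fs.size(1)
    fs_flat = fs.view(n, c, -1)
    style_mean = fs_flat.mean(dim=2).view(n, c, 1, 1)
    style_std = (fs_flat.var(dim=2) + eps).sqrt().view(n, c, 1, 1)
    return style_std * mean_variance_norm(fc) + style_mean
```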

Hi @Huage001
Do you think swapping the content and style features for the SANet module or the AdaAttN module makes any difference?
[screenshot of the SANet module] and [screenshot of the AdaAttN module]
Thank you so much

In that case, you can imagine the content image serving as the "style reference" while the style image serves as the "content reference". Typically, in attention-based style transfer, the query (Q) should come from the content and the key (K) and value (V) from the style.
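In terms of the sketch above, the standard assignment and the swap would look like this (illustrative only; `content_feat` and `style_feat` are placeholder feature maps):

```python
# Standard: queries from the content feature, keys/values from the style feature.
stylized = adaattn_sketch(fc=content_feat, fs=style_feat)

# Swapped: the style image now plays the "content" role and vice versa, so the
# output keeps the style image's structure and borrows statistics from the content image.
swapped = adaattn_sketch(fc=style_feat, fs=content_feat)
```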

Hi @Huage001
Thank you for your explanation. I will close this issue and re-open it if I have another question in the future.
Wish you all health, success and happiness!
Best regards,
Son Nguyen.