Zejun-Yang/AniPortrait

Confusion over pose_guider.py: there seems to be no cross attention between ref pose and target pose.

fredkingdom opened this issue · 1 comment

Thanks for the awesome work.

Please correct me if I am wrong, but I've noticed that your paper claims the PoseGuider’s cross-attention module facilitates interaction between the reference landmarks and each frame’s target landmarks. However, in pose_guider.py lines 86 to 89, where the cross-attention module is defined, the parameter cross_attention_dim is not passed and is therefore None during the init of Transformer2DModel (line 181) and BasicTransformerBlock (line 228). As a result, there is no cross attention between the ref pose and the target pose.

I'm wondering whether this is intended or a bug, and what the effect on performance is.
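For readers unfamiliar with the distinction, here is a minimal sketch (not the repository's code; dimensions and sequence lengths are made up for illustration) of what cross attention between reference-pose and target-pose features looks like: queries come from the target-pose tokens while keys and values come from the reference-pose tokens.

```python
import torch
import torch.nn as nn

# Hypothetical feature dimension and token counts, for demonstration only.
dim = 64
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)

target = torch.randn(1, 77, dim)  # per-frame target-pose tokens (queries)
ref = torch.randn(1, 77, dim)     # reference-pose tokens (keys/values)

# Cross attention: each target token attends to the reference tokens.
out, _ = attn(query=target, key=ref, value=ref)
print(out.shape)  # torch.Size([1, 77, 64])
```

With cross_attention_dim left as None, no such module is wired up, and the block attends only within its own token sequence (plain self attention).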

😂 Thank you so much. At first we concatenated the ref pose after each target pose and applied self attention to the combined sequence, and the generated target images showed warping from the ref image. This result suggests the model can learn the spatial relationship between the ref pose and the target pose. After that, we tried replacing self attention with cross attention, which had a similar effect. It seems the published code is not the complete version. You can finish the definition of cross_attention_dim and fine-tune the pose guider model.
But it's not a big deal, because we have devised a new strategy for merging ref and target pose information in our upcoming portrait animation project, so stay tuned!
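The concatenation variant described in the reply can be sketched as follows (again an illustrative toy, not the project's actual implementation; all shapes are assumptions). Self attention over the concatenated sequence still lets every target token attend to every reference token, which is why the two designs behave similarly:

```python
import torch
import torch.nn as nn

dim = 64
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)

target = torch.randn(1, 77, dim)  # target-pose tokens for one frame
ref = torch.randn(1, 77, dim)     # reference-pose tokens

# Concatenate ref after target and self-attend over the joint sequence;
# target tokens can attend to ref tokens (and vice versa) without an
# explicit cross-attention module.
tokens = torch.cat([target, ref], dim=1)  # shape (1, 154, dim)
out, _ = attn(tokens, tokens, tokens)

# Keep only the target positions as the block's output.
target_out = out[:, : target.shape[1]]
print(target_out.shape)  # torch.Size([1, 77, 64])
```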