gaozhihan/PreDiff

Question about adding additional data to the model

marctimjen opened this issue · 2 comments

Hello Zhihan Gao

I have a question about implementing an extension of the model that I hope you can help me with :).

I was thinking about adding additional variables to the context of the model,
such as weather forecast maps, elevation maps and so on.

My initial thought is to encode this additional information with a small network and add it to the state in front of the cuboid attention blocks, just like you do with the time embedding in the PreDiff model.
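
To make the idea concrete, here is a minimal NumPy sketch of additive conditioning, not PreDiff's actual code. It assumes the state entering an attention block has layout (T, H, W, C) and that the extra inputs are static per-pixel maps of shape (H, W, C_aux). The function name `add_aux_conditioning`, the shapes, and the single linear projection standing in for "a network" are all hypothetical.

```python
import numpy as np

def add_aux_conditioning(state, aux_maps, weight, bias):
    """Project per-pixel auxiliary channels to the state width and add them.

    state:    (T, H, W, C)   state entering an attention block (assumed layout)
    aux_maps: (H, W, C_aux)  static maps such as elevation (hypothetical input)
    weight:   (C_aux, C)     projection matrix; learned in a real model
    bias:     (C,)           projection bias
    """
    # Per-pixel linear projection, equivalent to a 1x1 convolution.
    aux_embed = aux_maps @ weight + bias          # (H, W, C)
    # Broadcast the same spatial embedding over every time step.
    return state + aux_embed[None, ...]           # (T, H, W, C)

# Toy shapes chosen only for illustration.
rng = np.random.default_rng(0)
T, H, W, C, C_aux = 4, 8, 8, 16, 2
state = rng.normal(size=(T, H, W, C))
aux_maps = rng.normal(size=(H, W, C_aux))
weight = rng.normal(size=(C_aux, C)) * 0.02
bias = np.zeros(C)

out = add_aux_conditioning(state, aux_maps, weight, bias)
```

Time-varying inputs such as forecast maps would need a leading T axis on `aux_maps` instead of the broadcast used here.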

What are your thoughts on this? Do you perhaps have a better idea of how to do this?

Thank you very much in advance.
Best regards,
Marc Jensen

Thank you for your question. Adding additional inputs (conditionings) is certainly possible. The specific approach would depend on the model architecture and the modalities.
For example, in Stable Diffusion/LDM the textual prompts are encoded via CLIP's text encoder and then processed by the cross-attention layers, as illustrated in Figure 3 of the original paper.
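
As a rough illustration of the cross-attention route, the sketch below shows single-head cross-attention in NumPy, where queries come from the latent tokens and keys/values come from the conditioning tokens. It is a simplification under stated assumptions: real implementations use multiple heads, an output projection and learned weights. The names `cross_attention`, `w_q`, `w_k`, `w_v` and the random `cond` tokens (standing in for an encoder's output) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(x, cond, w_q, w_k, w_v):
    """Single-head cross-attention.

    x:    (N, C)  flattened latent tokens (queries)
    cond: (M, D)  conditioning tokens, e.g. encoded auxiliary inputs
    """
    q = x @ w_q                                      # (N, d)
    k = cond @ w_k                                   # (M, d)
    v = cond @ w_v                                   # (M, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (N, M)
    return attn @ v, attn                            # (N, d), (N, M)

# Toy shapes chosen only for illustration.
rng = np.random.default_rng(0)
N, C, M, D, d = 32, 16, 5, 12, 8
x = rng.normal(size=(N, C))
cond = rng.normal(size=(M, D))   # stand-in for an encoder's output
w_q = rng.normal(size=(C, d))
w_k = rng.normal(size=(D, d))
w_v = rng.normal(size=(D, d))

y, attn = cross_attention(x, cond, w_q, w_k, w_v)
```

Each latent token receives a weighted mix of the conditioning tokens, so the conditioning does not need to share the latent's spatial layout.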

Thank you very much :)