gaozhihan/PreDiff

Where is the Knowledge Alignment Network for the Energy Conservation?

zengrh3 opened this issue · 7 comments

Where is the Knowledge Alignment Network for the Energy Conservation?

Thank you for your interest in our work. The law of conservation of energy is not directly applicable to SEVIR. The code for experiments on N-body MNIST will be released in the near future.

Thank you. I wonder whether you could release just the code for training with the law of conservation of energy. I would like to refer to your code and adapt it to my task.

I have noticed Table 9 in the Appendix. The design of the knowledge alignment networks seems almost identical for the $N$-body MNIST and SEVIR datasets, so now I am more curious about how you handle the energy condition. Thank you very much.

Also, I have noticed that you set `cond_ids` to `[0, num_timesteps - 1, num_timesteps - 1, ..., num_timesteps - 1]` in the following function:

So this means that `tc` will almost always be set to `num_timesteps - 1`:

Am I understanding correctly?
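For reference, the lookup you describe can be reproduced with a short sketch (variable names assumed; this mirrors the pattern in latent-diffusion-style codebases, not necessarily PreDiff's exact code):

```python
import torch

num_timesteps = 1000  # assumed diffusion length

# Every entry but the first maps to the (near-)maximal noise level, so the
# condition timestep tc = cond_ids[t] equals num_timesteps - 1 almost always.
cond_ids = torch.full((num_timesteps,), num_timesteps - 1, dtype=torch.long)
cond_ids[0] = 0

t = torch.randint(0, num_timesteps, (4,))  # randomly sampled latent timesteps
tc = cond_ids[t]                           # condition timesteps looked up from t
```

With this schedule, any sampled `t > 0` maps to `tc == num_timesteps - 1`, which is exactly the behavior you observed.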


Oh, I saw the difference.

For `zc = self.q_sample(x_start=zc, t=tc, noise=torch.randn_like(c.float()))`, the timestep is almost always `num_timesteps - 1`, whereas for `zt = self.q_sample(x_start=z, t=t, noise=torch.randn_like(z))`, the timestep is sampled randomly.
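To make that contrast concrete, here is a minimal sketch of the standard DDPM `q_sample` forward-noising step (schedule and variable names are assumptions, not PreDiff's actual code):

```python
import torch

def q_sample(x_start, t, sqrt_alphas_cumprod, sqrt_one_minus_alphas_cumprod, noise=None):
    """Standard DDPM forward noising: z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps."""
    if noise is None:
        noise = torch.randn_like(x_start)
    # Broadcast the per-sample schedule values over the latent dimensions.
    shape = (t.shape[0],) + (1,) * (x_start.dim() - 1)
    return (sqrt_alphas_cumprod[t].view(shape) * x_start
            + sqrt_one_minus_alphas_cumprod[t].view(shape) * noise)

# Precompute a linear-beta schedule, as in DDPM.
num_timesteps = 1000
betas = torch.linspace(1e-4, 2e-2, num_timesteps)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
sqrt_ac = alphas_cumprod.sqrt()
sqrt_omac = (1.0 - alphas_cumprod).sqrt()

z = torch.randn(2, 4, 8, 8)               # latent from the VAE encoder (shape assumed)
t = torch.randint(0, num_timesteps, (2,)) # random timestep per sample
tc = torch.full((2,), num_timesteps - 1)  # fixed near-maximal timestep for the condition
zt = q_sample(z, t, sqrt_ac, sqrt_omac)
zc = q_sample(z, tc, sqrt_ac, sqrt_omac)  # at t = num_timesteps - 1 this is almost pure noise
```

So `zt` carries a randomly chosen noise level, while `zc` is (almost always) noised at the final, near-maximal level.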

One more question: I noticed that in

```python
pred = self.torch_nn_module(zt, t, y=y, zc=zc, **aux_input_dict)
```

both `zc` and `y` are passed as inputs to `self.torch_nn_module`, which I believe corresponds to the following `forward` function of the `NoisyCuboidTransformerEncoder` class:

```python
def forward(self, x, t, verbose=False, **kwargs):
```

However, I could not find where `t` and `zc` are actually used in the implementation of the `forward` function of the `NoisyCuboidTransformerEncoder` class.

Please point it out if I have missed something. Thank you.
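For context, diffusion encoders commonly consume `t` via a learned timestep embedding and `zc` via channel concatenation. The toy module below is purely illustrative of that pattern (it is not the actual `NoisyCuboidTransformerEncoder` implementation, and all names are hypothetical):

```python
import math
import torch
from torch import nn

def timestep_embedding(t, dim):
    """Sinusoidal embedding of the diffusion timestep t (DDPM/Transformer style)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None]
    return torch.cat([torch.cos(args), torch.sin(args)], dim=-1)

class TinyNoisyEncoder(nn.Module):
    """Hypothetical encoder: t enters through an embedding added to the features,
    and zc is concatenated with the noisy latent along the channel axis."""
    def __init__(self, in_ch, cond_ch, hidden=64):
        super().__init__()
        self.proj = nn.Linear(in_ch + cond_ch, hidden)
        self.t_mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.SiLU(),
                                   nn.Linear(hidden, hidden))
        self.head = nn.Linear(hidden, 1)

    def forward(self, x, t, zc=None, **kwargs):
        if zc is not None:
            x = torch.cat([x, zc], dim=-1)   # condition by concatenation
        h = self.proj(x)
        # Inject the timestep as an additive per-sample embedding.
        h = h + self.t_mlp(timestep_embedding(t, h.shape[-1]))[:, None]
        return self.head(h.mean(dim=1))      # pooled scalar prediction
```

A quick usage example: `TinyNoisyEncoder(in_ch=4, cond_ch=4)(torch.randn(2, 16, 4), torch.tensor([10, 999]), zc=torch.randn(2, 16, 4))` returns a `(2, 1)` tensor.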

Just want a clarification: in equation (4) of the PreDiff paper, the input to the constraint function is $\mathcal{F}(\hat{x}, y)$, where $\hat{x}$ should be the output of the decoder. However, I noticed that the authors actually use $U_\phi(z_t, t, z_{\mathrm{cond}})$ to estimate $\mathcal{F}(\hat{x}, y)$, where $z_t$ comes from the output of the VAE encoder with added noise.

I just wonder why PreDiff is designed to use $U_\phi(z_t, t, z_{\mathrm{cond}})$ to estimate $\mathcal{F}(\hat{x}, y)$ rather than $\mathcal{F}(\hat{z}_t, y)$, where $\hat{z}_t$ is generated by the diffusion reverse process. Once the generated $\hat{z}_t$ from the diffusion model approximates the $z_t$ obtained by noising the VAE encoder output, the input to the VAE decoder would be almost the same, and we would finally get an $\hat{x}$ similar to the VAE reconstruction (I guess).

Thank you very much.
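One possible reading (my own understanding, not an authoritative answer): $\hat{x}$ only exists after the full reverse process finishes, yet knowledge-alignment guidance has to be applied at every intermediate step $t$, where only the noisy $z_t$ is available. Training $U_\phi$ on $(z_t, t, z_{\mathrm{cond}})$ lets it estimate the constraint value at any noise level. A classifier-guidance-style sketch of the resulting sampling update (notation assumed, not copied from the paper):

```latex
z_{t-1} \leftarrow \mu_\theta(z_t, t, z_{\mathrm{cond}})
  - \lambda\, \sigma_t^2\, \nabla_{z_t}
    \big\| U_\phi(z_t, t, z_{\mathrm{cond}}) - \mathcal{F}_0 \big\|^2
  + \sigma_t \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)
```

where $\mathcal{F}_0$ denotes the target constraint value computed from the condition $y$, and $\lambda$ is a guidance scale.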