Diffusion scheduling code making abnormal protein output

Question

Opened this issue 2 months ago · 1 comments

jmoojun commented 2 months ago

I believe your code has some discrepancies when compared to the pseudocode in your article.

Algorithm 1 TRAINING
Input: Training examples of structures, sequences, and MSAs {(Si, Ai, Mi)}
for all (Si, Ai, Mi) do
    Extract x1 ← BetaCarbons(Si)
    Sample x0 ∼ HarmonicPrior(length(Ai))
    Align x0 ← RMSDAlign(x0, x1)
    Sample t ∼ Uniform[0, 1]
    Interpolate xt ← t · x1 + (1 − t) · x0
    Predict Ŝi ← AlphaFold(Ai, Mi, xt, t)
    Optimize loss L = FAPE²(Ŝi, Si)
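
For reference, the alignment and interpolation steps of Algorithm 1 can be sketched in NumPy. This is only an illustration, not the repository's implementation: `kabsch_align` and `interpolate` are hypothetical helper names, and a plain Gaussian sample stands in for HarmonicPrior, whose definition is not given here.

```python
import numpy as np

def kabsch_align(mobile, target):
    """Rigidly superimpose `mobile` onto `target` (both N x 3), minimizing RMSD."""
    mob_center = mobile.mean(axis=0)
    tgt_center = target.mean(axis=0)
    P = mobile - mob_center
    Q = target - tgt_center
    # Kabsch algorithm: SVD of the 3 x 3 covariance matrix.
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return P @ R.T + tgt_center

def interpolate(x0, x1, t):
    """Paper convention: t = 0 is pure noise (x0), t = 1 is the data (x1)."""
    return t * x1 + (1.0 - t) * x0

rng = np.random.default_rng(0)
x1 = rng.standard_normal((10, 3))   # stand-in for the beta-carbon coordinates
x0 = rng.standard_normal((10, 3))   # ASSUMPTION: Gaussian stand-in for HarmonicPrior
x0 = kabsch_align(x0, x1)           # Align x0 <- RMSDAlign(x0, x1)
t = rng.uniform(0.0, 1.0)           # Sample t ~ Uniform[0, 1]
xt = interpolate(x0, x1, t)         # Interpolate xt <- t * x1 + (1 - t) * x0
```

Under this convention `xt` equals `x1` at t = 1 and `x0` at t = 0.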

Does this pseudocode correspond to your code in wrapper.py ModelWrapper.distillation_training_step?

for t, s in zip(schedule[:-1], schedule[1:]):
    output = self.teacher(batch, prev_outputs=prev_outputs)
    pseudo_beta = pseudo_beta_fn(batch['aatype'], output['final_atom_positions'], None)
    noisy = rmsdalign(pseudo_beta, noisy)
    noisy = (s / t) * noisy + (1 - s / t) * pseudo_beta

The same loop appears in ModelWrapper.inference.

The atoms in the PDB output seem to be clustered together very densely, which makes the result an abnormal protein structure.

Answer 1 · 2024-09-02T23:06:16.000Z

Which output are you showing here?
In the code, the time index is flipped, so t=1 in the paper corresponds to t=0 in the code, and vice versa. Sorry that this is not documented more clearly.

for t, s in zip(schedule[:-1], schedule[1:]):
    output = self.teacher(batch, prev_outputs=prev_outputs)
    pseudo_beta = pseudo_beta_fn(batch['aatype'], output['final_atom_positions'], None)
    noisy = rmsdalign(pseudo_beta, noisy)
    noisy = (s / t) * noisy + (1 - s / t) * pseudo_beta
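
To make the flipped convention concrete, here is a small numerical check of how the update rule quoted above relates to the paper's interpolant. It is a sketch under stated assumptions, not the repository's code: the schedule is assumed to run from 1 (noise) down to 0 (data), the predictor is replaced by an oracle that always returns the true structure `x1`, and the alignment step is omitted. `paper_interpolant`, `code_step`, and `run_schedule` are hypothetical names.

```python
import numpy as np

def paper_interpolant(x0, x1, t_paper):
    """Paper convention: x_t = t * x1 + (1 - t) * x0 (t = 1 is the data)."""
    return t_paper * x1 + (1.0 - t_paper) * x0

def code_step(noisy, pred, t, s):
    """Update from the quoted loop: move from code time t to code time s < t."""
    return (s / t) * noisy + (1.0 - s / t) * pred

def run_schedule(x0, x1, schedule):
    """Apply the update along `schedule`, using x1 as an oracle prediction."""
    noisy = x0.copy()
    for t, s in zip(schedule[:-1], schedule[1:]):
        noisy = code_step(noisy, x1, t, s)
        # Code time s corresponds to paper time 1 - s.
        assert np.allclose(noisy, paper_interpolant(x0, x1, 1.0 - s))
    return noisy

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 3))       # noise sample (code time 1)
x1 = rng.standard_normal((8, 3))       # target structure (code time 0)
schedule = np.linspace(1.0, 0.0, 11)   # ASSUMPTION: code time runs 1 -> 0
final = run_schedule(x0, x1, schedule)
```

With a perfect prediction, each step lands exactly on the paper's interpolant at t_paper = 1 - s, and the final iterate equals `x1`. Because `schedule[:-1]` excludes the last entry, the division by `t` never hits zero under this assumed schedule.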