bjing2016/alphaflow

Diffusion scheduling code produces abnormal protein output


I believe your code differs in some places from the pseudocode in your paper.


Algorithm 1 TRAINING
Input: Training examples of structures, sequences, and MSAs {(Sᵢ, Aᵢ, Mᵢ)}
for all (Sᵢ, Aᵢ, Mᵢ) do
    Extract x₁ ← BetaCarbons(Sᵢ)
    Sample x₀ ∼ HarmonicPrior(length(Aᵢ))
    Align x₀ ← RMSDAlign(x₀, x₁)
    Sample t ∼ Uniform[0, 1]
    Interpolate xₜ ← t · x₁ + (1 − t) · x₀
    Predict Ŝᵢ ← AlphaFold(Aᵢ, Mᵢ, xₜ, t)
    Optimize loss L = FAPE²(Ŝᵢ, Sᵢ)
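The noising half of Algorithm 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not AlphaFlow's implementation: `kabsch_align` and `noise_structure` are hypothetical names, an isotropic Gaussian stands in for HarmonicPrior, and the AlphaFold forward pass and the loss are left out.

```python
import numpy as np


def kabsch_align(mobile: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Rigidly superimpose `mobile` (N, 3) onto `target` (N, 3), minimizing RMSD (Kabsch)."""
    mu_m = mobile.mean(axis=0)
    mu_t = target.mean(axis=0)
    p = mobile - mu_m
    q = target - mu_t
    # 3x3 covariance between the centered point sets
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate the centered mobile points, then move them onto the target centroid
    return p @ rot.T + mu_t


def noise_structure(x1: np.ndarray, t: float, rng: np.random.Generator) -> np.ndarray:
    """Paper convention: t = 0 is pure noise, t = 1 is the data."""
    # Assumption: isotropic Gaussian noise as a stand-in for HarmonicPrior
    x0 = rng.standard_normal(x1.shape)
    # Align x0 <- RMSDAlign(x0, x1)
    x0 = kabsch_align(x0, x1)
    # Interpolate xt <- t * x1 + (1 - t) * x0
    return t * x1 + (1.0 - t) * x0


rng = np.random.default_rng(0)
x1 = rng.standard_normal((12, 3)) * 5.0  # toy beta-carbon coordinates
t = rng.uniform(0.0, 1.0)                # Sample t ~ Uniform[0, 1]
xt = noise_structure(x1, t, rng)
```

Passing t = 1 returns x1 unchanged and t = 0 returns the aligned noise, i.e. noise sits at t = 0 and data at t = 1 in the paper's convention.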

Does Algorithm 1 correspond to the following code in ModelWrapper.distillation_training_step (wrapper.py)?


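for t, s in zip(schedule[:-1], schedule[1:]):
    output = self.teacher(batch, prev_outputs=prev_outputs)
    # Pseudo-beta (beta-carbon) positions of the teacher's predicted structure
    pseudo_beta = pseudo_beta_fn(batch['aatype'], output['final_atom_positions'], None)
    # Superimpose the current noisy coordinates onto the prediction
    noisy = rmsdalign(pseudo_beta, noisy)
    # Step from time t to time s, moving a fraction (1 - s/t) of the way toward the prediction
    noisy = (s / t) * noisy + (1 - s / t) * pseudo_beta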

The same update appears in ModelWrapper.inference.
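To see what the quoted update does, here is a minimal Python sketch that replays the loop under a few assumptions: the teacher plus pseudo_beta_fn is replaced by a hypothetical `predict` callable, the rmsdalign step is skipped, and the schedule is assumed to be a linear grid from 1 down to 0. `sample_code_convention` is an illustrative name, not part of AlphaFlow.

```python
from typing import Callable, List, Tuple

import numpy as np


def sample_code_convention(
    x_noise: np.ndarray,
    predict: Callable[[np.ndarray, float], np.ndarray],
    steps: int,
) -> Tuple[np.ndarray, List[np.ndarray]]:
    """Replay the quoted update loop; code time runs from 1 (noise) down to 0 (data)."""
    # Assumption: linear schedule 1 -> 0; t never reaches 0, so s / t is always defined
    schedule = np.linspace(1.0, 0.0, steps + 1)
    noisy = x_noise.copy()
    trajectory = [noisy.copy()]
    for t, s in zip(schedule[:-1], schedule[1:]):
        # Hypothetical stand-in for self.teacher(...) followed by pseudo_beta_fn(...)
        pred = predict(noisy, t)
        # The rmsdalign step is omitted in this sketch
        noisy = (s / t) * noisy + (1 - s / t) * pred
        trajectory.append(noisy.copy())
    return schedule, trajectory


rng = np.random.default_rng(0)
x1 = rng.standard_normal((8, 3))  # toy "true" structure
x0 = rng.standard_normal((8, 3))  # starting noise
# A perfect predictor that always returns the true structure
schedule, traj = sample_code_convention(x0, lambda x, t: x1, steps=5)
```

With a perfect predictor, each state equals schedule[k] * x0 + (1 - schedule[k]) * x1, and the final step (s = 0) returns the prediction exactly, so in this loop code time 1 is noise and code time 0 is data.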

The atoms in the output PDB seem to be packed together very densely, so the result is not a plausible protein structure.


Which output are you showing here?
In the code, the time index is flipped: t=1 in the paper corresponds to t=0 in the code, and vice versa. Sorry that this is not documented more clearly.
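The flip can be checked by substitution. The derivation below is a sketch, not taken from the paper: write τ and σ for the code's t and s, and t, s for the paper's time, treat pseudo_beta as the predicted clean coordinates x̂₁ and noisy as the current state xₜ, and assume the code's state at code time τ has the form (1 − τ)x₁ + τx₀.

```latex
\begin{aligned}
\tau &= 1 - t, \qquad \sigma = 1 - s \\
(1 - \tau)\,x_1 + \tau\,x_0 &= t\,x_1 + (1 - t)\,x_0 \\
\frac{\sigma}{\tau}\,x_t + \Bigl(1 - \frac{\sigma}{\tau}\Bigr)\hat{x}_1
  &= \frac{1 - s}{1 - t}\,x_t + \frac{s - t}{1 - t}\,\hat{x}_1
\end{aligned}
```

The second line says the code's interpolation, read with the flipped index, is the one in Algorithm 1. The third line says the quoted update moves the state from paper time t to a later time s by a fraction (s − t)/(1 − t) toward x̂₁, reaching x̂₁ exactly at s = 1.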