jasonkyuyim/multiflow

Question about the Discrete Variables Calculation

Closed this issue · 2 comments

Hi, thanks for contributing this wonderful work on protein co-design, particularly the novel CTMC mechanism. The code is clean and easy to understand. However, I am a little confused about the implementation of the discrete flow matching part.

Specifically, in the FlowModel module, you pass the node embeddings directly through a linear layer to predict the amino acid type logits. A cross-entropy loss is then applied to these logits to compute the amino acid type loss.

However, in the paper, you propose maximizing the conditional log-likelihood of the residue types rather than using a simple CE loss. Could you please explain this difference? Thanks.

I'm not sure I follow the question. Minimizing the cross-entropy loss is the same as maximizing the log-likelihood, right?
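
For anyone landing here later, here is a minimal sketch of that equivalence in plain Python. It is not the repository's code: the helper names (`log_softmax`, `cross_entropy`, `log_likelihood`), the toy logits, and the 4-class vocabulary are all made up for illustration. With one-hot targets, the cross entropy for each residue reduces to the negative log-probability of the true class, so the mean CE is the negative mean log-likelihood.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def cross_entropy(logits, target):
    # Cross entropy between a one-hot target and softmax(logits).
    log_probs = log_softmax(logits)
    one_hot = [1.0 if i == target else 0.0 for i in range(len(logits))]
    return -sum(q * lp for q, lp in zip(one_hot, log_probs))

def log_likelihood(logits, target):
    # Log-probability the model assigns to the true class.
    return log_softmax(logits)[target]

# Toy per-residue logits: 3 residues, 4 hypothetical classes (assumption).
batch_logits = [
    [2.0, 0.5, -1.0, 0.1],
    [0.0, 0.0, 3.0, -2.0],
    [1.0, 1.0, 1.0, 1.0],
]
batch_targets = [0, 2, 3]

n = len(batch_targets)
mean_ce = sum(cross_entropy(l, t) for l, t in zip(batch_logits, batch_targets)) / n
mean_ll = sum(log_likelihood(l, t) for l, t in zip(batch_logits, batch_targets)) / n

# Minimizing mean_ce is the same as maximizing mean_ll.
assert math.isclose(mean_ce, -mean_ll)
```

The one-hot sum in `cross_entropy` is written out only to make the definition explicit. In practice a framework loss such as a standard cross-entropy function indexes the target log-probability directly, which is the same quantity.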

Yes. I misunderstood the formula at that time.