microsoft/protein-frame-flow

Question about novelty calculation

Wangchentong opened this issue · 3 comments

Thanks to the authors for sharing this great work. I wonder how the PDB dataset is curated for the novelty calculation. Do you split each chain in the PDB database, use only the single chains from the training set, or something else?

I guess you use the default Foldseek PDB database?

Hi, we take the whole PDB dataset as instructed in Foldseek's documentation. I believe it consists of all the single sequences. We use the following flags to run the novelty calculation:

--alignment-type 1 --format-output query,target,alntmscore,lddt --tmscore-threshold 0.0 --exhaustive-search --max-seqs 10000000000
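
For anyone post-processing the results: a minimal sketch of how the output could be reduced to one score per sample, assuming the result file is tab-separated with the four columns from --format-output above, and assuming novelty is reported as the highest alntmscore each query gets against any PDB hit. The function name max_tm_per_query and the sample rows are made up for illustration; this is not code from the repository.

```python
import csv
import io
from typing import Dict, Iterable


def max_tm_per_query(lines: Iterable[str]) -> Dict[str, float]:
    """Return the highest alntmscore seen for each query.

    Assumes tab-separated rows in the order query, target, alntmscore, lddt.
    """
    best: Dict[str, float] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or malformed rows
        query, _target, tm, _lddt = row[:4]
        score = float(tm)
        if score > best.get(query, float("-inf")):
            best[query] = score
    return best


# Hypothetical rows standing in for a real result file.
example = io.StringIO(
    "sample_0\t1abc_A\t0.62\t0.71\n"
    "sample_0\t2xyz_B\t0.48\t0.55\n"
    "sample_1\t3def_A\t0.35\t0.40\n"
)
scores = max_tm_per_query(example)
print(scores)
```

With a real result file, pass the open file handle instead of the StringIO object. A lower maximum score means the sample is less similar to anything in the PDB.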

Thanks, Jason!