Is "AF2 initial guess" supposed to have access to the standard AF2 databases?

Question

rszabla opened this issue a year ago · 3 comments

Hello,

My installation seems to be running as expected, and it produced some beautifully folded peptide binders with low pae_interaction scores for my project. My question is: how is the "AF2 initial guess" pipeline able to generate a predicted structure without access to the databases that are normally required to run AF2? Namely:

    bfd/                                   # ~ 1.8 TB
    mgnify/                                # ~ 64 GB
    params/                                # ~ 3.5 GB
    pdb70/                                 # ~ 56 GB
    pdb_mmcif/                             # ~ 206 GB
    uniclust30/                            # ~ 87 GB
    uniref90/                              # ~ 59 GB

Did I miss an important part of the installation? Is the predict.py script somehow running the full AF2 installation on my machine? Or am I just ignorant of how your implementation of AF2 works?
I just want to make sure that my dl_binder_design installation is configured properly before I put too much trust in the pae_interaction scores and spend money on peptide orders.

Thank you,

Robert Szabla

Answer 1 · 2023-11-29T03:46:45.000Z

I have the same question.

Answer 2 · 2023-11-29T19:00:38.000Z

The AF2 initial guess protocol only requires the AF2 weights, which you have already downloaded as part of the installation described in the repo. The other databases distributed with AF2 are used to generate MSAs and templates, both of which are critical for predicting native proteins. Since we are predicting idealized de novo proteins, we don't use MSAs or templates.

So to answer your question: if your designs have low pae_interaction, then AF2 initial guess predicts that they will work!
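If it helps with triage before ordering peptides, here is a minimal Python sketch for picking out the low pae_interaction designs from a score table. It is only an illustration under stated assumptions: it assumes a whitespace-delimited table whose first non-empty line is a header containing `pae_interaction` and `description` columns, the helper names `parse_score_table` and `filter_by_pae_interaction` are hypothetical, the cutoff of 10.0 is a placeholder rather than a value given in this thread, and the example rows are made up.

```python
def parse_score_table(text):
    """Parse a whitespace-delimited score table into a list of dicts.

    Assumption: the first non-empty line is the header, and every data
    row has the same number of fields as the header.
    """
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None:
            header = fields
            continue
        if len(fields) != len(header):
            continue  # skip rows that do not match the header
        rows.append(dict(zip(header, fields)))
    return rows


def filter_by_pae_interaction(rows, cutoff=10.0):
    """Return rows with pae_interaction below the cutoff, best first.

    The default cutoff is a placeholder; choose one suited to your project.
    """
    passing = [r for r in rows if float(r["pae_interaction"]) < cutoff]
    return sorted(passing, key=lambda r: float(r["pae_interaction"]))


# Made-up example data, for illustration only.
example = """\
SCORE: pae_interaction plddt_binder description
SCORE: 5.2 91.3 design_a
SCORE: 18.7 74.0 design_b
SCORE: 8.9 88.1 design_c
"""

rows = parse_score_table(example)
hits = filter_by_pae_interaction(rows)
for row in hits:
    print(row["description"], row["pae_interaction"])
```

To use it on real output, read your score file into a string and pass it to `parse_score_table`, after checking that the column names match what your installation actually writes.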

Answer 3 · 2023-11-29T20:54:23.000Z

Thank you, that is very helpful!