kliment-olechnovic/ftdmp

Error when scoring many structures


Following the updated FTDMP installation instructions, I was able to install FTDMP without any issues. All my initial tests finished successfully for both CPU and GPU installs. However, the end goal was to score thousands of structures, and I run into a critical voronota-js error when using FTDMP on large numbers of structures: Killed | voronota-js --no-setup-defaults. I am attaching the error in full here: error-message.txt. Specifically, I am using it to rank 4000 structures and am using a GPU for these calculations, as the CPU install would require over 7 days (my walltime).

When the FTDMP job is killed, it has created the following files. The only file that I don't see that I was expecting was results_for_humans.txt:
ids_to_process.txt, raw_scoring_results_FGV.txt, raw_scoring_results_FIGNN, raw_scoring_results_FIV.txt, raw_scoring_results_FIVb.txt, raw_scoring_results_without_ranks.txt, raw_scoring_results_with_ranks.txt, and raw_top_scoring_results_raw.txt.

My FTDMP job specifications were:

ls ./*.pdb \
| /home/kastner/workspace/src/ftdmp/ftdmp-qa-all \
  --rank-names protein_protein_voromqa_and_global_and_gnn_no_sr \
  --ftdmp-root /home/kastner/workspace/src/ftdmp \
  --conda-path /home/kastner/workspace/src/miniconda3 \
  --workdir "./works"

My current install specifications are:

OS: Red Hat Enterprise Linux Server release 7.9 (Maipo)
Python: 3.8.18
CUDA: 12.1
PyTorch: 2.1.0
Pandas: 2.0.3
R: 4.3.1
OpenMM: 8.0
Torch Geometric: 2.3.1
GPU: Tesla V100-SXM2-32GB

The input PDBs were AlphaFold Multimer predictions for T1109 from CASP15.

Hello,

I think the problem is that in 'ftdmp-qa-all' the default '--ranks-top' option value is now very large, so when you run it for 4000 models it computes a 4000x4000 comparison matrix, which is totally unnecessary for selecting the best models and overuses the provided resources. Please try setting it to a lower value, e.g. 200, which is usually enough to select the best models. Changing the --ranks-top value should not affect the already computed raw scores - you can re-run the command on the same directory with different --ranks-top values.
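To illustrate why the size of that matrix matters, here is a minimal Python sketch of the arithmetic only. The function name is hypothetical, and it assumes a full square matrix is computed; whether FTDMP exploits symmetry or skips the diagonal is not stated in this thread.

```python
def comparison_matrix_entries(n_models: int) -> int:
    """Number of entries in a full n_models x n_models comparison matrix.

    Assumption: every ordered pair is counted, including the diagonal.
    """
    return n_models * n_models


# Matrix sizes for the values mentioned in this thread.
for n in (200, 300, 4000):
    print(n, comparison_matrix_entries(n))
```

Under that assumption, going from 4000 models to 200 models shrinks the matrix by a factor of 400, since the cost grows with the square of the number of models compared.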

The fact that the default '--ranks-top' is currently so big, and that I do not comment on it in the docs, is my mistake, of course. I apologize for that. I will try to update it later this week.

Update: I have updated the ftdmp-qa-all script, the default value for '--ranks-top' is 300 now.

Thank you! That worked perfectly. All my tests finished in a timely manner.

In my tests, when I set --ranks-top to 200 to score 4000 structures, the resulting results_for_humans.txt file has scores for anywhere between 47 and 870 of the 4000 structures, not for all of them. Is this the expected behavior, and how does it decide which complexes to consider in the scoring? To get scores for all 4000, would I have to set --ranks-top to 4000, which would then cause it to suffer from the same issue as before? In that case, could more GPUs or memory help avoid the previous error?

This is expected. The goal of the method is to select and rank top models. It works by first scoring all the models using various "raw" scores, and then computing a consensus-like score for the union of the top models from each "raw" ranking, which involves the similarity matrix calculation. The algorithm, called "VoroIF-jury", is described on page 2 of "https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fprot.26569&file=prot26569-sup-0001-Supinfo.pdf".

In theory, you can set --ranks-top to 4000, but then you may need to increase the number of processors for the parallel computation of the similarity matrix - this is done using the '--processors' option, which also affects the parallelization of all the other sub-tasks. In practice, if your goal is to select top models, there is no need to set '--ranks-top' high.

What does "--ranks-top 200" mean? It means that the top 200 models are taken from every "raw" ranking, and then the union of all those top models is analyzed. So you are not missing any models that were ranked in the top 200 by at least one "raw" score.
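The selection step described above can be sketched in Python as follows. This is only an illustration of taking the union of the top N models from each ranking, not the FTDMP implementation; the function name, the ranking names, and the model IDs are all hypothetical.

```python
from typing import Dict, List, Set


def union_of_top(rankings: Dict[str, List[str]], ranks_top: int) -> Set[str]:
    """Collect every model that is in the top `ranks_top` of at least one ranking.

    Each list in `rankings` is assumed to be ordered from best to worst.
    """
    selected: Set[str] = set()
    for ordered_ids in rankings.values():
        selected.update(ordered_ids[:ranks_top])
    return selected


# Toy data: three hypothetical "raw" rankings of six models.
rankings = {
    "score_A": ["m1", "m2", "m3", "m4", "m5", "m6"],
    "score_B": ["m2", "m1", "m4", "m3", "m6", "m5"],
    "score_C": ["m5", "m1", "m2", "m6", "m3", "m4"],
}

selected = union_of_top(rankings, ranks_top=2)
print(sorted(selected))  # ['m1', 'm2', 'm5']
```

If every ranking contains at least N models, the union holds between N models (when all rankings agree on the same top N) and N times the number of rankings (when they do not overlap at all), which is why the number of models analyzed can exceed the --ranks-top value.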

Then how can you get 87 models? Probably because of the redundancy filtering - this is controlled by the '--output-redundancy-threshold' option. By default it is set to 0.9, which means that it will filter out every model that has 0.9 interface CAD-score similarity with any model ranked higher.
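A rough Python sketch of this kind of redundancy filtering is shown below. It is not the FTDMP code: the function names and the toy similarity values are hypothetical, the comparison is made against already retained models only, and a similarity equal to the threshold is treated as redundant. Both of those choices are assumptions that this thread does not confirm.

```python
from typing import Callable, List


def filter_redundant(
    ranked_ids: List[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.9,
) -> List[str]:
    """Walk the ranking from best to worst and keep a model only if its
    similarity to every already kept model is below `threshold`."""
    kept: List[str] = []
    for model_id in ranked_ids:
        if all(similarity(model_id, other) < threshold for other in kept):
            kept.append(model_id)
    return kept


# Toy pairwise similarities standing in for interface CAD-score values.
toy_values = {
    frozenset(("m1", "m2")): 0.95,  # m2 is nearly identical to m1
    frozenset(("m1", "m3")): 0.40,
    frozenset(("m2", "m3")): 0.42,
}


def toy_similarity(a: str, b: str) -> float:
    return toy_values[frozenset((a, b))]


kept = filter_redundant(["m1", "m2", "m3"], toy_similarity)
print(kept)  # ['m1', 'm3']
```

Here m2 is dropped because it is too similar to the higher-ranked m1, so a set of many near-identical models can shrink to far fewer entries in the final table.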

Thank you for the clear explanation! It was very helpful.