Foldmason not running on a large set of proteins
Expected Behavior
I am running foldmason with the command below:
easy-msa /workspace/protein/structs /workspace/results_foldmason/protein/result /workspace/results_foldmason/protein/tmpFolder --report-mode 1 --precluster --max-seq-len 4000
I have about 2000 proteins, each approximately 280 amino acids long.
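Before tuning --max-seq-len, it can help to confirm how many chains the input actually contains and how long the longest one is. The following is a minimal Python sketch that assumes the structures are uncompressed PDB-format files ending in .pdb (mmCIF or gzipped inputs would need a proper parser such as Biopython or gemmi). Residues are counted from CA atoms in the ATOM records of the first model, and chain_lengths and summarize are hypothetical helper names, not part of foldmason.

```python
from collections import defaultdict
from pathlib import Path
from typing import Iterable


def chain_lengths(lines: Iterable[str]) -> dict[str, int]:
    """Count residues per chain from CA atoms in the ATOM records of PDB text."""
    residues: dict[str, set] = defaultdict(set)
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # only the first model of multi-model files is counted
        # Fixed PDB columns: atom name 13-16, chain 22, resSeq 23-26, iCode 27.
        # HETATM records (e.g. modified residues) are ignored.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residues[line[21:22]].add((line[22:26], line[26:27]))
    return {chain: len(res) for chain, res in residues.items()}


def summarize(struct_dir: str) -> tuple[int, int, int]:
    """Return (files, total chains, longest chain) over the *.pdb files in a folder."""
    files = sorted(Path(struct_dir).glob("*.pdb"))
    lengths = []
    for path in files:
        with open(path) as handle:
            lengths.extend(chain_lengths(handle).values())
    return len(files), len(lengths), max(lengths, default=0)


if __name__ == "__main__":
    # Input folder taken from the command in this issue; adjust as needed.
    n_files, n_chains, longest = summarize("/workspace/protein/structs")
    print(f"{n_files} files, {n_chains} chains, longest chain: {longest} residues")
```

Counting chains as well as files is deliberate: a multi-chain file contributes several entries, so the two numbers can differ.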
Current Behavior
I am getting memory errors; the run ends with a segmentation fault (see the output below).
Steps to Reproduce (for bugs)
Run easy-msa on a large set of protein structures.
Foldseek Output (for bugs)
I get the output below (last few lines):
Size of the sequence database: 3588
Size of the alignment database: 3588
Number of clusters: 1487
Writing results 0h 0m 0s 0ms
Time for merging to clu: 0h 0m 0s 428ms
Time for processing: 0h 0m 36s 725ms
Error: structuremsa died
Segmentation fault (core dumped)
Context
The --max-seq-len parameter doesn't seem to make a difference. I'm still getting the memory error.
Your Environment
I've been running foldmason via the Docker image built from the Dockerfile. I am running on a Kubernetes cluster and allocate 64 GB of RAM and 6 CPUs.
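The report describes memory errors, but the log ends in "Segmentation fault (core dumped)", which indicates a crash (SIGSEGV) rather than the process being killed for exceeding its memory limit. On Linux the two can be told apart from the exit status: a process killed by the kernel or Kubernetes OOM killer receives SIGKILL (exit code 137, which Kubernetes reports as OOMKilled), while a segfault is SIGSEGV (exit code 139 from a shell). The following is a minimal Python sketch, assuming foldmason is launched from a wrapper script on Linux; classify_exit and run_and_classify are hypothetical helper names, not part of foldmason.

```python
import signal
import subprocess
import sys


def classify_exit(returncode: int) -> str:
    """Map an exit status to a coarse failure category (Linux conventions).

    subprocess reports death by signal as a negative number (-11 for SIGSEGV),
    while shells and container runtimes report 128 + signal (139 for SIGSEGV,
    137 for SIGKILL). Both conventions are accepted here.
    """
    if returncode == 0:
        return "ok"
    if returncode < 0:
        sig = -returncode
    elif returncode > 128:
        sig = returncode - 128  # assumes the program itself never exits with >128
    else:
        return "error"  # ordinary non-zero exit reported by the program
    if sig == signal.SIGSEGV:
        return "segfault"
    if sig == signal.SIGKILL:
        return "killed (possibly by the OOM killer)"
    return f"signal {sig}"


def run_and_classify(cmd: list[str]) -> str:
    """Run a command to completion and classify how it ended."""
    return classify_exit(subprocess.run(cmd).returncode)


if __name__ == "__main__":
    # Usage: python classify_exit.py ./foldmason easy-msa <structs> <result> <tmp> ...
    print(run_and_classify(sys.argv[1:]))
```

The "Error: structuremsa died" line suggests easy-msa runs its steps through an internal workflow, so the top-level exit status may be an ordinary error code even when a sub-step segfaulted; in that case the "Segmentation fault" line in the log remains the more direct signal.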
Do you also get the same behaviour with pre-clustering disabled (--precluster 0)?
How did you build the container? I just realized that we have not been automatically building containers.
@gamcil committed a fix earlier today. Could you check whether the issue is still happening for you? You can download precompiled binaries at https://mmseqs.com/foldmason.
I built the container using the Dockerfile provided in the repository.
Running this now. It seems to get further than before. I also removed --report-mode 1. Could that option have been causing the issue?
I can confirm that running the precompiled binaries with the command:
./foldmason easy-msa /workspace/protein/structs /workspace/results_foldmason/protein/result /workspace/results_foldmason/protein/tmpFolder --precluster
ran without any errors. So the issue might be the --report-mode 1 parameter.
Were you also getting segfaults with --report-mode 1?
It seems like the problem is --report-mode 1. If I remove it, the command runs without segfaults; if I include it, I get segfaults.
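Toggling a single option between otherwise identical runs, as done here, is a simple way to isolate what triggers a crash. The following minimal Python sketch reruns the command confirmed earlier in the thread with and without --report-mode 1, giving each variant its own result prefix and tmp folder, and prints the exit status of each. The paths are copied from the thread and should be adjusted; build_command and run_variant are hypothetical helpers, and any non-zero status should be treated as a failure.

```python
import subprocess
import sys
from pathlib import Path

# Paths copied from the commands in this issue; adjust them to your setup.
FOLDMASON = "./foldmason"
STRUCTS = "/workspace/protein/structs"
OUT_ROOT = Path("/workspace/results_foldmason/protein")


def build_command(binary: str, structs: str, out_prefix: Path, tmp_dir: Path,
                  report_mode: bool) -> list[str]:
    """Build the easy-msa command from this thread, optionally adding --report-mode 1."""
    cmd = [binary, "easy-msa", structs, str(out_prefix), str(tmp_dir), "--precluster"]
    if report_mode:
        cmd += ["--report-mode", "1"]
    return cmd


def run_variant(cmd: list[str]) -> int:
    """Run one variant and return its exit status (negative = killed by that signal)."""
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    # Optionally pass a different foldmason binary as the first argument.
    binary = sys.argv[1] if len(sys.argv) > 1 else FOLDMASON
    for report_mode in (False, True):
        tag = "report1" if report_mode else "noreport"
        # A separate result prefix and tmp folder per variant keeps runs independent.
        cmd = build_command(binary, STRUCTS, OUT_ROOT / f"result_{tag}",
                            OUT_ROOT / f"tmp_{tag}", report_mode)
        print(f"{tag}: exit status {run_variant(cmd)}")
```

Keeping everything except the toggled flag fixed means a difference in outcome can be attributed to that flag rather than to leftover files in a shared tmp folder.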
Would it be possible to share the inputs so that we can try to reproduce the new issue?
I tried it again with --report-mode 1 and it seems to be working. Thank you for the assistance. I'll reach out in case I face any problems.