eead-csic-compbio/get_homologues

Error: find_COGs

Pirxtrurl opened this issue · 6 comments

Hello,

My name is Jorge Val. Currently, I am working as a postdoc at the University of Edinburgh.

[image: screenshot of the error message]

The only difference in this run was the size of the set: 133 large microbial genomes (from 5 Mbp to 11 Mbp).

I tried to solve it myself, but I couldn't.

I checked the get_homologues installation and it seems correct:

[image: screenshot of the installation check]

Could you please help me?

Best regards,

Jorge

Hi @Pirxtrurl , your error message seems to come from this line:

else{ die "# ERROR: find_COGs ($COGTRIANGLESEXE) failed to terminate job\n" }

The comments in that part of the code indicate that disk latency can sometimes cause trouble because COGs writes large files; often simply re-running sorts things out. However, this might also be a RAM bottleneck, in which case you might try option -s or a larger computer.

The failed job should leave behind at least three files (cog-edges.txt, all-edges.txt and all.cog.clusters.log); these might help you track down the problem further.
Let me know how this goes,
Bruno
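
One way to take a first look at the three leftover files mentioned above is sketched below. This is a minimal Python sketch and not part of get_homologues: the function name summarize_cog_outputs and its results_dir argument are hypothetical, and where the files actually end up depends on your run. It only reports whether each file exists, how large it is, and its last few lines.

```python
from collections import deque
from pathlib import Path

# File names taken from the comment above; their location depends on the run.
EXPECTED_FILES = ("cog-edges.txt", "all-edges.txt", "all.cog.clusters.log")


def summarize_cog_outputs(results_dir, tail_lines=5):
    """Report presence, size and last lines of each expected leftover file.

    Hypothetical helper for illustration; results_dir is wherever the
    failed job wrote its intermediate files.
    """
    summary = {}
    for name in EXPECTED_FILES:
        path = Path(results_dir) / name
        if not path.is_file():
            summary[name] = {"exists": False, "size_bytes": 0, "tail": []}
            continue
        # Stream the file so that very large edge files are never held in RAM;
        # the deque keeps only the last `tail_lines` lines.
        with path.open("r", errors="replace") as handle:
            tail = list(deque((line.rstrip("\n") for line in handle),
                              maxlen=tail_lines))
        summary[name] = {
            "exists": True,
            "size_bytes": path.stat().st_size,
            "tail": tail,
        }
    return summary
```

Calling summarize_cog_outputs on the folder that holds these files returns one entry per file. A missing or zero-byte file would suggest the job stopped before writing it, and the tail of all.cog.clusters.log may show the last step that was reached.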

Thanks for your quick answer.

I also suspected it was a problem due to the volume of data. I have tried restarting the analysis several times, but it keeps giving me the same error. I guess the algorithm must reach some bottleneck, as you suggest. Unfortunately, I don't have another computer with more capacity.

I have drastically reduced the number of genomes I use as input, which I hope will keep the analysis from crashing. I will also try to repeat the same run with the -s option. Perhaps with this option activated, the program can complete the analysis.

Anyway, thanks for your help.

Best regards.

Jorge

If you don't mind sharing your input .gbk files, I can try to run it here and see how that goes; it is still possible that some code needs fixing,
Bruno

Hello again,

So far, I have reduced my input to about half its size, and it runs correctly, so the problem does seem to be related to the volume of data to process.

What I am doing now is gradually increasing the volume of genomes in my input, taking advantage of the fact that your program allows adding new genomes to the calculations already done.
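
The incremental approach described above could be organized roughly as follows. This is only a sketch in Python under stated assumptions: the helper names cumulative_batches and stage_batch, the batch sizes, and the idea of copying genomes into a single input folder are all hypothetical, and the actual get_homologues run between batches is left to the user.

```python
import shutil
from pathlib import Path


def cumulative_batches(genome_files, first_batch, step):
    """Yield growing lists of genomes: the first `first_batch` files,
    then `step` more on each round until all files are included."""
    if first_batch < 1 or step < 1:
        raise ValueError("first_batch and step must be positive")
    files = sorted(genome_files)
    if not files:
        return
    end = min(first_batch, len(files))
    while True:
        yield files[:end]
        if end >= len(files):
            break
        end = min(end + step, len(files))


def stage_batch(batch, input_dir):
    """Copy genomes that are not yet in input_dir; return the new file names.

    Assumption: the analysis is re-run on the same input folder after each
    batch, so that calculations already done can be reused.
    """
    input_path = Path(input_dir)
    input_path.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in batch:
        dest = input_path / Path(src).name
        if not dest.exists():
            shutil.copy2(src, dest)
            copied.append(dest.name)
    return copied
```

Each round would call stage_batch on the next list from cumulative_batches and then re-run the analysis by hand. Stopping at the first batch that fails gives a rough idea of how many genomes fit in the available memory.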

Thanks for your offer. If you don't mind, I will send you a private mail.

Best regards,

Jorge

The RAM available for this job was 32 GB.