rderelle/Broccoli

Run failing at step2


Hello

I am trying to run Broccoli and it keeps returning the following error message:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/data/evassvis/software/anaconda3/envs/broccoli_env/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/data/evassvis/software/anaconda3/envs/broccoli_env/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/lustre/alice3/data/evassvis/software/broccoli/Broccoli-master/scripts/broccoli_step2.py", line 378, in process_file
a = subprocess.check_output(path_fasttree + ' -quiet -nosupport -fastest -bionj -pseudo ' + insert + ' -n ' + str(nb_alis) + ' ' + str(Path(out_dir / name_ali_file)) + ' 2>&1', shell=True)
File "/data/evassvis/software/anaconda3/envs/broccoli_env/lib/python3.6/subprocess.py", line 356, in check_output
**kwargs).stdout
File "/data/evassvis/software/anaconda3/envs/broccoli_env/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'fasttree -quiet -nosupport -fastest -bionj -pseudo -n 0 dir_step2/alis_27.phy 2>&1' returned non-zero exit status 1.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/evassvis/software/broccoli/Broccoli-master/broccoli.py", line 148, in <module>
broccoli_step2.step2_phylomes(evalue, max_per_species, path_diamond, path_fasttree, trim_thres, phylo_method, nb_threads)
File "/lustre/alice3/data/evassvis/software/broccoli/Broccoli-master/scripts/broccoli_step2.py", line 78, in step2_phylomes
multithread_process_file(list_files, nb_threads)
File "/lustre/alice3/data/evassvis/software/broccoli/Broccoli-master/scripts/broccoli_step2.py", line 151, in multithread_process_file
results_2 = tmp_res.get()
File "/data/evassvis/software/anaconda3/envs/broccoli_env/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
subprocess.CalledProcessError: Command 'fasttree -quiet -nosupport -fastest -bionj -pseudo -n 0 dir_step2/alis_27.phy 2>&1' returned non-zero exit status 1.

I have run Broccoli on other datasets and it worked perfectly, so I would guess the problem lies in my input files. However, I cannot find any fault in them, and I am not sure what the error message means, other than that it involves FastTree.

Have you encountered this before?
Any help is much appreciated.

Thanks

Hi Matt,

the problem comes from FastTree, which for some reason is unable to analyse the file alis_27.phy.
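For anyone hitting the same error: the traceback only reports the exit status. Because broccoli_step2.py appends `2>&1` and calls `subprocess.check_output`, whatever FastTree printed ends up in the `output` attribute of the `CalledProcessError`, which the traceback does not display. A minimal sketch for re-running the failing command by hand and printing that message, assuming it is run from the same directory as the failed run (`run_and_report` is a hypothetical helper, not part of Broccoli):

```python
import subprocess


def run_and_report(cmd):
    """Run a shell command the way broccoli_step2.py does (shell=True, 2>&1)
    and return (exit_status, combined stdout/stderr text)."""
    try:
        out = subprocess.check_output(cmd + ' 2>&1', shell=True)
        return 0, out.decode(errors='replace')
    except subprocess.CalledProcessError as e:
        # e.output holds everything the program printed, error text included,
        # because stderr was redirected into stdout with 2>&1
        return e.returncode, (e.output or b'').decode(errors='replace')


if __name__ == '__main__':
    # Command copied from the traceback, without the trailing 2>&1
    status, message = run_and_report(
        'fasttree -quiet -nosupport -fastest -bionj -pseudo -n 0 dir_step2/alis_27.phy')
    print('exit status:', status)
    print(message)
```

The same helper works for any other command that appears in a Broccoli traceback, such as a Diamond call.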

Since you have already run Broccoli successfully, I assume you are using a suitable version of FastTree and that the problem comes from the proteome of one of your species. As a developer, it is difficult to anticipate every issue that input data can create.

If you are happy to share the file alis_27.phy (in the directory 'dir_step2') with me, I can have a look at it.
My email is: romain.derelle at gmail.

best,
Romain

Hi Matthew,

thanks for sharing your data.

I have fixed step 2: now, Broccoli will not stop even if one pair of species does not yield any phylogenetic analysis.
Please download the current version of the pipeline and it should work.
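The failing command in the first traceback passes `-n 0` to FastTree, which appears to match the situation described here: a species pair with no alignment to analyse. A minimal sketch of that kind of guard, reusing the variable names visible in the traceback (`path_fasttree`, `insert`, `nb_alis`); the function `fasttree_command` is hypothetical and is not the actual change made in Broccoli:

```python
def fasttree_command(path_fasttree, insert, nb_alis, ali_path):
    """Build the FastTree command line for a multi-alignment file, or
    return None when there is nothing to analyse (hypothetical guard)."""
    if nb_alis == 0:
        # No alignment for this species pair: skip it instead of calling
        # FastTree with '-n 0', which exited with status 1 in the traceback
        return None
    return (path_fasttree + ' -quiet -nosupport -fastest -bionj -pseudo '
            + insert + ' -n ' + str(nb_alis) + ' ' + str(ali_path))
```

A caller would skip the pair when the function returns `None`, rather than passing the command to `subprocess.check_output`.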

If you have isolated all proteins of a given family and wish to classify them into orthologous groups, you might want to increase the Diamond e-value (the -e_value parameter, 0.001 by default), potentially to 0, to make sure that all proteins get a hit.

best,
Romain

Hello, I think I am having a similar issue, except that it seems to be related to Diamond:
I ran Broccoli (v1.2.1) with default parameters and I get the following error message:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/data/evassvis/software/anaconda3/envs/broccoli_env/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/data/evassvis/software/anaconda3/envs/broccoli_env/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/lustre/alice3/data/evassvis/software/broccoli_new/Broccoli-master/scripts/broccoli_step2.py", line 238, in process_file
--compress 1 --more-sensitive -e ' + str(evalue) + ' -o ' + str(index_dir / search_output) + ' --outfmt 6 qseqid sseqid qstart qend sstart cigar 2>&1', shell=True)
File "/data/evassvis/software/anaconda3/envs/broccoli_env/lib/python3.6/subprocess.py", line 356, in check_output
**kwargs).stdout
File "/data/evassvis/software/anaconda3/envs/broccoli_env/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'diamond blastp --quiet --threads 1 --db dir_step2/databases/61.db --max-target-seqs 6 --query dir_step1/24.fas --compress 1 --more-sensitive -e 0.001 -o dir_step2/24/24_61.gz --outfmt 6 qseqid sseqid qstart qend sstart cigar 2>&1' returned non-zero exit status 1.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/evassvis/software/broccoli_new/Broccoli-master/broccoli.py", line 148, in <module>
broccoli_step2.step2_phylomes(evalue, max_per_species, path_diamond, path_fasttree, trim_thres, phylo_method, nb_threads)
File "/lustre/alice3/data/evassvis/software/broccoli_new/Broccoli-master/scripts/broccoli_step2.py", line 78, in step2_phylomes
multithread_process_file(list_files, nb_threads)
File "/lustre/alice3/data/evassvis/software/broccoli_new/Broccoli-master/scripts/broccoli_step2.py", line 151, in multithread_process_file
results_2 = tmp_res.get()
File "/data/evassvis/software/anaconda3/envs/broccoli_env/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
subprocess.CalledProcessError: Command 'diamond blastp --quiet --threads 1 --db dir_step2/databases/61.db --max-target-seqs 6 --query dir_step1/24.fas --compress 1 --more-sensitive -e 0.001 -o dir_step2/24/24_61.gz --outfmt 6 qseqid sseqid qstart qend sstart cigar 2>&1' returned non-zero exit status 1.

I am also not sure whether, or what, something is wrong with my input proteomes. Any suggestions? I am happy to share any file if it helps uncover the source of the problem.
Thank you,
Alessandra

Hi,

could you please run the following Diamond command and post the output (error message) here:
diamond blastp --quiet --threads 1 --db dir_step2/databases/61.db --max-target-seqs 6 --query dir_step1/24.fas --compress 1 --more-sensitive -e 0.001 -o dir_step2/24/24_61.gz --outfmt 6 qseqid sseqid qstart qend sstart cigar
With that, I should be able to find the issue.
Thanks,

Romain

Hi Romain, thanks for your reply.
If I run the command you asked for in the directory of the failed run, I get the message:

Error: Incomplete database file. Database building did not complete successfully.

Of note, in the meantime I removed any empty files from my proteome files and re-ran Broccoli, and that way it worked. Could empty files be the cause of the problem?
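Whether the empty files also explain the incomplete Diamond database is not confirmed in the thread, but removing them before a run is easy to script. A minimal sketch that separates FASTA files containing at least one sequence header from those containing none; the directory argument, the list of file extensions, and the helper name `split_empty_fastas` are assumptions, not Broccoli requirements:

```python
from pathlib import Path


def split_empty_fastas(input_dir, suffixes=('.fa', '.fas', '.fasta', '.faa')):
    """Return (usable, empty) lists of FASTA paths found in input_dir.
    A file counts as empty if no line starts with '>' (no sequence header)."""
    usable, empty = [], []
    for path in sorted(Path(input_dir).iterdir()):
        # Only look at regular files with an expected FASTA extension (assumed list)
        if not path.is_file() or path.suffix not in suffixes:
            continue
        with path.open(errors='replace') as handle:
            has_seq = any(line.startswith('>') for line in handle)
        (usable if has_seq else empty).append(path)
    return usable, empty


if __name__ == '__main__':
    import sys
    # Directory of proteome files; defaults to the current directory
    folder = sys.argv[1] if len(sys.argv) > 1 else '.'
    usable, empty = split_empty_fastas(folder)
    for path in empty:
        print('no sequences, consider removing:', path)
```

The sketch only reports the empty files; moving them out of the input directory is left to the user so that nothing is deleted by accident.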

Thanks a lot
Alessandra