rderelle/Broccoli

Python errors

Closed this issue · 7 comments

Hi, thanks for developing Broccoli, it sounds like a great tool! I am trying to run it (for the first time) on a set of 25 metazoan genomes and I am encountering some errors. Basically, it looks like it gets stuck at step 2. When I ran the failing commands manually (diamond blastp --quiet --threads 1 - ...), I got this error: No such file or directory Error: Error opening file ./dir_step2/0.db

The error log:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/f/fmarletaz/work/miniconda2/envs/broccoli/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/f/fmarletaz/work/miniconda2/envs/broccoli/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/work/RokhsarU/ferdi/Broccoli/scripts/broccoli_step2.py", line 216, in process_file
    subprocess.check_output(path_diamond + ' blastp --quiet --threads 1 --db ./dir_step2/' + file_db.replace('.fas','.db') + ' --max-target-seqs ' + str(max_per_species) + ' --query ./dir_step1/' + file + ' \
  File "/home/f/fmarletaz/work/miniconda2/envs/broccoli/lib/python3.8/subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/f/fmarletaz/work/miniconda2/envs/broccoli/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'diamond blastp --quiet --threads 1 --db ./dir_step2/0.db --max-target-seqs 6 --query ./dir_step1/10.fas                  --compress 1 --more-sensitive -e 0.001 -o ./dir_step2/10/10_0.gz --outfmt 6 qseqid sseqid qstart qend qseq_gapped sseq_gapped 2>&1' returned non-zero exit status 1.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "broccoli.py", line 157, in <module>
    broccoli_step2.step2_phylomes(evalue, max_per_species, path_diamond, path_fasttree, trim_thres, nb_threads)
  File "/work/RokhsarU/ferdi/Broccoli/scripts/broccoli_step2.py", line 67, in step2_phylomes
    multithread_process_file(list_files, nb_threads)
  File "/work/RokhsarU/ferdi/Broccoli/scripts/broccoli_step2.py", line 141, in multithread_process_file
    results_2 = tmp_res.get()
  File "/home/f/fmarletaz/work/miniconda2/envs/broccoli/lib/python3.8/multiprocessing/pool.py", line 768, in get
    raise self._value
subprocess.CalledProcessError: Command 'diamond blastp --quiet --threads 1 --db ./dir_step2/0.db --max-target-seqs 6 --query ./dir_step1/10.fas                  --compress 1 --more-sensitive -e 0.001 -o ./dir_step2/10/10_0.gz --outfmt 6 qseqid sseqid qstart qend qseq_gapped sseq_gapped 2>&1' returned non-zero exit status 1.
slurmstepd: error: *** JOB 7967016 ON sango40406 CANCELLED AT 2020-02-08T17:07:52 ***
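The traceback only says "returned non-zero exit status 1" because subprocess.check_output raises CalledProcessError without printing what DIAMOND itself wrote. The exception does keep that text in its output attribute (the command's own 2>&1 sends stderr there too). A minimal Python 3 sketch, where the helper name run_or_explain is hypothetical and not part of Broccoli, that surfaces the real DIAMOND message:

```python
import subprocess


def run_or_explain(cmd):
    """Run cmd through the shell and return its output.

    On a non-zero exit status, re-raise with the captured output attached,
    so the tool's own error message is not lost in the traceback.
    """
    try:
        # A '2>&1' inside cmd merges stderr into the captured stdout,
        # as in the diamond command from the log above.
        return subprocess.check_output(cmd, shell=True, universal_newlines=True)
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(
            "command failed with exit status {}:\n{}\n{}".format(
                exc.returncode, cmd, exc.output
            )
        ) from exc
```

Passing the failing diamond blastp command from the log to this helper should print the "Error opening file ./dir_step2/0.db" message instead of just the exit status.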

Hi,

Are you sure you are using the correct version of DIAMOND?
It should be version 0.9.25 or above.
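To check the installed version programmatically, one option is to parse the text printed by the diamond binary and compare it numerically against 0.9.25. This sketch assumes the output contains something like "diamond version 0.9.25"; the function names are hypothetical and the regex may need adjusting for your build:

```python
import re


def parse_diamond_version(text):
    """Extract (major, minor, patch) from text such as 'diamond version 0.9.25'.

    Assumption: the version string contains 'version' followed by a dotted
    three-part number.
    """
    match = re.search(r"version\s+(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in: {!r}".format(text))
    return tuple(int(part) for part in match.groups())


def is_recent_enough(text, minimum=(0, 9, 25)):
    """True if the parsed version is at least `minimum`.

    Tuples of ints compare element by element, so 0.9.100 > 0.9.25,
    which a plain string comparison would get wrong.
    """
    return parse_diamond_version(text) >= minimum
```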

Romain

Hi Ferdinand,

My apologies, I actually misread your first message: Diamond is not running correctly because it cannot find the file ./dir_step2/0.db, which is the Diamond database for species '0' of your dataset. I'm afraid I have never seen this kind of issue before.

Could you please try the following command (adjust the path to your diamond executable if needed):

diamond makedb --in ./dir_step1/0.fas --db ./dir_step2/0.db

You should get an error message, since Diamond apparently failed to create the file ./dir_step2/0.db.
Could you please post it here?

thanks
Romain

Hi,
OK, so I investigated a bit further and realised it was my fault: a few of my proteomes contained problematic characters ('.' and '*', which some pipelines add as stop markers), and Diamond really doesn't like them!
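A quick way to find such characters before running Broccoli is to scan each proteome FASTA file. This is a minimal sketch (the function name is hypothetical) that reports every '.' or '*' with its sequence ID and 1-based position within the sequence:

```python
def find_problem_residues(fasta_text, bad_chars=".*"):
    """Return a list of (sequence_id, character, position) for every bad character.

    Positions are 1-based within each sequence; header lines are not counted.
    """
    hits = []
    seq_id = None
    offset = 0  # residues already seen in the current sequence
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The sequence ID is the first word of the header (empty if none).
            seq_id = (line[1:].split() or [""])[0]
            offset = 0
            continue
        for i, char in enumerate(line):
            if char in bad_chars:
                hits.append((seq_id, char, offset + i + 1))
        offset += len(line)
    return hits
```

Reading each file with open(path).read() and printing the non-empty results is enough to spot which proteomes need cleaning.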

Hi Ferdi,

Many thanks for pointing out this problem.
I'll do some testing and add a filter to Broccoli.
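For illustration only, here is one possible shape for such a filter; it is not the filter Broccoli actually implements. It assumes a trailing '*' or '.' is a stop marker that can be dropped, and replaces any internal occurrence with 'X' (unknown residue) so that sequence lengths and alignment coordinates are preserved. The function names are hypothetical:

```python
def clean_sequence(seq, stop_chars=".*", internal_replacement="X"):
    """Drop trailing stop markers, then replace any remaining ones.

    Assumption: terminal '*' or '.' marks a stop codon and can be removed;
    internal occurrences become 'X' to keep the sequence length unchanged.
    """
    seq = seq.rstrip(stop_chars)
    return "".join(internal_replacement if c in stop_chars else c for c in seq)


def clean_fasta(fasta_text):
    """Apply clean_sequence to every record; headers are kept as they are.

    Multi-line sequences are joined onto a single line in the output.
    """
    out_lines = []
    header = None
    chunks = []

    def flush():
        # Emit the record collected so far, if any.
        if header is not None:
            out_lines.append(header)
            out_lines.append(clean_sequence("".join(chunks)))

    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            flush()
            header = line
            chunks = []
        elif line:
            chunks.append(line)
    flush()
    return "\n".join(out_lines) + "\n" if out_lines else ""
```

Replacing internal markers rather than deleting them is a design choice; dropping the affected sequences entirely would be an equally valid option.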

Romain

Yes, you are right about the absence of a matrix/table in the outputs.
I'll soon upload a version of Broccoli that creates such a matrix at the end of step 3.
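As a rough illustration of what such a table could contain (Broccoli's eventual output format may well differ), this sketch builds an orthogroup × species count matrix as TSV text. The input layout, a dict mapping each orthogroup name to the species labels of its member proteins, is a hypothetical assumption:

```python
from collections import Counter


def build_count_matrix(orthogroups, species):
    """Return TSV text: one row per orthogroup, one column per species.

    `orthogroups` maps an orthogroup name to a list of species labels,
    one label per member protein (hypothetical input layout).
    """
    lines = ["orthogroup\t" + "\t".join(species)]
    for og_name in sorted(orthogroups):
        counts = Counter(orthogroups[og_name])  # missing species count as 0
        lines.append(og_name + "\t" + "\t".join(str(counts[sp]) for sp in species))
    return "\n".join(lines) + "\n"
```

A presence/absence version follows by writing min(count, 1) instead of the raw count.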

thanks
Romain