OLC-Bioinformatics/ConFindr

input fasta


When I run ConFindr with --fasta, I get the error below. Running FASTQ files works fine. Is ConFindr designed to work on assembled files? The --fasta option suggests it should, but it doesn't work for me.

--fasta If activated, will look for FASTA files instead of
FASTQ for unpaired reads.

Traceback (most recent call last):
  File "/nfs/software/apps/ConFindr/0.7.2/lib/python3.7/site-packages/confindr_src/confindr.py", line 1045, in confindr
    min_matching_hashes=min_matching_hashes)
  File "/nfs/software/apps/ConFindr/0.7.2/lib/python3.7/site-packages/confindr_src/confindr.py", line 767, in find_contamination
    out, err = run_cmd(cmd)
  File "/nfs/software/apps/ConFindr/0.7.2/lib/python3.7/site-packages/confindr_src/confindr.py", line 33, in run_cmd
    raise subprocess.CalledProcessError(p.returncode, cmd=cmd)
subprocess.CalledProcessError: Command 'bbmap.sh ref=output_assembly/SRR2038680/rmlst.fasta in=output_assembly/SRR2038680/trimmed.fastq.gz out=output_assembly/SRR2038680/out_2.bam threads=20 mdtag nodisk' returned non-zero exit status 1.
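
The traceback above only reports that the external command exited with status 1; the tool's own error message is not shown. One way to see it is to re-run the failing command by hand and capture its stderr. The following is a minimal sketch of that idea, not ConFindr's code: `run_and_report` is a hypothetical helper name, and the command is assumed to be passed as a list of arguments.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd (a list of arguments) and return (returncode, stdout, stderr).

    Hypothetical helper for debugging; it is not part of ConFindr.
    """
    # capture_output=True collects stdout and stderr instead of discarding them.
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Print the tool's own message, which the CalledProcessError text omits.
        print(f"Command failed with exit status {result.returncode}", file=sys.stderr)
        print(result.stderr, file=sys.stderr)
    return result.returncode, result.stdout, result.stderr
```

Passing the command quoted in the traceback, split into separate arguments, should print whatever message the external tool wrote before it exited.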

I've recently updated ConFindr to version 0.7.3 on Bioconda. It contains a few fixes for handling FASTA files. Are you willing to try the new version to see if it addresses your issue?

I tried the new version, but I got a very similar error.
Traceback (most recent call last):
  File "/nfs/software/apps/ConFindr/0.7.3/confindr_src/confindr.py", line 1051, in confindr
    find_contamination(pair=fastq,
  File "/nfs/software/apps/ConFindr/0.7.3/confindr_src/confindr.py", line 673, in find_contamination
    out, err, cmd = bbtools.bbduk_trim(forward_in=os.path.join(sample_tmp_dir, 'rmlst.fastq.gz'),
  File "/nfs/software/apps/ConFindr/0.7.3/confindr_src/wrappers/bbtools.py", line 108, in bbduk_trim
    out, err = run_subprocess(cmd)
  File "/nfs/software/apps/ConFindr/0.7.3/confindr_src/wrappers/bbtools.py", line 16, in run_subprocess
    raise subprocess.CalledProcessError(x.returncode, cmd=command)
subprocess.CalledProcessError: Command 'bbduk.sh in=results_assembly/SRR2038680/rmlst.fastq.gz out=results_assembly/SRR2038680/trimmed.fastq.gz qtrim=w trimq=20 k=25 minlength=50 forcetrimleft=15 ref=adapters overwrite hdist=1 tpe tbo threads=40' returned non-zero exit status 1.

Based on the supplied traceback, you are using an assembly for SRR2038680? I'll try running that through ConFindr and debug it on my end.

Yes, I assembled with SKESA and ran it with: confindr.py -i SRR2038680_assembly -o assembly_results --fasta -d path/to/database

I know it works with fastq files but I want to make sure that the functionality for fasta files works. If it does, I will have to check my environment. Would it be possible to confirm that the tool works with fasta files? Thank you!

It is supposed to work with FASTA files.
I've been trying to resolve your issue on my end. Making some progress.

It looks like the issue was a bug: when the -Xmx flag was not included, ConFindr looked for the wrong input file. Version 0.7.4 has been uploaded to Bioconda and includes the fix. Alternatively, you can include the -Xmx flag with an appropriate memory value to get it to work in version 0.7.3.
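
For anyone staying on 0.7.3, the workaround amounts to adding -Xmx to the command used earlier in this thread. Below is a minimal sketch that assembles such a command. `build_confindr_command` is a hypothetical helper, and the memory value `20g` is only a placeholder assumption; the accepted value format should be checked against `confindr.py --help`.

```python
def build_confindr_command(input_dir, output_dir, database, memory="20g"):
    """Return the argument list for a FASTA run that sets -Xmx explicitly.

    Hypothetical helper; the default memory value is a placeholder assumption.
    """
    return [
        "confindr.py",
        "-i", input_dir,
        "-o", output_dir,
        "--fasta",
        "-d", database,
        "-Xmx", memory,  # workaround for 0.7.3 described above
    ]


# Paths taken from the command quoted earlier in this thread.
cmd = build_confindr_command("SRR2038680_assembly", "assembly_results", "path/to/database")
print(" ".join(cmd))
```

The list form can be handed directly to `subprocess.run`, which avoids shell quoting problems if a path contains spaces.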

Please let me know if this addresses the issue.

Thank you for looking into this issue. The new version works for FASTA files.