OLC-Bioinformatics/ConFindr

bbtools error

Closed this issue · 3 comments

Hi, I'm trying to run ConFindr on some Klebsiella FASTQ files but am having problems with BBTools.

I started out using ConFindr 0.7.4, mash 2.3 and Python 3.7.12.

Some things I have tried that haven't worked:
- amending line 209 of database_setup.py (this worked)
- downgrading Biopython from 1.79 to 1.78
- trimming the FASTQ files with Trimmomatic first
- changing bbmap to version 38.91
- editing Klebsiella_db.fasta (details below), which now causes a problem with indexing

(/projects/js66/software/conda_envs/confindr_0.7.4) [acorreia@m3-dtn4 original_files]$ confindr.py -i /home/acorreia/js66_scratch/anna/fastq_files/test_fastq/original_files -o output_1 --rmlst
2022-06-27 11:03:41 Welcome to ConFindr 0.7.4! Beginning analysis of your samples...
2022-06-27 11:03:41 Beginning analysis of sample NK_H18_033_1...
2022-06-27 11:03:41 Checking for cross-species contamination...
2022-06-27 11:03:47 Extracting conserved core genes...
2022-06-27 11:03:49 Encountered error when attempting to run ConFindr on sample NK_H18_033_1. Skipping...
2022-06-27 11:03:49 Error encounted was:
Traceback (most recent call last):
File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/confindr.py", line 1067, in confindr
fasta=args.fasta)
File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/confindr.py", line 647, in find_contamination
returncmd=True, threads=threads)
File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/wrappers/bbtools.py", line 258, in bbduk_bait
out, err = run_subprocess(cmd)
File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/wrappers/bbtools.py", line 16, in run_subprocess
raise subprocess.CalledProcessError(x.returncode, cmd=command)
subprocess.CalledProcessError: Command 'bbduk.sh in=/home/acorreia/js66_scratch/anna/fastq_files/test_fastq/original_files/NK_H18_033_1.fastq.gz outm=output_1/NK_H18_033_1/rmlst.fastq.gz ref=/home/acorreia/.confindr_db/Klebsiella_db.fasta threads=36' returned non-zero exit status 1.

2022-06-27 11:03:49 Beginning analysis of sample NK_H18_033_2...
2022-06-27 11:03:49 Checking for cross-species contamination...
2022-06-27 11:03:55 Extracting conserved core genes...
2022-06-27 11:03:55 Encountered error when attempting to run ConFindr on sample NK_H18_033_2. Skipping...
2022-06-27 11:03:55 Error encounted was:
Traceback (most recent call last):
File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/confindr.py", line 1067, in confindr
fasta=args.fasta)
File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/confindr.py", line 647, in find_contamination
returncmd=True, threads=threads)
File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/wrappers/bbtools.py", line 258, in bbduk_bait
out, err = run_subprocess(cmd)
File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/wrappers/bbtools.py", line 16, in run_subprocess
raise subprocess.CalledProcessError(x.returncode, cmd=command)
subprocess.CalledProcessError: Command 'bbduk.sh in=/home/acorreia/js66_scratch/anna/fastq_files/test_fastq/original_files/NK_H18_033_2.fastq.gz outm=output_1/NK_H18_033_2/rmlst.fastq.gz ref=/home/acorreia/.confindr_db/Klebsiella_db.fasta threads=36' returned non-zero exit status 1.

2022-06-27 11:03:55 Contamination detection complete!
(/projects/js66/software/conda_envs/confindr_0.7.4) [acorreia@m3-dtn4 original_files]$
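Judging by the traceback, ConFindr's run_subprocess raises CalledProcessError with only the exit status and the command, so the actual BBDuk message only became visible once bbduk.sh was re-run by hand (below). A minimal sketch of a debugging helper that keeps the tool's stderr; run_and_report is a hypothetical name and is not ConFindr's wrapper.

```python
import subprocess


def run_and_report(command):
    """Run a command given as a list; on failure, raise with its stderr attached.

    Hypothetical debugging helper, not ConFindr's run_subprocess.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.7)
    )
    if result.returncode != 0:
        raise RuntimeError(
            "Command {!r} exited with status {}:\n{}".format(
                command, result.returncode, result.stderr.strip()
            )
        )
    return result.stdout
```

For example, run_and_report(["bbduk.sh", "in=sample.fastq.gz", "outm=rmlst.fastq.gz", "ref=Klebsiella_db.fasta"]) would raise with the "misformatted" message in the exception text instead of just "exit status 1".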

(/projects/js66/software/conda_envs/confindr_0.7.4) [acorreia@m3-dtn3 original_files]$ bbduk.sh in=/home/acorreia/js66_scratch/anna/fastq_files/test_fastq/original_files/NK_H18_033_1.fastq.gz in2=/home/acorreia/js66_scratch/anna/fastq_files/test_fastq/original_files/NK_H18_033_2.fastq.gz outm=output_1/NK_H18_033_2/rmlst_R1.fastq.gz outm2=output_1/NK_H18_033_2/rmlst_R2.fastq.gz ref=/home/acorreia/.confindr_db/Klebsiella_db.fasta
java -ea -Xmx29615m -Xms29615m -cp /projects/js66/software/conda_envs/confindr_0.7.4/opt/bbmap-38.96-1/current/ jgi.BBDuk in=/home/acorreia/js66_scratch/anna/fastq_files/test_fastq/original_files/NK_H18_033_1.fastq.gz in2=/home/acorreia/js66_scratch/anna/fastq_files/test_fastq/original_files/NK_H18_033_2.fastq.gz outm=output_1/NK_H18_033_2/rmlst_R1.fastq.gz outm2=output_1/NK_H18_033_2/rmlst_R2.fastq.gz ref=/home/acorreia/.confindr_db/Klebsiella_db.fasta
Executing jgi.BBDuk [in=/home/acorreia/js66_scratch/anna/fastq_files/test_fastq/original_files/NK_H18_033_1.fastq.gz, in2=/home/acorreia/js66_scratch/anna/fastq_files/test_fastq/original_files/NK_H18_033_2.fastq.gz, outm=output_1/NK_H18_033_2/rmlst_R1.fastq.gz, outm2=output_1/NK_H18_033_2/rmlst_R2.fastq.gz, ref=/home/acorreia/.confindr_db/Klebsiella_db.fasta]
Version 38.96

0.288 seconds.
Initial:
Memory: max=31054m, total=31054m, free=31025m, used=29m

java.lang.Exception:
An input file appears to be misformatted:
The character with ASCII code 39 appeared where a base was expected: '''
Sequence #0
Sequence ID: 'BACT000001_2176'
Sequence: '[98, 39, 65, 84, 71, 65, 67, 84, 71, 65, 65, 84, 67, 84, 84, 84, 84, 71, 67, 84, 67, 65, 65, 67, 84, 71, 84, 84, 84, 71, 65, 65, 71, 65, 65, 84, 67, 67, 84, 84, 65, 65, 65, 65, 71, 65, 65, 65, 84, 67, 71, 65, 65, 65, 67, 67, 67, 71, 67, 67, 67, 71, 71, 71, 84, 84, 67, 67, 65, 84, 67, 71, 84, 84, 67, 71, 84, 71, 71, 84, 71, 84, 84, 71, 84, 84, 71, 84, 84, 71, 67, 84, 65, 84, 67, 71, 65, 67, 65, 65, 65, 71, 65, 67, 71, 84, 65, 71, 84, 65, 67, 84, 71, 71, 84, 84, 71, 65, 67, 71, 67, 67, 71, 71, 84, 67, 84, 71, 65, 65, 65, 84, 67, 84, 71, 65, 71, 84, 67, 67, 71, 67, 67, 65, 84, 67, 67, 67, 71, 71, 67, 84, 71, 65, 71, 67, 65, 71, 84, 84, 67, 65, 65, 65, 65, 65, 67, 71, 67, 67, 67, 65, 71, 71, 71, 67, 71, 65, 71, 67, 84, 71, 71, 65, 65, 65, 84, 67, 67, 65, 71, 71, 84, 84, 71, 71, 84, 71, 65, 67, 71, 65, 65, 71, 84, 84, 71, 65, 67, 71, 84, 84, 71, 67, 84, 67, 84, 71, 71, 65, 84, 71, 67, 65, 71, 84, 65, 71, 65, 65, 71, 65, 67, 71, 71, 67, 84, 84, 67, 71, 71, 84, 71, 65, 65, 65, 67, 84, 67, 84, 71, 67, 84, 71, 84, 67, 67, 67, 71, 84, 71, 65, 71, 65, 65, 65, 71, 67, 84, 65, 65, 65, 67, 71, 84, 67, 65, 67, 71, 65, 65, 71, 67, 84, 84, 71, 71, 65, 84, 67, 65, 67, 71, 67, 84, 71, 71, 65, 65, 65, 65, 65, 71, 67, 84, 84, 65, 67, 71, 65, 65, 71, 65, 67, 71, 67, 84, 71, 65, 65, 65, 67, 84, 71, 84, 84, 65, 67, 67, 71, 71, 84, 71, 84, 84, 65, 84, 67, 65, 65, 67, 71, 71, 67, 65, 65, 65, 71, 84, 84, 65, 65, 65, 71, 71, 84, 71, 71, 67, 84, 84, 67, 65, 67, 84, 71, 84, 84, 71, 65, 71, 67, 84, 71, 65, 65, 67, 71, 71, 84, 65, 84, 84, 67, 71, 84, 71, 67, 71, 84, 84, 67, 67, 84, 71, 67, 67, 71, 71, 71, 84, 84, 67, 67, 67, 84, 71, 71, 84, 65, 71, 65, 67, 71, 84, 84, 67, 71, 84, 67, 67, 71, 71, 84, 71, 67, 71, 67, 71, 65, 67, 65, 67, 71, 67, 84, 71, 67, 65, 67, 67, 84, 71, 71, 65, 65, 71, 71, 67, 65, 65, 65, 71, 65, 71, 67, 84, 84, 71, 65, 71, 84, 84, 67, 65, 65, 65, 71, 84, 67, 65, 84, 67, 65, 65, 71, 67, 84, 71, 71, 65, 67, 67, 65, 71, 65, 65, 71, 67, 71, 84, 65, 65, 67, 65, 65, 67, 
71, 84, 84, 71, 84, 84, 71, 84, 84, 84, 67, 84, 67, 71, 84, 67, 71, 84, 71, 67, 67, 71, 84, 84, 65, 84, 67, 71, 65, 65, 84, 67, 67, 71, 65, 65, 65, 65, 67, 65, 71, 67, 71, 67, 65, 71, 65, 71, 67, 71, 67, 71, 65, 84, 67, 65, 71, 67, 84, 71, 67, 84, 71, 71, 65, 65, 65, 65, 67, 67, 84, 71, 67, 65, 71, 71, 65, 65, 71, 71, 67, 65, 84, 71, 71, 65, 65, 71, 84, 67, 65, 65, 65, 71, 71, 84, 65, 84, 67, 71, 84, 84, 65, 65, 71, 65, 65, 67, 67, 84, 67, 65, 67, 84, 71, 65, 67, 84, 65, 67, 71, 71, 84, 71, 67, 65, 84, 84, 67, 71, 84, 84, 71, 65, 84, 67, 84, 71, 71, 71, 67, 71, 71, 67, 71, 84, 84, 71, 65, 67, 71, 71, 67, 67, 84, 71, 67, 84, 71, 67, 65, 67, 65, 84, 67, 65, 67, 67, 71, 65, 84, 65, 84, 71, 71, 67, 67, 84, 71, 71, 65, 65, 65, 67, 71, 67, 71, 84, 84, 65, 65, 71, 67, 65, 84, 67, 67, 71, 65, 71, 67, 71, 65, 65, 65, 84, 67, 71, 84, 65, 65, 65, 67, 71, 84, 84, 71, 71, 67, 71, 65, 67, 71, 65, 65, 65, 84, 67, 65, 67, 84, 71, 84, 84, 65, 65, 65, 71, 84, 71, 67, 84, 71, 65, 65, 71, 84, 84, 67, 71, 65, 67, 67, 71, 67, 71, 65, 71, 67, 71, 84, 65, 67, 67, 67, 71, 84, 71, 84, 65, 84, 67, 67, 67, 84, 71, 71, 71, 67, 67, 84, 71, 65, 65, 65, 67, 65, 71, 67, 84, 71, 71, 71, 67, 71, 65, 65, 71, 65, 84, 67, 67, 65, 84, 71, 71, 71, 84, 65, 71, 67, 84, 65, 84, 67, 71, 67, 84, 65, 65, 65, 67, 71, 84, 84, 65, 84, 67, 67, 71, 71, 65, 65, 71, 71, 84, 65, 67, 67, 65, 65, 65, 67, 84, 71, 65, 67, 67, 71, 71, 84, 67, 71, 67, 71, 84, 71, 65, 67, 67, 65, 65, 67, 67, 84, 71, 65, 67, 67, 71, 65, 67, 84, 65, 67, 71, 71, 67, 84, 71, 67, 84, 84, 67, 71, 84, 84, 71, 65, 65, 65, 84, 67, 71, 65, 65, 71, 65, 65, 71, 71, 67, 71, 84, 84, 71, 65, 65, 71, 71, 67, 67, 84, 71, 71, 84, 84, 67, 65, 67, 71, 84, 84, 84, 67, 67, 71, 65, 65, 65, 84, 71, 71, 65, 84, 84, 71, 71, 65, 67, 67, 65, 65, 67, 65, 65, 65, 65, 65, 67, 65, 84, 67, 67, 65, 67, 67, 67, 71, 84, 67, 67, 65, 65, 65, 71, 84, 84, 71, 84, 84, 65, 65, 67, 71, 84, 84, 71, 71, 67, 71, 65, 67, 71, 84, 84, 71, 84, 71, 71, 65, 65, 71, 84, 71, 65, 84, 71, 71, 84, 
84, 67, 84, 71, 71, 65, 84, 65, 84, 67, 71, 65, 67, 71, 65, 65, 71, 65, 71, 67, 71, 84, 67, 71, 84, 67, 71, 84, 65, 84, 67, 84, 67, 67, 67, 84, 71, 71, 71, 84, 67, 84, 71, 65, 65, 71, 67, 65, 71, 84, 71, 67, 65, 65, 65, 84, 67, 84, 65, 65, 67, 67, 67, 65, 84, 71, 71, 67, 65, 71, 67, 65, 71, 84, 84, 67, 71, 67, 71, 71, 65, 65, 65, 67, 67, 67, 65, 67, 65, 65, 67, 65, 65, 71, 71, 71, 67, 71, 65, 67, 67, 71, 84, 71, 84, 84, 71, 65, 65, 71, 71, 84, 65, 65, 65, 65, 84, 67, 65, 65, 71, 84, 67, 84, 65, 84, 67, 65, 67, 84, 71, 65, 67, 84, 84, 67, 71, 71, 84, 65, 84, 67, 84, 84, 67, 65, 84, 67, 71, 71, 67, 67, 84, 71, 71, 65, 67, 71, 71, 67, 71, 71, 67, 65, 84, 67, 71, 65, 67, 71, 71, 67, 67, 84, 71, 71, 84, 84, 67, 65, 67, 67, 84, 71, 84, 67, 84, 71, 65, 67, 65, 84, 67, 84, 67, 67, 84, 71, 71, 65, 65, 67, 71, 84, 84, 71, 67, 65, 71, 71, 67, 71, 65, 65, 71, 65, 65, 71, 67, 65, 71, 84, 84, 67, 71, 84, 71, 65, 65, 84, 65, 67, 65, 65, 65, 65, 65, 65, 71, 71, 67, 71, 65, 67, 71, 65, 65, 65, 84, 67, 71, 67, 65, 71, 67, 65, 71, 84, 84, 71, 84, 84, 67, 84, 71, 67, 65, 71, 71, 84, 84, 71, 65, 67, 71, 67, 65, 71, 65, 71, 67, 71, 84, 71, 65, 71, 67, 71, 84, 65, 84, 67, 84, 67, 67, 67, 84, 71, 71, 71, 67, 71, 84, 84, 65, 65, 65, 67, 65, 71, 67, 84, 67, 71, 67, 71, 71, 65, 65, 71, 65, 84, 67, 67, 71, 84, 84, 67, 65, 65, 67, 65, 65, 67, 84, 65, 67, 71, 84, 84, 71, 67, 84, 67, 84, 71, 65, 65, 67, 65, 65, 71, 65, 65, 65, 71, 71, 67, 71, 67, 84, 65, 84, 67, 71, 84, 84, 71, 84, 84, 71, 71, 84, 65, 65, 65, 71, 84, 67, 65, 67, 84, 71, 67, 65, 71, 84, 84, 71, 65, 67, 71, 67, 84, 65, 65, 65, 71, 71, 67, 71, 67, 65, 65, 67, 67, 71, 84, 65, 71, 65, 65, 67, 84, 71, 71, 67, 84, 71, 65, 67, 71, 71, 67, 71, 84, 65, 71, 65, 65, 71, 71, 84, 84, 65, 67, 67, 84, 71, 67, 71, 84, 71, 67, 84, 84, 67, 84, 71, 65, 65, 71, 67, 65, 84, 67, 67, 67, 71, 84, 71, 65, 67, 67, 71, 67, 71, 84, 84, 71, 65, 65, 71, 65, 67, 71, 67, 65, 65, 67, 84, 67, 84, 71, 71, 84, 84, 67, 84, 71, 65, 71, 67, 71, 84, 84, 71, 71, 67, 71, 
65, 67, 71, 65, 65, 71, 84, 84, 71, 65, 65, 71, 67, 71, 65, 65, 65, 84, 84, 67, 65, 67, 67, 71, 71, 67, 71, 84, 71, 71, 65, 84, 67, 71, 84, 65, 65, 71, 65, 65, 67, 67, 71, 67, 71, 84, 65, 71, 84, 71, 65, 71, 67, 67, 84, 71, 84, 67, 84, 71, 84, 65, 67, 71, 84, 71, 67, 71, 65, 65, 65, 71, 65, 67, 71, 65, 65, 71, 67, 71, 71, 65, 65, 71, 65, 65, 65, 65, 65, 71, 65, 67, 71, 67, 84, 65, 84, 67, 71, 67, 84, 65, 67, 67, 71, 84, 71, 65, 65, 67, 65, 65, 71, 67, 65, 71, 71, 65, 65, 71, 65, 67, 71, 67, 71, 65, 65, 67, 84, 84, 67, 84, 67, 67, 65, 65, 67, 65, 65, 67, 71, 67, 84, 65, 84, 71, 71, 67, 84, 71, 65, 65, 71, 67, 71, 84, 84, 67, 65, 65, 65, 71, 67, 65, 71, 67, 71, 65, 65, 65, 71, 71, 67, 71, 65, 71, 84, 65, 65, 39]
b'ATGACTGAATCTTTTGCTCAACTGTTTGAAGAATCCTTAAAAGAAATCGAAACCCGCCCGGGTTCCATCGTTCGTGGTGTTGTTGTTGCTATCGACAAAGACGTAGTACTGGTTGACGCCGGTCTGAAATCTGAGTCCGCCATCCCGGCTGAGCAGTTCAAAAACGCCCAGGGCGAGCTGGAAATCCAGGTTGGTGACGAAGTTGACGTTGCTCTGGATGCAGTAGAAGACGGCTTCGGTGAAACTCTGCTGTCCCGTGAGAAAGCTAAACGTCACGAAGCTTGGATCACGCTGGAAAAAGCTTACGAAGACGCTGAAACTGTTACCGGTGTTATCAACGGCAAAGTTAAAGGTGGCTTCACTGTTGAGCTGAACGGTATTCGTGCGTTCCTGCCGGGTTCCCTGGTAGACGTTCGTCCGGTGCGCGACACGCTGCACCTGGAAGGCAAAGAGCTTGAGTTCAAAGTCATCAAGCTGGACCAGAAGCGTAACAACGTTGTTGTTTCTCGTCGTGCCGTTATCGAATCCGAAAACAGCGCAGAGCGCGATCAGCTGCTGGAAAACCTGCAGGAAGGCATGGAAGTCAAAGGTATCGTTAAGAACCTCACTGACTACGGTGCATTCGTTGATCTGGGCGGCGTTGACGGCCTGCTGCACATCACCGATATGGCCTGGAAACGCGTTAAGCATCCGAGCGAAATCGTAAACGTTGGCGACGAAATCACTGTTAAAGTGCTGAAGTTCGACCGCGAGCGTACCCGTGTATCCCTGGGCCTGAAACAGCTGGGCGAAGATCCATGGGTAGCTATCGCTAAACGTTATCCGGAAGGTACCAAACTGACCGGTCGCGTGACCAACCTGACCGACTACGGCTGCTTCGTTGAAATCGAAGAAGGCGTTGAAGGCCTGGTTCACGTTTCCGAAATGGATTGGACCAACAAAAACATCCACCCGTCCAAAGTTGTTAACGTTGGCGACGTTGTGGAAGTGATGGTTCTGGATATCGACGAAGAGCGTCGTCGTATCTCCCTGGGTCTGAAGCAGTGCAAATCTAACCCATGGCAGCAGTTCGCGGAAACCCACAACAAGGGCGACCGTGTTGAAGGTAAAATCAAGTCTATCACTGACTTCGGTATCTTCATCGGCCTGGACGGCGGCATCGACGGCCTGGTTCACCTGTCTGACATCTCCTGGAACGTTGCAGGCGAAGAAGCAGTTCGTGAATACAAAAAAGGCGACGAAATCGCAGCAGTTGTTCTGCAGGTTGACGCAGAGCGTGAGCGTATCTCCCTGGGCGTTAAACAGCTCGCGGAAGATCCGTTCAACAACTACGTTGCTCTGAACAAGAAAGGCGCTATCGTTGTTGGTAAAGTCACTGCAGTTGACGCTAAAGGCGCAACCGTAGAACTGGCTGACGGCGTAGAAGGTTACCTGCGTGCTTCTGAAGCATCCCGTGACCGCGTTGAAGACGCAACTCTGGTTCTGAGCGTTGGCGACGAAGTTGAAGCGAAATTCACCGGCGTGGATCGTAAGAACCGCGTAGTGAGCCTGTCTGTACGTGCGAAAGACGAAGCGGAAGAAAAAGACGCTATCGCTACCGTGAACAAGCAGGAAGACGCGAACTTCTCCAACAACGCTATGGCTGAAGCGTTCAAAGCAGCGAAAGGCGAGTAA''

This can be bypassed with the flag 'tossjunk', 'fixjunk', or 'ignorejunk'
at shared.KillSwitch.kill(KillSwitch.java:96)
at stream.Read.validateCommonCase_branchless(Read.java:412)
at stream.Read.validate(Read.java:115)
at stream.Read.<init>(Read.java:77)
at stream.Read.<init>(Read.java:50)
at stream.FastaReadInputStream.generateRead(FastaReadInputStream.java:270)
at stream.FastaReadInputStream.fillList(FastaReadInputStream.java:184)
at stream.FastaReadInputStream.hasMore(FastaReadInputStream.java:109)
at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:668)
at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:657)
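In the BBDuk message above, the first two codes are 98 (`b`) and 39 (`'`), and the last code is 39 again. That is exactly what Python produces when a bytes object is converted with str() instead of being decoded. That ConFindr's database setup does this conversion is an assumption here (the maintainer's later reply links the problem to the Biopython version, #30); the sketch only demonstrates the mechanism, with bytes_to_fasta_text as a hypothetical helper.

```python
def bytes_to_fasta_text(seq):
    """Return sequence text suitable for writing to a FASTA file."""
    if isinstance(seq, bytes):
        return seq.decode("ascii")  # decode the bytes rather than calling str()
    return str(seq)


fragment = b"ATGACTGAATC"  # hypothetical short fragment of BACT000001_2176

print(chr(98), chr(39))               # prints: b '
print(str(fragment))                  # prints: b'ATGACTGAATC'  (the bytes repr)
print(bytes_to_fasta_text(fragment))  # prints: ATGACTGAATC
```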

(/projects/js66/software/conda_envs/confindr_0.7.4) [acorreia@m3-dtn4 original_files]$ cd /home/acorreia/.confindr_db/
(/projects/js66/software/conda_envs/confindr_0.7.4) [acorreia@m3-dtn4 .confindr_db]$ ls
download_date.txt gene_allele.txt Listeria_db_cgderived.fasta refseq.msh Salmonella_db_cgderived.fasta
Escherichia_db_cgderived.fasta Klebsiella_db.fasta profiles.txt rMLST_combined.fasta
(/projects/js66/software/conda_envs/confindr_0.7.4) [acorreia@m3-dtn4 .confindr_db]$ cat Klebsiella_db.fasta | head -30
>BACT000001_2176
b'ATGACTGAATCTTTTGCTCAACTGTTTGAAGAATCCTTAAAAGAAATCGAAACCCGCC
CGGGTTCCATCGTTCGTGGTGTTGTTGTTGCTATCGACAAAGACGTAGTACTGGTTGACG
CCGGTCTGAAATCTGAGTCCGCCATCCCGGCTGAGCAGTTCAAAAACGCCCAGGGCGAGC
TGGAAATCCAGGTTGGTGACGAAGTTGACGTTGCTCTGGATGCAGTAGAAGACGGCTTCG
GTGAAACTCTGCTGTCCCGTGAGAAAGCTAAACGTCACGAAGCTTGGATCACGCTGGAAA
AAGCTTACGAAGACGCTGAAACTGTTACCGGTGTTATCAACGGCAAAGTTAAAGGTGGCT
TCACTGTTGAGCTGAACGGTATTCGTGCGTTCCTGCCGGGTTCCCTGGTAGACGTTCGTC
CGGTGCGCGACACGCTGCACCTGGAAGGCAAAGAGCTTGAGTTCAAAGTCATCAAGCTGG
ACCAGAAGCGTAACAACGTTGTTGTTTCTCGTCGTGCCGTTATCGAATCCGAAAACAGCG
CAGAGCGCGATCAGCTGCTGGAAAACCTGCAGGAAGGCATGGAAGTCAAAGGTATCGTTA
AGAACCTCACTGACTACGGTGCATTCGTTGATCTGGGCGGCGTTGACGGCCTGCTGCACA
TCACCGATATGGCCTGGAAACGCGTTAAGCATCCGAGCGAAATCGTAAACGTTGGCGACG
AAATCACTGTTAAAGTGCTGAAGTTCGACCGCGAGCGTACCCGTGTATCCCTGGGCCTGA
AACAGCTGGGCGAAGATCCATGGGTAGCTATCGCTAAACGTTATCCGGAAGGTACCAAAC
TGACCGGTCGCGTGACCAACCTGACCGACTACGGCTGCTTCGTTGAAATCGAAGAAGGCG
TTGAAGGCCTGGTTCACGTTTCCGAAATGGATTGGACCAACAAAAACATCCACCCGTCCA
AAGTTGTTAACGTTGGCGACGTTGTGGAAGTGATGGTTCTGGATATCGACGAAGAGCGTC
GTCGTATCTCCCTGGGTCTGAAGCAGTGCAAATCTAACCCATGGCAGCAGTTCGCGGAAA
CCCACAACAAGGGCGACCGTGTTGAAGGTAAAATCAAGTCTATCACTGACTTCGGTATCT
TCATCGGCCTGGACGGCGGCATCGACGGCCTGGTTCACCTGTCTGACATCTCCTGGAACG
TTGCAGGCGAAGAAGCAGTTCGTGAATACAAAAAAGGCGACGAAATCGCAGCAGTTGTTC
TGCAGGTTGACGCAGAGCGTGAGCGTATCTCCCTGGGCGTTAAACAGCTCGCGGAAGATC
CGTTCAACAACTACGTTGCTCTGAACAAGAAAGGCGCTATCGTTGTTGGTAAAGTCACTG
CAGTTGACGCTAAAGGCGCAACCGTAGAACTGGCTGACGGCGTAGAAGGTTACCTGCGTG
CTTCTGAAGCATCCCGTGACCGCGTTGAAGACGCAACTCTGGTTCTGAGCGTTGGCGACG
AAGTTGAAGCGAAATTCACCGGCGTGGATCGTAAGAACCGCGTAGTGAGCCTGTCTGTAC
GTGCGAAAGACGAAGCGGAAGAAAAAGACGCTATCGCTACCGTGAACAAGCAGGAAGACG
CGAACTTCTCCAACAACGCTATGGCTGAAGCGTTCAAAGCAGCGAAAGGCGAGTAA'
>BACT000002_86
(/projects/js66/software/conda_envs/confindr_0.7.4) [acorreia@m3-dtn4 .confindr_db]$
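To check whether other records in the database are affected, here is a minimal sketch that flags every record whose sequence starts with a bytes-literal prefix. It assumes standard FASTA header lines beginning with `>`; find_bytes_literal_records is a hypothetical name.

```python
def find_bytes_literal_records(path):
    """Return IDs of FASTA records whose first sequence line starts with b' or b"."""
    flagged = []
    current_id = None
    expecting_first_line = False
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith(">"):
                # New record: remember its ID and inspect the next sequence line.
                current_id = line[1:]
                expecting_first_line = True
            elif line and expecting_first_line:
                if line.startswith(("b'", 'b"')):
                    flagged.append(current_id)
                expecting_first_line = False
    return flagged
```

Calling find_bytes_literal_records("/home/acorreia/.confindr_db/Klebsiella_db.fasta") should list BACT000001_2176 among the flagged IDs, given the head output above.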

I then tried removing the b' and ' at the beginning and end of each sequence, but got the error below:

confindr.py -i /home/acorreia/js66_scratch/anna/fastq_files/test_fastq/original_files -o example-out --rmlst
2022-06-28 03:40:45 Welcome to ConFindr 0.7.4! Beginning analysis of your samples...
2022-06-28 03:40:45 Beginning analysis of sample NK_H18_033_1...
2022-06-28 03:40:45 Checking for cross-species contamination...
2022-06-28 03:40:51 Extracting conserved core genes...
2022-06-28 03:40:58 Quality trimming...
2022-06-28 03:40:59 Detecting contamination...
[E::fai_build_core] Different line length in sequence 'BACT000001_2176'
Traceback (most recent call last):
File "/projects/js66/software/conda_envs/confindr_0.7.4/bin/confindr.py", line 10, in <module>
sys.exit(main())
File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/confindr.py", line 1214, in main
confindr(args)
File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/confindr.py", line 1067, in confindr
fasta=args.fasta)
File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/confindr.py", line 691, in find_contamination
pysam.faidx(sample_database)
File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/pysam/utils.py", line 75, in call
stderr))
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=[faidx] Could not build fai index /home/acorreia/.confindr_db/Klebsiella_db.fasta.fai\n'
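The faidx failure is consistent with the manual edit: the first sequence line in the head output is 60 characters including the leading b', so removing those two characters leaves it at 58 while the following lines stay at 60, and samtools faidx requires every line of a sequence except the last to have the same length. Re-downloading, as suggested in the reply below, is the simpler route; this sketch only shows what a consistent manual repair would involve. The function names are hypothetical, and the 60-character width is assumed from the existing file.

```python
def strip_bytes_repr(seq):
    """Remove a b'...' or b"..." wrapper if present; otherwise return seq unchanged."""
    if len(seq) >= 3 and seq[0] == "b" and seq[1] in "'\"" and seq[-1] == seq[1]:
        return seq[2:-1]
    return seq


def rewrap_fasta(in_path, out_path, width=60):
    """Write a copy of in_path with cleaned sequences wrapped at a fixed width.

    Each sequence is joined into one string, cleaned, then split again so that
    every line except the last has exactly `width` characters.
    Any lines before the first header are ignored.
    """
    def write_record(out, header, chunks):
        seq = strip_bytes_repr("".join(chunks))
        out.write(header + "\n")
        for start in range(0, len(seq), width):
            out.write(seq[start:start + width] + "\n")

    header, chunks = None, []
    with open(in_path) as src, open(out_path, "w") as out:
        for raw in src:
            line = raw.strip()
            if line.startswith(">"):
                if header is not None:
                    write_record(out, header, chunks)
                header, chunks = line, []
            elif line:
                chunks.append(line)
        if header is not None:
            write_record(out, header, chunks)
```

Writing to a new file and only then replacing Klebsiella_db.fasta avoids truncating the original if something goes wrong.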

Any advice would be greatly appreciated.

pcrxn commented

Hi @annacorreia, sorry for the delayed response! I've tested some Klebsiella genomes with ConFindr v0.7.4, mash=2.3 and python=3.7.12: the Klebsiella rMLST alleles seem to download correctly, and ConFindr runs without error.

Based on the logs you've included, it seems the encoding of the Klebsiella rMLST alleles file (Klebsiella_db.fasta) became corrupted somehow. If you delete all of the Klebsiella_db* files within the path provided to -d/--databases and then re-run ConFindr, the Klebsiella alleles will be downloaded again and automatically re-indexed.
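In the logs above, the database directory is /home/acorreia/.confindr_db, so the shell equivalent would be rm /home/acorreia/.confindr_db/Klebsiella_db*. A minimal Python sketch of the same cleanup; remove_genus_database is a hypothetical helper, not part of ConFindr.

```python
from pathlib import Path


def remove_genus_database(db_dir, genus="Klebsiella"):
    """Delete <genus>_db* files in db_dir and return the names that were removed."""
    removed = []
    for path in sorted(Path(db_dir).expanduser().glob(genus + "_db*")):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed
```

For example, remove_genus_database("~/.confindr_db") removes Klebsiella_db.fasta and any leftover Klebsiella_db.fasta.fai, while the other genus databases in that directory are left untouched.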

pcrxn commented

This is related to #30, an issue with the Biopython version. Once Biopython has been downgraded to 1.78, the instructions above should work!
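Before deleting and re-downloading the database, it may be worth confirming which Biopython version the ConFindr environment actually imports. The sketch below checks this, with 1.78 taken from this thread. If a change is needed, conda install biopython=1.78 inside the environment is one way to pin it. Both function names are hypothetical.

```python
def installed_biopython_version():
    """Return the installed Biopython version string, or None if it is missing."""
    try:
        import Bio
    except ImportError:
        return None
    return Bio.__version__


def biopython_report(version, pin="1.78"):
    """Describe how a Biopython version compares with the version pinned in this thread."""
    if version is None:
        return "Biopython is not installed in this environment"
    if version == pin:
        return "Biopython {} is installed".format(pin)
    return "Biopython {} is installed; this thread suggests {}".format(version, pin)


print(biopython_report(installed_biopython_version()))
```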

pcrxn commented

Closing as completed since a solution was provided.