Issues setting up DB
Closed this issue · 9 comments
Dear Confindr team,
I’m trying to install the database for ConFindr (v0.7.4, Python 3.6), but I am running into a problem. Specifically, I am getting the following error:
...
2022-09-12 10:20:56 Downloading BACT000065...
2022-09-12 10:21:05 Downloading rMLST profiles...
2022-09-12 10:21:05 Combining rMLST files...
Traceback (most recent call last):
File "/well/aanensen/users/afk289/conda/skylake/envs/confindr2/bin/confindr_database_setup", line 10, in <module>
sys.exit(main())
File "/well/aanensen/users/afk289/conda/skylake/envs/confindr2/lib/python3.6/site-packages/confindr_src/database_setup.py", line 270, in main
args.secret_file)
File "/well/aanensen/users/afk289/conda/skylake/envs/confindr2/lib/python3.6/site-packages/confindr_src/database_setup.py", line 209, in setup_confindr_database
record.seq._data = record.seq._data.replace('-', '').replace('N', '')
TypeError: a bytes-like object is required, not 'str'
The command I’m using to set up the db is:
confindr_database_setup -s key -o confindr_db
Have you seen this error before, or do you have any clues as to how to solve it? Thanks in advance
Hi @juliofdiaz,
This issue is the same as #27. You can either change your BioPython version to 1.68, or perform the manual fix as described in #27 (comment).
Edit: Previously referenced the wrong issue.
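For readers wondering what the TypeError above means: the message indicates that `record.seq._data` was a bytes object in that environment, and `bytes.replace` does not accept `str` arguments. The snippet below is only a minimal sketch that reproduces the same message without BioPython and shows a type-aware alternative. The helper name `strip_gaps` is hypothetical and is not claimed to be the manual fix described in #27.

```python
def strip_gaps(data):
    """Remove '-' and 'N' from sequence data held as either str or bytes.

    Hypothetical helper for illustration; not part of ConFindr or BioPython.
    """
    if isinstance(data, (bytes, bytearray)):
        # bytes.replace needs bytes arguments
        return bytes(data).replace(b"-", b"").replace(b"N", b"")
    return data.replace("-", "").replace("N", "")


raw = b"ATG-CNNGA"

# Calling bytes.replace with str arguments raises the error from the traceback.
try:
    raw.replace("-", "")
    message = None
except TypeError as exc:
    message = str(exc)

cleaned_bytes = strip_gaps(raw)
cleaned_text = strip_gaps("ATG-CNNGA")
```

Branching on the type keeps the same call working whether the sequence data arrives as text or as bytes.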
Thank you @pcrxn
It seems like the manual fix worked, but I was only able to download the Escherichia, Salmonella, and Listeria dbs. Here is how I am running the setup:
$ confindr_database_setup -s key -o confindr_db
2022-09-21 02:19:23 Downloading cgMLST-derived data for Escherichia, Salmonella, and Listeria...
Visit this URL in your browser: http://pubmlst.org/cgi-bin/bigsdb/bigsdb.pl?db=pubmlst_rmlst_seqdef&page=authorizeClient&oauth_token=QwTxHjRaQ3xC6wvPjA5TqdCtzA0FHK8H
Enter oauth_verifier from browser: QoauJved
2022-09-21 02:19:41 Downloading BACT000001...
2022-09-21 02:19:56 Downloading BACT000002...
2022-09-21 02:20:04 Downloading BACT000003...
2022-09-21 02:20:12 Downloading BACT000004...
2022-09-21 02:20:20 Downloading BACT000005...
2022-09-21 02:20:26 Downloading BACT000006...
2022-09-21 02:20:29 Downloading BACT000007...
2022-09-21 02:20:34 Downloading BACT000008...
2022-09-21 02:20:39 Downloading BACT000009...
2022-09-21 02:20:44 Downloading BACT000010...
2022-09-21 02:20:47 Downloading BACT000011...
2022-09-21 02:20:51 Downloading BACT000012...
2022-09-21 02:20:55 Downloading BACT000013...
2022-09-21 02:20:59 Downloading BACT000014...
2022-09-21 02:21:02 Downloading BACT000015...
2022-09-21 02:21:05 Downloading BACT000016...
2022-09-21 02:21:09 Downloading BACT000017...
2022-09-21 02:21:11 Downloading BACT000018...
2022-09-21 02:21:14 Downloading BACT000019...
2022-09-21 02:21:17 Downloading BACT000020...
2022-09-21 02:21:20 Downloading BACT000021...
2022-09-21 02:21:22 Downloading BACT000030...
2022-09-21 02:21:30 Downloading BACT000031...
2022-09-21 02:21:40 Downloading BACT000032...
2022-09-21 02:21:47 Downloading BACT000033...
2022-09-21 02:21:53 Downloading BACT000034...
2022-09-21 02:21:59 Downloading BACT000035...
2022-09-21 02:22:06 Downloading BACT000036...
2022-09-21 02:22:09 Downloading BACT000038...
2022-09-21 02:22:15 Downloading BACT000039...
2022-09-21 02:22:20 Downloading BACT000040...
2022-09-21 02:22:25 Downloading BACT000042...
2022-09-21 02:22:30 Downloading BACT000043...
2022-09-21 02:22:34 Downloading BACT000044...
2022-09-21 02:22:39 Downloading BACT000045...
2022-09-21 02:22:43 Downloading BACT000046...
2022-09-21 02:22:47 Downloading BACT000047...
2022-09-21 02:22:51 Downloading BACT000048...
2022-09-21 02:22:55 Downloading BACT000049...
2022-09-21 02:23:00 Downloading BACT000050...
2022-09-21 02:23:03 Downloading BACT000051...
2022-09-21 02:23:06 Downloading BACT000052...
2022-09-21 02:23:10 Downloading BACT000053...
2022-09-21 02:23:13 Downloading BACT000056...
2022-09-21 02:23:16 Downloading BACT000057...
2022-09-21 02:23:18 Downloading BACT000058...
2022-09-21 02:23:20 Downloading BACT000059...
2022-09-21 02:23:22 Downloading BACT000060...
2022-09-21 02:23:25 Downloading BACT000061...
2022-09-21 02:23:27 Downloading BACT000062...
2022-09-21 02:23:29 Downloading BACT000063...
2022-09-21 02:23:30 Downloading BACT000064...
2022-09-21 02:23:32 Downloading BACT000065...
2022-09-21 02:23:42 Downloading rMLST profiles...
2022-09-21 02:23:42 Combining rMLST files...
2022-09-21 02:25:34 Assigning alleles to genera...
2022-09-21 02:29:42 Downloading mash refseq sketch...
2022-09-21 02:29:43 Done downloading ConFindr databases!
$ ls confindr_db
Escherichia_db_cgderived.fasta Listeria_db_cgderived.fasta Salmonella_db_cgderived.fasta download_date.txt gene_allele.txt profiles.txt rMLST_combined.fasta refseq.msh
It seems I have only downloaded the Escherichia, Listeria, and Salmonella dbs. I am interested in the Mycobacterium one, and it seems I am registered for it:
I couldn't find additional information on downloading additional dbs, so I assume I'm missing something (probably obvious).
Hi @juliofdiaz,
Only alleles for Escherichia, Listeria, and Salmonella are downloaded by default. If you use ConFindr to analyze a sample of a different genus, alleles will be automatically downloaded for that other genus and saved in your database path, including for Mycobacterium.
Thank you Liam. I reran ConFindr, and it did download the Mycobacterium db (Mycobacterium_db.fasta). However, ConFindr ran into a problem when trying to run bbduk.sh. If this is not related to the original question, I can raise a different issue.
2022-09-22 01:20:36 Welcome to ConFindr 0.7.4! Beginning analysis of your samples...
2022-09-22 01:20:36 Beginning analysis of sample DRR019435...
2022-09-22 01:20:36 Checking for cross-species contamination...
2022-09-22 01:22:41 Extracting conserved core genes...
2022-09-22 01:22:46 Encountered error when attempting to run ConFindr on sample DRR019435. Skipping...
2022-09-22 01:22:46 Error encounted was:
Traceback (most recent call last):
File "/well/aanensen/users/afk289/conda/skylake/envs/confindr3/lib/python3.5/site-packages/confindr_src/confindr.py", line 1067, in confindr
fasta=args.fasta)
File "/well/aanensen/users/afk289/conda/skylake/envs/confindr3/lib/python3.5/site-packages/confindr_src/confindr.py", line 638, in find_contamination
returncmd=True)
File "/well/aanensen/users/afk289/conda/skylake/envs/confindr3/lib/python3.5/site-packages/confindr_src/wrappers/bbtools.py", line 258, in bbduk_bait
out, err = run_subprocess(cmd)
File "/well/aanensen/users/afk289/conda/skylake/envs/confindr3/lib/python3.5/site-packages/confindr_src/wrappers/bbtools.py", line 16, in run_subprocess
raise subprocess.CalledProcessError(x.returncode, cmd=command)
subprocess.CalledProcessError: Command 'bbduk.sh in=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435_1.fastq.gz in2=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435_2.fastq.gz outm=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435/rmlst_R1.fastq.gz outm2=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435/rmlst_R2.fastq.gz ref=/well/aanensen/projects/amr-landscape/confindr/confindr_db/Mycobacterium_db.fasta Xmx=1500m threads=2' returned non-zero exit status 1
2022-09-22 01:22:46 Contamination detection complete!
No problem, @juliofdiaz!
After you receive the bbduk.sh error, could you please run the following command and share the output?
bbduk.sh in=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435_1.fastq.gz in2=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435_2.fastq.gz outm=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435/rmlst_R1.fastq.gz outm2=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435/rmlst_R2.fastq.gz ref=/well/aanensen/projects/amr-landscape/confindr/confindr_db/Mycobacterium_db.fasta Xmx=1500m threads=2
Here is my output:
(confindr3) [afk289@rescomp1 scripts]$ bbduk.sh in=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435_1.fastq.gz in2=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435_2.fastq.gz outm=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435/rmlst_R1.fastq.gz outm2=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435/rmlst_R2.fastq.gz ref=/well/aanensen/projects/amr-landscape/confindr/confindr_db/Mycobacterium_db.fasta Xmx=1500m threads=2
java -ea -Xmx1500m -Xms1500m -cp /well/aanensen/users/afk289/conda/skylake/envs/confindr3/opt/bbmap-39.00-0/current/ jgi.BBDuk in=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435_1.fastq.gz in2=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435_2.fastq.gz outm=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435/rmlst_R1.fastq.gz outm2=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435/rmlst_R2.fastq.gz ref=/well/aanensen/projects/amr-landscape/confindr/confindr_db/Mycobacterium_db.fasta Xmx=1500m threads=2
Executing jgi.BBDuk [in=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435_1.fastq.gz, in2=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435_2.fastq.gz, outm=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435/rmlst_R1.fastq.gz, outm2=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435/rmlst_R2.fastq.gz, ref=/well/aanensen/projects/amr-landscape/confindr/confindr_db/Mycobacterium_db.fasta, Xmx=1500m, threads=2]
Version 39.00
Set threads to 2
0.187 seconds.
Initial:
Memory: max=1572m, total=1572m, free=1543m, used=29m
java.lang.Exception:
An input file appears to be misformatted:
The character with ASCII code 39 appeared where a base was expected: '''
Sequence #0
Sequence ID: 'BACT000001_159'
Sequence: '[65, 84, 71, 67, 67, 71, 65, 71, 84, 67, 67, 67, 65, 67, 67, 71, 84, 67, 65, 67, 67, 84, 67, 71, 67, 67, 71, 67, 65, 65, 71, 84, 65, 71, 67, 67, 71, 84, 67, 65, 65, 67, 71, 65, 67, 65, 84, 65, 71, 71, 67, 84, 67, 84, 65, 71, 67, 71, 65, 71, 71, 65, 67, 84, 84, 84, 67, 84, 67, 71, 67, 67, 71, 67, 65, 65, 84, 65, 71, 65, 67, 65, 65, 65, 65, 67, 71, 65, 84, 67, 65, 65, 71, 84, 65, 67, 84, 84, 67, 65, 65, 67, 71, 65, 84, 71, 71, 67, 71, 65, 67, 65, 84, 67, 71, 84, 67, 71, 65, 65, 71, 71, 67, 65, 67, 67, 65, 84, 67, 71, 84, 67, 65, 65, 65, 71, 84, 71, 71, 65, 67, 67, 71, 71, 71, 65, 67, 71, 65, 71, 71, 84, 71, 67, 84, 67, 67, 84, 67, 71, 65, 67, 65, 84, 67, 71, 71, 67, 84, 65, 67, 65, 65, 71, 65, 67, 67, 71, 65, 65, 71, 71, 67, 71, 84, 71, 65, 84, 67, 67, 67, 67, 71, 67, 67, 67, 71, 67, 71, 65, 65, 67, 84, 71, 84, 67, 67, 65, 84, 67, 65, 65, 71, 67, 65, 67, 71, 65, 67, 71, 84, 67, 71, 65, 67, 67, 67, 67, 65, 65, 67, 71, 65, 71, 71, 84, 67, 71, 84, 84, 84, 67, 67, 71, 84, 67, 71, 71, 84, 71, 65, 67, 71, 65, 71, 71, 84, 67, 71, 65, 65, 71, 67, 67, 67, 84, 71, 71, 84, 71, 67, 84, 67, 65, 67, 67, 65, 65, 71, 71, 65, 71, 71, 65, 67, 65, 65, 65, 71, 65, 71, 71, 71, 67, 67, 71, 71, 67, 84, 67, 65, 84, 67, 67, 84, 67, 84, 67, 67, 65, 65, 71, 65, 65, 65, 67, 71, 67, 71, 67, 71, 67, 65, 71, 84, 65, 67, 71, 65, 71, 67, 71, 84, 71, 67, 67, 84, 71, 71, 71, 71, 67, 65, 67, 67, 65, 84, 67, 71, 65, 71, 71, 67, 71, 67, 84, 67, 65, 65, 71, 71, 65, 71, 65, 65, 71, 71, 65, 67, 71, 65, 71, 71, 67, 67, 71, 84, 67, 65, 65, 71, 71, 71, 67, 65, 67, 71, 71, 84, 67, 65, 84, 67, 71, 65, 71, 71, 84, 67, 71, 84, 67, 65, 65, 71, 71, 71, 84, 71, 71, 67, 67, 84, 71, 65, 84, 67, 67, 84, 67, 71, 65, 67, 65, 84, 67, 71, 71, 71, 67, 84, 71, 67, 71, 67, 71, 71, 84, 84, 84, 67, 67, 84, 71, 67, 67, 67, 71, 67, 67, 84, 67, 71, 67, 84, 71, 71, 84, 71, 71, 65, 71, 65, 84, 71, 67, 71, 67, 67, 71, 71, 71, 84, 71, 67, 71, 67, 71, 65, 67, 67, 84, 71, 67, 65, 71, 67, 67, 67, 84, 65, 67, 65, 84, 67, 71, 71, 
67, 65, 65, 71, 71, 65, 71, 65, 84, 67, 71, 65, 71, 71, 67, 67, 65, 65, 71, 65, 84, 67, 65, 84, 67, 71, 65, 71, 67, 84, 71, 71, 65, 67, 65, 65, 71, 65, 65, 67, 67, 71, 67, 65, 65, 67, 65, 65, 67, 71, 84, 71, 71, 84, 71, 67, 84, 71, 84, 67, 67, 67, 71, 84, 67, 71, 67, 71, 67, 67, 84, 71, 71, 67, 84, 71, 71, 65, 71, 67, 65, 71, 65, 67, 67, 67, 65, 71, 84, 67, 67, 71, 65, 71, 71, 84, 71, 67, 71, 67, 65, 71, 67, 71, 65, 71, 84, 84, 67, 67, 84, 71, 65, 65, 84, 65, 65, 67, 84, 84, 71, 67, 65, 65, 65, 65, 65, 71, 71, 67, 65, 67, 67, 65, 84, 67, 67, 71, 65, 65, 65, 71, 71, 71, 84, 71, 84, 67, 71, 84, 71, 84, 67, 67, 84, 67, 71, 65, 84, 67, 71, 84, 67, 65, 65, 67, 84, 84, 67, 71, 71, 67, 71, 67, 71, 84, 84, 67, 71, 84, 67, 71, 65, 84, 67, 84, 67, 71, 71, 67, 71, 71, 84, 71, 84, 71, 71, 65, 67, 71, 71, 84, 67, 84, 71, 71, 84, 71, 67, 65, 84, 71, 84, 67, 84, 67, 67, 71, 65, 71, 67, 84, 65, 84, 67, 71, 84, 71, 71, 65, 65, 71, 67, 65, 67, 65, 84, 67, 71, 65, 67, 67, 65, 67, 67, 67, 71, 84, 67, 67, 71, 65, 71, 71, 84, 71, 71, 84, 67, 67, 65, 71, 71, 84, 84, 71, 71, 84, 71, 65, 67, 71, 65, 71, 71, 84, 67, 65, 67, 67, 71, 84, 67, 71, 65, 71, 71, 84, 71, 67, 84, 67, 71, 65, 67, 71, 84, 67, 71, 65, 67, 65, 84, 71, 71, 65, 67, 67, 71, 84, 71, 65, 71, 67, 71, 71, 71, 84, 84, 84, 67, 71, 84, 84, 71, 84, 67, 65, 67, 84, 67, 65, 65, 71, 71, 67, 71, 65, 67, 84, 67, 65, 71, 71, 65, 65, 71, 65, 67, 67, 67, 71, 84, 71, 71, 67, 71, 71, 67, 65, 67, 84, 84, 67, 71, 67, 67, 67, 71, 67, 65, 67, 84, 67, 65, 67, 71, 67, 71, 65, 84, 67, 71, 71, 71, 67, 65, 71, 65, 84, 67, 71, 84, 71, 67, 67, 71, 71, 71, 67, 65, 65, 71, 71, 84, 67, 65, 67, 67, 65, 65, 71, 84, 84, 71, 71, 84, 84, 67, 67, 71, 84, 84, 67, 71, 71, 84, 71, 67, 65, 84, 84, 67, 71, 84, 67, 67, 71, 67, 71, 84, 67, 71, 65, 71, 71, 65, 71, 71, 71, 84, 65, 84, 67, 71, 65, 71, 71, 71, 67, 67, 84, 71, 71, 84, 71, 67, 65, 67, 65, 84, 67, 84, 67, 67, 71, 65, 71, 67, 84, 71, 71, 67, 67, 71, 65, 71, 67, 71, 84, 67, 65, 67, 71, 84, 67, 71, 65, 71, 71, 
84, 71, 67, 67, 67, 71, 65, 84, 67, 65, 71, 71, 84, 71, 71, 84, 84, 71, 67, 67, 71, 84, 67, 71, 71, 67, 71, 65, 67, 71, 65, 67, 71, 67, 71, 65, 84, 71, 71, 84, 67, 65, 65, 71, 71, 84, 67, 65, 84, 67, 71, 65, 67, 65, 84, 67, 71, 65, 67, 67, 84, 71, 71, 65, 71, 67, 71, 67, 67, 71, 84, 67, 71, 71, 65, 84, 67, 84, 67, 71, 84, 84, 71, 84, 67, 71, 67, 84, 67, 65, 65, 71, 67, 65, 65, 71, 67, 67, 65, 65, 84, 71, 65, 71, 71, 65, 67, 84, 65, 67, 65, 67, 67, 71, 65, 71, 71, 65, 71, 84, 84, 67, 71, 65, 67, 67, 67, 71, 71, 67, 71, 65, 65, 71, 84, 65, 67, 71, 71, 67, 65, 84, 71, 71, 67, 67, 71, 65, 67, 65, 71, 84, 84, 65, 67, 71, 65, 67, 71, 65, 71, 67, 65, 71, 71, 71, 67, 65, 65, 67, 84, 65, 67, 65, 84, 67, 84, 84, 67, 67, 67, 67, 71, 65, 71, 71, 71, 67, 84, 84, 67, 71, 65, 84, 71, 67, 67, 71, 65, 65, 65, 67, 67, 65, 65, 67, 71, 65, 65, 84, 71, 71, 67, 84, 84, 71, 65, 71, 71, 71, 65, 84, 84, 67, 71, 65, 65, 65, 65, 71, 67, 65, 71, 67, 71, 67, 71, 67, 67, 71, 65, 65, 84, 71, 71, 71, 65, 65, 71, 67, 84, 67, 71, 71, 84, 65, 67, 71, 67, 67, 71, 65, 71, 71, 67, 67, 71, 65, 71, 67, 71, 67, 67, 71, 71, 67, 65, 67, 65, 65, 71, 65, 84, 71, 67, 65, 67, 65, 67, 67, 71, 67, 71, 67, 65, 71, 65, 84, 71, 71, 65, 71, 65, 65, 71, 84, 84, 67, 71, 67, 67, 71, 67, 67, 71, 67, 67, 71, 65, 71, 71, 67, 71, 71, 67, 84, 71, 71, 65, 67, 71, 67, 71, 71, 67, 71, 67, 71, 71, 65, 67, 71, 65, 84, 67, 65, 71, 84, 67, 71, 84, 67, 71, 71, 67, 67, 65, 71, 84, 65, 71, 67, 71, 67, 65, 67, 67, 71, 84, 67, 71, 71, 65, 65, 65, 65, 71, 65, 67, 67, 71, 67, 71, 71, 71, 84, 71, 71, 65, 84, 67, 65, 67, 84, 71, 71, 67, 67, 65, 71, 67, 71, 65, 67, 71, 67, 67, 67, 65, 71, 67, 84, 71, 71, 67, 71, 71, 67, 67, 67, 84, 71, 67, 71, 71, 71, 65, 65, 65, 65, 65, 67, 84, 67, 71, 67, 67, 71, 71, 67, 65, 71, 67, 71, 67, 84, 84, 71, 65, 39]
ATGCCGAGTCCCACCGTCACCTCGCCGCAAGTAGCCGTCAACGACATAGGCTCTAGCGAGGACTTTCTCGCCGCAATAGACAAAACGATCAAGTACTTCAACGATGGCGACATCGTCGAAGGCACCATCGTCAAAGTGGACCGGGACGAGGTGCTCCTCGACATCGGCTACAAGACCGAAGGCGTGATCCCCGCCCGCGAACTGTCCATCAAGCACGACGTCGACCCCAACGAGGTCGTTTCCGTCGGTGACGAGGTCGAAGCCCTGGTGCTCACCAAGGAGGACAAAGAGGGCCGGCTCATCCTCTCCAAGAAACGCGCGCAGTACGAGCGTGCCTGGGGCACCATCGAGGCGCTCAAGGAGAAGGACGAGGCCGTCAAGGGCACGGTCATCGAGGTCGTCAAGGGTGGCCTGATCCTCGACATCGGGCTGCGCGGTTTCCTGCCCGCCTCGCTGGTGGAGATGCGCCGGGTGCGCGACCTGCAGCCCTACATCGGCAAGGAGATCGAGGCCAAGATCATCGAGCTGGACAAGAACCGCAACAACGTGGTGCTGTCCCGTCGCGCCTGGCTGGAGCAGACCCAGTCCGAGGTGCGCAGCGAGTTCCTGAATAACTTGCAAAAAGGCACCATCCGAAAGGGTGTCGTGTCCTCGATCGTCAACTTCGGCGCGTTCGTCGATCTCGGCGGTGTGGACGGTCTGGTGCATGTCTCCGAGCTATCGTGGAAGCACATCGACCACCCGTCCGAGGTGGTCCAGGTTGGTGACGAGGTCACCGTCGAGGTGCTCGACGTCGACATGGACCGTGAGCGGGTTTCGTTGTCACTCAAGGCGACTCAGGAAGACCCGTGGCGGCACTTCGCCCGCACTCACGCGATCGGGCAGATCGTGCCGGGCAAGGTCACCAAGTTGGTTCCGTTCGGTGCATTCGTCCGCGTCGAGGAGGGTATCGAGGGCCTGGTGCACATCTCCGAGCTGGCCGAGCGTCACGTCGAGGTGCCCGATCAGGTGGTTGCCGTCGGCGACGACGCGATGGTCAAGGTCATCGACATCGACCTGGAGCGCCGTCGGATCTCGTTGTCGCTCAAGCAAGCCAATGAGGACTACACCGAGGAGTTCGACCCGGCGAAGTACGGCATGGCCGACAGTTACGACGAGCAGGGCAACTACATCTTCCCCGAGGGCTTCGATGCCGAAACCAACGAATGGCTTGAGGGATTCGAAAAGCAGCGCGCCGAATGGGAAGCTCGGTACGCCGAGGCCGAGCGCCGGCACAAGATGCACACCGCGCAGATGGAGAAGTTCGCCGCCGCCGAGGCGGCTGGACGCGGCGCGGACGATCAGTCGTCGGCCAGTAGCGCACCGTCGGAAAAGACCGCGGGTGGATCACTGGCCAGCGACGCCCAGCTGGCGGCCCTGCGGGAAAAACTCGCCGGCAGCGCTTGA''
This can be bypassed with the flag 'tossjunk', 'fixjunk', or 'ignorejunk'
at shared.KillSwitch.kill(KillSwitch.java:96)
at stream.Read.validateCommonCase_branchless(Read.java:412)
at stream.Read.validate(Read.java:115)
at stream.Read.<init>(Read.java:77)
at stream.Read.<init>(Read.java:50)
at stream.FastaReadInputStream.generateRead(FastaReadInputStream.java:270)
at stream.FastaReadInputStream.fillList(FastaReadInputStream.java:184)
at stream.FastaReadInputStream.hasMore(FastaReadInputStream.java:109)
at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:668)
at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:657)
For some reason, sequence BACT000001_159 includes a b' at the beginning and a ' at the end:
b'ATGCCGAGTCCCACCGTCACCTCGCCGCAAGTAGCCGTCAACGACATAGGCTCTAGCG
AGGACTTTCTCGCCGCAATAGACAAAACGATCAAGTACTTCAACGATGGCGACATCGTCG
AAGGCACCATCGTCAAAGTGGACCGGGACGAGGTGCTCCTCGACATCGGCTACAAGACCG
AAGGCGTGATCCCCGCCCGCGAACTGTCCATCAAGCACGACGTCGACCCCAACGAGGTCG
TTTCCGTCGGTGACGAGGTCGAAGCCCTGGTGCTCACCAAGGAGGACAAAGAGGGCCGGC
TCATCCTCTCCAAGAAACGCGCGCAGTACGAGCGTGCCTGGGGCACCATCGAGGCGCTCA
AGGAGAAGGACGAGGCCGTCAAGGGCACGGTCATCGAGGTCGTCAAGGGTGGCCTGATCC
TCGACATCGGGCTGCGCGGTTTCCTGCCCGCCTCGCTGGTGGAGATGCGCCGGGTGCGCG
ACCTGCAGCCCTACATCGGCAAGGAGATCGAGGCCAAGATCATCGAGCTGGACAAGAACC
GCAACAACGTGGTGCTGTCCCGTCGCGCCTGGCTGGAGCAGACCCAGTCCGAGGTGCGCA
GCGAGTTCCTGAATAACTTGCAAAAAGGCACCATCCGAAAGGGTGTCGTGTCCTCGATCG
TCAACTTCGGCGCGTTCGTCGATCTCGGCGGTGTGGACGGTCTGGTGCATGTCTCCGAGC
TATCGTGGAAGCACATCGACCACCCGTCCGAGGTGGTCCAGGTTGGTGACGAGGTCACCG
TCGAGGTGCTCGACGTCGACATGGACCGTGAGCGGGTTTCGTTGTCACTCAAGGCGACTC
AGGAAGACCCGTGGCGGCACTTCGCCCGCACTCACGCGATCGGGCAGATCGTGCCGGGCA
AGGTCACCAAGTTGGTTCCGTTCGGTGCATTCGTCCGCGTCGAGGAGGGTATCGAGGGCC
TGGTGCACATCTCCGAGCTGGCCGAGCGTCACGTCGAGGTGCCCGATCAGGTGGTTGCCG
TCGGCGACGACGCGATGGTCAAGGTCATCGACATCGACCTGGAGCGCCGTCGGATCTCGT
TGTCGCTCAAGCAAGCCAATGAGGACTACACCGAGGAGTTCGACCCGGCGAAGTACGGCA
TGGCCGACAGTTACGACGAGCAGGGCAACTACATCTTCCCCGAGGGCTTCGATGCCGAAA
CCAACGAATGGCTTGAGGGATTCGAAAAGCAGCGCGCCGAATGGGAAGCTCGGTACGCCG
AGGCCGAGCGCCGGCACAAGATGCACACCGCGCAGATGGAGAAGTTCGCCGCCGCCGAGG
CGGCTGGACGCGGCGCGGACGATCAGTCGTCGGCCAGTAGCGCACCGTCGGAAAAGACCG
CGGGTGGATCACTGGCCAGCGACGCCCAGCTGGCGGCCCTGCGGGAAAAACTCGCCGGCA
GCGCTTGA'
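ASCII code 39 in the BBDuk message is the apostrophe, which matches the trailing `'` seen here. A `b'...'` wrapper is also what Python produces when `str()` is called on a bytes object, so one plausible cause (an assumption, not confirmed in this thread) is that the sequence was held as bytes when the FASTA was written. The sketch below shows that behaviour and a small pure-Python scan for records containing non-base characters. The function names and the `ACGTN` alphabet are illustrative choices, not part of ConFindr.

```python
VALID_BASES = set("ACGTN")  # assumed alphabet; extend if IUPAC codes are expected


def read_fasta(path):
    """Yield (record_id, sequence) pairs from a FASTA file."""
    record_id, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if record_id is not None:
                    yield record_id, "".join(chunks)
                header = line[1:].split()
                record_id, chunks = (header[0] if header else ""), []
            elif line and record_id is not None:
                chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def find_junk_records(path):
    """Return IDs of records whose sequence has characters outside VALID_BASES."""
    return [rid for rid, seq in read_fasta(path)
            if set(seq.upper()) - VALID_BASES]


# str() on a bytes object yields its repr, quotes and b prefix included.
wrapped = str(b"ATGC")
apostrophe_code = ord("'")
```

Running `find_junk_records` on a downloaded `*_db.fasta` file before starting an analysis would list affected record IDs, for example BACT000001_159 here, without having to wait for bbduk.sh to fail.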
I tried these settings as mentioned in post #30, but the outcome was the same as above:
(confindr3) [afk289@rescomp1 confindr_db]$ python
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import Bio
>>> print(Bio.__version__)
1.78
Hi @juliofdiaz, after you changed your BioPython version to 1.78, did you delete the old rMLST files and re-download?
Deleting the old rMLST files and re-downloading did the trick. Here is how I set up the conda environment:
conda create -n confindr python=3.7.12
conda activate confindr
conda install -c conda-forge biopython=1.78
conda install confindr=0.7.4