No taxid in taxRanks
aaguinal commented
I've run into this issue using the PanGIA command line. Logs are attached:
[00:00:00] Starting PanGIA 1.0.0-RC6.1
[00:00:00] Arguments and dependencies checked:
[00:00:00] Input reads : ['/srv/test_fastq/strawman_pathogen-miseq_95gg9031_05vv10245.fastq']
[00:00:00] Input SAM file : /srv/strawman_pathogen-miseq_95gg9031_05vv10245.pangia.sam
[00:00:00] Input background : None
[00:00:00] Save background : None
[00:00:00] Scoring method : standalone
[00:00:00] Scoring parameter : 0.5:0.99
[00:00:00] Database : ['database/NCBI_genomes_refseq89_BAV.fa.mmi']
[00:00:00] Abundance : DEPTH_COV
[00:00:00] Output path : /srv
[00:00:00] Prefix : strawman_pathogen-miseq_95gg9031_05vv10245
[00:00:00] Mode : report
[00:00:00] Specific taxid : None
[00:00:00] Threads : 8
[00:00:00] First #refs in XA : 30
[00:00:00] Extra NM in XA : 1
[00:00:00] Minimal score : 0
[00:00:00] Minimal RSNB : 1
[00:00:00] Minimal reads : 3
[00:00:00] Minimal linear len: 50
[00:00:00] Minimal genome cov: 0.004
[00:00:00] Minimal depth (DC): 0.01
[00:00:00] Minimal RSDCnr : 0.0009
[00:00:00] Aligner option : -x map-ont
[00:00:00] Aligner seed len : 40
[00:00:00] Aligner min score : 60
[00:00:00] Aligner path : /opt/conda/envs/pangia/bin/minimap2
[00:00:00] Samtools path : /opt/conda/envs/pangia/bin/samtools
[00:00:00] Loading taxonomy information...
[00:00:08] Done.
[00:00:08] Loading pathogen information...
[00:00:08] Done. 2817 pathogens loaded.
[00:00:08] Loading taxonomic uniqueness information...
[00:00:08] Done. 31177 taxonomic uniqueness loaded.
[00:00:08] Loading sizes of genomes...
[00:00:08] Done. 9634 target and 0 host genome(s) loaded.
[00:00:08] Running read-mapping...
[00:00:08] Mapping to database/NCBI_genomes_refseq89_BAV.fa.mmi...
[WARNING] For a multi-part index, no @SQ lines will be outputted. Please use --split-prefix.
[M::main::12.096*1.00] loaded/built the index for 2010 target sequence(s)
[M::mm_mapopt_update::15.182*1.00] mid_occ = 236
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2010
[M::mm_idx_stat::17.103*1.00] distinct minimizers: 154437548 (33.40% are singletons); average occurrences: 4.873; average spacing: 5.353
[M::worker_pipeline::31.420*2.60] mapped 799768 sequences
[M::main::42.605*2.18] loaded/built the index for 11332 target sequence(s)
[M::mm_mapopt_update::42.605*2.18] mid_occ = 236
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 11332
[M::mm_idx_stat::45.388*2.11] distinct minimizers: 139932295 (37.69% are singletons); average occurrences: 3.883; average spacing: 5.353
[M::worker_pipeline::58.889*2.95] mapped 799768 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -aL -t 8 -x map-ont database/NCBI_genomes_refseq89_BAV.fa.mmi /srv/test_fastq/strawman_pathogen-miseq_95gg9031_05vv10245.fastq
[M::main] Real time: 59.028 sec; CPU: 173.916 sec; Peak RSS: 11.823 GB
[00:01:08] Done mapping reads to the database(s).
[00:01:08] Merging SAM files...
[00:01:09] Logfile saved to /srv/strawman_pathogen-miseq_95gg9031_05vv10245.pangia.log.
[00:01:09] Done. Mapped SAM file saved to /srv/strawman_pathogen-miseq_95gg9031_05vv10245.pangia.sam.
[00:01:09] Total number of input reads: 1713173
[00:01:09] Total number of mapped reads: 41953
[00:01:09] Total number of host reads: 0 (0.00%)
[00:01:09] Total number of ignored reads (cross superkingdom): 29 (0.07%)
[00:01:09] Processing SAM file...
[00:01:09] Parsing SAM files with 8 subprocesses...
[00:00:00] Starting PanGIA 1.0.0-RC6.1
[00:00:00] Temporary directory '/srv/strawman_pathogen-miseq_95gg9031_05vv10245_tmp' found. Deleting directory...
[00:00:00] Arguments and dependencies checked:
[00:00:00] Input reads : ['/srv/test_fastq/strawman_pathogen-miseq_95gg9031_05vv10245.fastq']
[00:00:00] Input SAM file : /srv/strawman_pathogen-miseq_95gg9031_05vv10245.pangia.sam
[00:00:00] Input background : None
[00:00:00] Save background : None
[00:00:00] Scoring method : standalone
[00:00:00] Scoring parameter : 0.5:0.99
[00:00:00] Database : ['database/NCBI_genomes_refseq89_BAV.fa.mmi']
[00:00:00] Abundance : DEPTH_COV
[00:00:00] Output path : /srv
[00:00:00] Prefix : strawman_pathogen-miseq_95gg9031_05vv10245
[00:00:00] Mode : report
[00:00:00] Specific taxid : None
[00:00:00] Threads : 8
[00:00:00] First #refs in XA : 30
[00:00:00] Extra NM in XA : 1
[00:00:00] Minimal score : 0
[00:00:00] Minimal RSNB : 1
[00:00:00] Minimal reads : 3
[00:00:00] Minimal linear len: 50
[00:00:00] Minimal genome cov: 0.004
[00:00:00] Minimal depth (DC): 0.01
[00:00:00] Minimal RSDCnr : 0.0009
[00:00:00] Aligner option : -x map-ont
[00:00:00] Aligner seed len : 40
[00:00:00] Aligner min score : 60
[00:00:00] Aligner path : /opt/conda/envs/pangia/bin/minimap2
[00:00:00] Samtools path : /opt/conda/envs/pangia/bin/samtools
[00:00:00] Loading taxonomy information...
[00:00:08] Done.
[00:00:08] Loading pathogen information...
[00:00:08] Done. 2817 pathogens loaded.
[00:00:08] Loading taxonomic uniqueness information...
[00:00:08] Done. 31177 taxonomic uniqueness loaded.
[00:00:08] Loading sizes of genomes...
[00:00:08] Done. 9634 target and 0 host genome(s) loaded.
[00:00:08] Running read-mapping...
[00:00:08] Mapping to database/NCBI_genomes_refseq89_BAV.fa.mmi...
[WARNING] For a multi-part index, no @SQ lines will be outputted. Please use --split-prefix.
[M::main::12.154*1.00] loaded/built the index for 2010 target sequence(s)
[M::mm_mapopt_update::15.333*1.00] mid_occ = 236
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2010
[M::mm_idx_stat::17.257*1.00] distinct minimizers: 154437548 (33.40% are singletons); average occurrences: 4.873; average spacing: 5.353
[M::worker_pipeline::29.478*2.96] mapped 799768 sequences
[M::main::40.952*2.41] loaded/built the index for 11332 target sequence(s)
[M::mm_mapopt_update::40.952*2.41] mid_occ = 236
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 11332
[M::mm_idx_stat::42.893*2.35] distinct minimizers: 139932295 (37.69% are singletons); average occurrences: 3.883; average spacing: 5.353
[M::worker_pipeline::57.041*3.15] mapped 799768 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -aL -t 8 -x map-ont database/NCBI_genomes_refseq89_BAV.fa.mmi /srv/test_fastq/strawman_pathogen-miseq_95gg9031_05vv10245.fastq
[M::main] Real time: 57.196 sec; CPU: 179.640 sec; Peak RSS: 11.823 GB
[00:01:06] Done mapping reads to the database(s).
[00:01:06] Merging SAM files...
[00:01:08] Logfile saved to /srv/strawman_pathogen-miseq_95gg9031_05vv10245.pangia.log.
[00:01:08] Done. Mapped SAM file saved to /srv/strawman_pathogen-miseq_95gg9031_05vv10245.pangia.sam.
[00:01:08] Total number of input reads: 1713173
[00:01:08] Total number of mapped reads: 41953
[00:01:08] Total number of host reads: 0 (0.00%)
[00:01:08] Total number of ignored reads (cross superkingdom): 29 (0.07%)
[00:01:08] Processing SAM file...
[00:01:08] Parsing SAM files with 8 subprocesses...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/opt/conda/envs/pangia/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/pangia/pangia/pangia.py", line 714, in worker
lcr_lvl, lcr_name, lcr_info = lineageLCR(taxids)
File "/home/pangia/pangia/pangia.py", line 378, in lineageLCR
lng = t.taxid2lineageDICT(tid, 1, 1)
File "/home/pangia/pangia/taxonomy.py", line 265, in taxid2lineageDICT
return _taxid2lineage( tid, print_all_rank, print_strain, replace_space2underscore, output_type )
File "/home/pangia/pangia/taxonomy.py", line 305, in _taxid2lineage
rank = _getTaxRank(taxID)
File "/home/pangia/pangia/taxonomy.py", line 372, in _getTaxRank
return taxRanks[taxID]
KeyError: '134962'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/pangia/pangia/pangia.py", line 2319, in <module>
(res, mapped_r_cnt) = processSAMfile( os.path.abspath(samfile), argvs.threads, lines_per_process)
File "/home/pangia/pangia/pangia.py", line 921, in processSAMfile
results.append( job.get() )
File "/opt/conda/envs/pangia/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
KeyError: '134962'
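The traceback ends in a plain dictionary lookup (`taxRanks[taxID]`) that raises because the key '134962' is not in the table. One possible cause is that the mapped references carry a taxid the loaded taxonomy does not know about, but the logs alone do not confirm this. As a rough sketch of how such a lookup could tolerate unknown taxids and report them, assuming a dict keyed by taxid strings (the names `tax_ranks`, `get_tax_rank`, and `find_missing_taxids` are made up for illustration and are not PanGIA's code):

```python
# Toy stand-in for the taxonomy rank table; NOT PanGIA's actual data.
tax_ranks = {"1": "root", "2": "superkingdom", "561": "genus", "562": "species"}


def get_tax_rank(tax_id, ranks=tax_ranks, default="unknown"):
    """Return the rank for tax_id, or `default` when the taxid is absent."""
    # dict.get avoids the KeyError that a bare ranks[tax_id] would raise.
    return ranks.get(str(tax_id), default)


def find_missing_taxids(tax_ids, ranks=tax_ranks):
    """Return the sorted, de-duplicated taxids that have no entry in `ranks`."""
    return sorted({str(t) for t in tax_ids if str(t) not in ranks})


if __name__ == "__main__":
    print(get_tax_rank("562"))
    print(get_tax_rank("134962"))
    print(find_missing_taxids(["562", "134962", 561]))
```

Listing the missing taxids before processing would at least show whether 134962 is the only unknown entry or one of many. Whether a fallback rank is acceptable for PanGIA's scoring is a separate question for the maintainers.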
zeliy90 commented
I'm not sure what the solution might be, but I was hoping you could answer a question I have: I'm using paired-end files. Do I need to specify that for PanGIA?