LANL-Bioinformatics/PanGIA

No taxid in taxRanks


I've run into this issue using the PanGIA command line. The logs are below:

[00:00:00] Starting PanGIA 1.0.0-RC6.1
[00:00:00] Arguments and dependencies checked:
[00:00:00]     Input reads       : ['/srv/test_fastq/strawman_pathogen-miseq_95gg9031_05vv10245.fastq']
[00:00:00]     Input SAM file    : /srv/strawman_pathogen-miseq_95gg9031_05vv10245.pangia.sam
[00:00:00]     Input background  : None
[00:00:00]     Save background   : None
[00:00:00]     Scoring method    : standalone
[00:00:00]     Scoring parameter : 0.5:0.99
[00:00:00]     Database          : ['database/NCBI_genomes_refseq89_BAV.fa.mmi']
[00:00:00]     Abundance         : DEPTH_COV
[00:00:00]     Output path       : /srv
[00:00:00]     Prefix            : strawman_pathogen-miseq_95gg9031_05vv10245
[00:00:00]     Mode              : report
[00:00:00]     Specific taxid    : None
[00:00:00]     Threads           : 8
[00:00:00]     First #refs in XA : 30
[00:00:00]     Extra NM in XA    : 1
[00:00:00]     Minimal score     : 0
[00:00:00]     Minimal RSNB      : 1
[00:00:00]     Minimal reads     : 3
[00:00:00]     Minimal linear len: 50
[00:00:00]     Minimal genome cov: 0.004
[00:00:00]     Minimal depth (DC): 0.01
[00:00:00]     Minimal RSDCnr    : 0.0009
[00:00:00]     Aligner option    : -x map-ont
[00:00:00]     Aligner seed len  : 40
[00:00:00]     Aligner min score : 60
[00:00:00]     Aligner path      : /opt/conda/envs/pangia/bin/minimap2
[00:00:00]     Samtools path     : /opt/conda/envs/pangia/bin/samtools
[00:00:00] Loading taxonomy information...
[00:00:08] Done.
[00:00:08] Loading pathogen information...
[00:00:08] Done. 2817 pathogens loaded.
[00:00:08] Loading taxonomic uniqueness information...
[00:00:08] Done. 31177 taxonomic uniqueness loaded.
[00:00:08] Loading sizes of genomes...
[00:00:08] Done. 9634 target and 0 host genome(s) loaded.
[00:00:08] Running read-mapping...
[00:00:08] Mapping to database/NCBI_genomes_refseq89_BAV.fa.mmi...
[WARNING] For a multi-part index, no @SQ lines will be outputted. Please use --split-prefix.
[M::main::12.096*1.00] loaded/built the index for 2010 target sequence(s)
[M::mm_mapopt_update::15.182*1.00] mid_occ = 236
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2010
[M::mm_idx_stat::17.103*1.00] distinct minimizers: 154437548 (33.40% are singletons); average occurrences: 4.873; average spacing: 5.353
[M::worker_pipeline::31.420*2.60] mapped 799768 sequences
[M::main::42.605*2.18] loaded/built the index for 11332 target sequence(s)
[M::mm_mapopt_update::42.605*2.18] mid_occ = 236
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 11332
[M::mm_idx_stat::45.388*2.11] distinct minimizers: 139932295 (37.69% are singletons); average occurrences: 3.883; average spacing: 5.353
[M::worker_pipeline::58.889*2.95] mapped 799768 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -aL -t 8 -x map-ont database/NCBI_genomes_refseq89_BAV.fa.mmi /srv/test_fastq/strawman_pathogen-miseq_95gg9031_05vv10245.fastq
[M::main] Real time: 59.028 sec; CPU: 173.916 sec; Peak RSS: 11.823 GB
[00:01:08] Done mapping reads to the database(s).
[00:01:08] Merging SAM files...
[00:01:09] Logfile saved to /srv/strawman_pathogen-miseq_95gg9031_05vv10245.pangia.log.
[00:01:09] Done. Mapped SAM file saved to /srv/strawman_pathogen-miseq_95gg9031_05vv10245.pangia.sam.
[00:01:09] Total number of input reads: 1713173
[00:01:09] Total number of mapped reads: 41953
[00:01:09] Total number of host reads: 0 (0.00%)
[00:01:09] Total number of ignored reads (cross superkingdom): 29 (0.07%)
[00:01:09] Processing SAM file... 
[00:01:09] Parsing SAM files with 8 subprocesses...

[00:00:00] Starting PanGIA 1.0.0-RC6.1
[00:00:00] Temporary directory '/srv/strawman_pathogen-miseq_95gg9031_05vv10245_tmp' found. Deleting directory...
[00:00:00] Arguments and dependencies checked:
[00:00:00]     Input reads       : ['/srv/test_fastq/strawman_pathogen-miseq_95gg9031_05vv10245.fastq']
[00:00:00]     Input SAM file    : /srv/strawman_pathogen-miseq_95gg9031_05vv10245.pangia.sam
[00:00:00]     Input background  : None
[00:00:00]     Save background   : None
[00:00:00]     Scoring method    : standalone
[00:00:00]     Scoring parameter : 0.5:0.99
[00:00:00]     Database          : ['database/NCBI_genomes_refseq89_BAV.fa.mmi']
[00:00:00]     Abundance         : DEPTH_COV
[00:00:00]     Output path       : /srv
[00:00:00]     Prefix            : strawman_pathogen-miseq_95gg9031_05vv10245
[00:00:00]     Mode              : report
[00:00:00]     Specific taxid    : None
[00:00:00]     Threads           : 8
[00:00:00]     First #refs in XA : 30
[00:00:00]     Extra NM in XA    : 1
[00:00:00]     Minimal score     : 0
[00:00:00]     Minimal RSNB      : 1
[00:00:00]     Minimal reads     : 3
[00:00:00]     Minimal linear len: 50
[00:00:00]     Minimal genome cov: 0.004
[00:00:00]     Minimal depth (DC): 0.01
[00:00:00]     Minimal RSDCnr    : 0.0009
[00:00:00]     Aligner option    : -x map-ont
[00:00:00]     Aligner seed len  : 40
[00:00:00]     Aligner min score : 60
[00:00:00]     Aligner path      : /opt/conda/envs/pangia/bin/minimap2
[00:00:00]     Samtools path     : /opt/conda/envs/pangia/bin/samtools
[00:00:00] Loading taxonomy information...
[00:00:08] Done.
[00:00:08] Loading pathogen information...
[00:00:08] Done. 2817 pathogens loaded.
[00:00:08] Loading taxonomic uniqueness information...
[00:00:08] Done. 31177 taxonomic uniqueness loaded.
[00:00:08] Loading sizes of genomes...
[00:00:08] Done. 9634 target and 0 host genome(s) loaded.
[00:00:08] Running read-mapping...
[00:00:08] Mapping to database/NCBI_genomes_refseq89_BAV.fa.mmi...
[WARNING] For a multi-part index, no @SQ lines will be outputted. Please use --split-prefix.
[M::main::12.154*1.00] loaded/built the index for 2010 target sequence(s)
[M::mm_mapopt_update::15.333*1.00] mid_occ = 236
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2010
[M::mm_idx_stat::17.257*1.00] distinct minimizers: 154437548 (33.40% are singletons); average occurrences: 4.873; average spacing: 5.353
[M::worker_pipeline::29.478*2.96] mapped 799768 sequences
[M::main::40.952*2.41] loaded/built the index for 11332 target sequence(s)
[M::mm_mapopt_update::40.952*2.41] mid_occ = 236
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 11332
[M::mm_idx_stat::42.893*2.35] distinct minimizers: 139932295 (37.69% are singletons); average occurrences: 3.883; average spacing: 5.353
[M::worker_pipeline::57.041*3.15] mapped 799768 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -aL -t 8 -x map-ont database/NCBI_genomes_refseq89_BAV.fa.mmi /srv/test_fastq/strawman_pathogen-miseq_95gg9031_05vv10245.fastq
[M::main] Real time: 57.196 sec; CPU: 179.640 sec; Peak RSS: 11.823 GB
[00:01:06] Done mapping reads to the database(s).
[00:01:06] Merging SAM files...
[00:01:08] Logfile saved to /srv/strawman_pathogen-miseq_95gg9031_05vv10245.pangia.log.
[00:01:08] Done. Mapped SAM file saved to /srv/strawman_pathogen-miseq_95gg9031_05vv10245.pangia.sam.
[00:01:08] Total number of input reads: 1713173
[00:01:08] Total number of mapped reads: 41953
[00:01:08] Total number of host reads: 0 (0.00%)
[00:01:08] Total number of ignored reads (cross superkingdom): 29 (0.07%)
[00:01:08] Processing SAM file... 
[00:01:08] Parsing SAM files with 8 subprocesses...

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/conda/envs/pangia/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/pangia/pangia/pangia.py", line 714, in worker
    lcr_lvl, lcr_name, lcr_info = lineageLCR(taxids)
  File "/home/pangia/pangia/pangia.py", line 378, in lineageLCR
    lng = t.taxid2lineageDICT(tid, 1, 1)
  File "/home/pangia/pangia/taxonomy.py", line 265, in taxid2lineageDICT
    return _taxid2lineage( tid, print_all_rank, print_strain, replace_space2underscore, output_type )
  File "/home/pangia/pangia/taxonomy.py", line 305, in _taxid2lineage
    rank = _getTaxRank(taxID)
  File "/home/pangia/pangia/taxonomy.py", line 372, in _getTaxRank
    return taxRanks[taxID]
KeyError: '134962'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/pangia/pangia/pangia.py", line 2319, in <module>
    (res, mapped_r_cnt) = processSAMfile( os.path.abspath(samfile), argvs.threads, lines_per_process)
  File "/home/pangia/pangia/pangia.py", line 921, in processSAMfile
    results.append( job.get() )
  File "/opt/conda/envs/pangia/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
KeyError: '134962'
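The traceback ends in a plain dictionary lookup (`taxRanks[taxID]` in `_getTaxRank`), so the crash means taxid `'134962'` is referenced by the mapped reads but has no entry in the rank table that was presumably built during "Loading taxonomy information...". The sketch below reproduces that failure mode with a toy table. Everything in it (`tax_ranks`, `get_tax_rank`, `missing_taxids`, and the sample entries) is hypothetical and only illustrates the lookup; it is not PanGIA's code or an official fix.

```python
# Hypothetical stand-in for PanGIA's taxRanks dict; keys are taxid strings,
# matching the string key in the traceback (KeyError: '134962').
tax_ranks = {"1": "no rank", "2": "superkingdom", "562": "species"}


def get_tax_rank(tax_id, tax_ranks, default="unknown"):
    """Guarded lookup: return a placeholder rank instead of raising KeyError."""
    return tax_ranks.get(tax_id, default)


def missing_taxids(tax_ids, tax_ranks):
    """Return the distinct taxids (sorted) that have no entry in the rank table."""
    return sorted({tid for tid in tax_ids if tid not in tax_ranks})


if __name__ == "__main__":
    observed = ["562", "134962", "2"]  # hypothetical taxids seen in alignments
    try:
        tax_ranks["134962"]  # direct indexing, as _getTaxRank does
    except KeyError as err:
        print("KeyError:", err)
    print(get_tax_rank("134962", tax_ranks))
    print(missing_taxids(observed, tax_ranks))
```

A guarded lookup like `get_tax_rank` avoids the crash but would hide the underlying mismatch between the taxonomy files and the database, so listing the missing taxids first (as `missing_taxids` does) is probably the more useful diagnostic.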

I'm not sure what the solution might be, but I was hoping you could also answer a question: I'm using paired-end files. Do I need to specify that for PanGIA?