[BUG] <title>
Artifice120 opened this issue · 3 comments
Describe the bug
While trying to run PGAP on a new Rickettsia genome, I am forced to specify a species. Since there are no assemblies for this species of Rickettsia, I used the two recommended tax-check flags, as seen below:
PGAP_INPUT_DIR=/lustre/isaac/scratch/jtorre28/pgap
./pgap.py -r -o rick_results -g /lustre/isaac/scratch/jtorre28/spades/test_out/contamination_screening/rick3.contigs.filt.fa -s 'Rickettsia japonica' --taxcheck --auto-correct-tax --debug
However, I get the following error
Filesystem 1K-blocks Used Available Use% Mounted on
172.31.0.24@o2ib:172.31.0.26@o2ib:/isaaclfs 3954753932040 2934672204344 820492104044 79% /lustre/isaac
Output will be placed in: /lustre/isaac/scratch/jtorre28/pgap/rick_results
PGAP version 2024-07-18.build7555 is up to date.
installation directory: /lustre/isaac/scratch/jtorre28/pgap
Skipping already installed tarball: https://s3.amazonaws.com/pgap/input-2024-07-18.build7555.tgz
Singularity sif files exists, not updating.
Downloading and extracting tarball: https://s3.amazonaws.com/pgap/input-2024-07-18.build7555.ani.tgz
WARNING: open files is less than the recommended value of 8000
TAXCHECK completed successfully.
DEBUG: args.output = rick_results
DEBUG: params.outputdir = /lustre/isaac/scratch/jtorre28/pgap/rick_results
ERROR: taxcheck failed to assign a species with high confidence, thus PGAP will not execute. See /lustre/isaac/scratch/jtorre28/pgap/rick_results/ani-tax-report.txt
This is the tax-file
ANI report for assembly: rick3.contigs.filt.fa
Submitted organism: Rickettsia japonica (taxid = 35790, rank = species, lineage = Bacteria; Pseudomonadota; Alphaproteobacteria; Rickettsiales; Rickettsiaceae; Rickettsieae; Rickettsia; spotted fever group)
Best match: Rickettsia bellii (taxid = 33990, rank = species, lineage = Bacteria; Pseudomonadota; Alphaproteobacteria; Rickettsiales; Rickettsiaceae; Rickettsieae; Rickettsia; belli group)
Submitted organism has type: Yes
Status: INCONCLUSIVE
Confidence: LOW
Table legend:
ANI : ANI value between this assembly and the type listed in this row
(Coverages) : query-coverage and subject-coverage of this assembly (query) and the type (subject)
NewSeq : the count of bases best assigned to this type assembly
CntmSeq : the portion of NewSeq allocated for purposes of evaluating contamination
Flg : Type flags; currently: C = contaminant; E = effectively published; T = trusted species
Assembly : Release-id of the type-assembly (this value matches the accession and assembly-name on the right column)
Organism : Organism of this type-assembly
(assembly_accession, assembly_name) : of this type-assembly
ANI     (Coverages)     NewSeq  CntmSeq  Assembly Flg Organism (assembly_accession, assembly_name)
------- ------------- -------- -------- --------- --- --------------------------------------------------------------------
93.315  ( 64.4 61.7)    890567   890567     13188     Rickettsia bellii RML369-C (GCA_000012385.1, ASM1238v1)
83.956  ( 16.9 17.9)      6461     3501   3134898     Rickettsia asembonensis (GCA_000828125.2, ASM82812v2)
83.918  ( 16.4 14.4)      5415      661  21983708     Rickettsia tamurae subsp. buchneri (GCA_000696365.2, REISMNv1)
83.933  ( 16.2 16.3)      4648     4648   1199088     Rickettsia tamurae (GCA_000751075.1, Rickettsia tamurae AT-1)
83.850  ( 17.0 16.7)     15624      671   1655938     Rickettsia conorii subsp. raoultii (GCA_000940955.1, ASM94095v1)
83.776  ( 16.2 16.3)      3299     3299   6004488     Rickettsia fournieri (GCA_900243065.1, PRJEB23962)
83.701  ( 17.8 17.5)      3156     3156   1485538     Rickettsia hoogstraalii (GCA_000825685.1, Rickettsia hoogstraalii Croatica)
83.714  ( 17.0 17.2)      1930     1930  37927588     Rickettsia tillamookensis (GCA_016743795.2, ASM1674379v2)
83.813  ( 15.6 16.8)       492      492   1720158     Rickettsia monacensis (GCA_000499665.2, RMONA_1)
83.584  ( 15.7 17.8)     43930    43930    406738     Rickettsia japonica YH (GCA_000283595.1, ASM28359v1)
83.520  ( 14.5 16.7)       118      118   1526588     Rickettsia rickettsii str. Iowa (GCA_000017445.3, ASM1744v3)
83.635  ( 15.6 16.8)       108      108    834068     Rickettsia gravesii BWI-1 (GCA_000485845.1, RicGra1.0)
83.561  ( 15.4 17.6)         0        0    296048     Rickettsia conorii subsp. heilongjiangensis 054 (GCA_000221205.1, ASM22120v1)
83.566  ( 15.4 17.7)         0        0    380228     Rickettsia honei RB (GCA_000263055.1, Rho1.0)
83.602  ( 14.5 16.8)         0        0    432068     Rickettsia sibirica subsp. mongolitimonae HA-91 (GCA_000247625.2, ASM24762v2)
83.595  ( 15.7 17.9)         0        0    864348     Rickettsia japonica YH (GCA_000302635.2, ASM30263v2)
83.520  ( 14.5 16.7)         0        0   3973358     Rickettsia rickettsii (GCA_001950995.1, ASM195099v1)
83.520  ( 14.5 16.7)         0        0   3973378     Rickettsia rickettsii (GCA_001951015.1, ASM195101v1)
83.467  ( 14.9 16.4)      1776      324    392728     Rickettsia australis str. Phillips (GCA_000273745.1, Rau1.0)
83.442  ( 15.6 17.9)       631      631      7828     Rickettsia conorii str. Malish 7 (GCA_000007025.1, ASM702v1)
83.497  ( 15.9 18.2)         0        0    320558     Rickettsia slovaca 13-B (GCA_000237845.1, ASM23784v1)
83.423  ( 15.0 17.4)         0        0    377238     Rickettsia conorii subsp. caspia A-167 (GCA_000261325.1, RcoCa1.0)
83.406  ( 15.1 17.6)         0        0    381518     Rickettsia conorii subsp. israelensis ISTT CDC1 (GCA_000263815.1, RcoIs1.0)
83.366  ( 15.4 17.2)         0        0    407678     Rickettsia rhipicephali str. 3-7-female6-CWPP (GCA_000284075.1, ASM28407v1)
83.282  ( 15.6 18.2)        97       97    202718     Rickettsia sibirica 246 (GCA_000166935.1, ASM16693v1)
82.808  ( 10.7 14.0)       766      766    591868     Rickettsia prowazekii str. Breinl (GCA_000367405.1, ASM36740v1)
83.407  (  8.9 11.7)       300      300      8848     Rickettsia typhi str. Wilmington (GCA_000008045.1, ASM804v1)
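For anyone scripting around this check, the summary fields at the top of the report can be pulled out with standard tools. The following is only a sketch: it assumes the report keeps the `Best match:`, `Status:` and `Confidence:` line prefixes shown above, and it works on an abridged copy of those lines (lineage omitted) written to a temporary file rather than on a real ani-tax-report.txt.

```shell
# Abridged sample of the report header shown above (lineage fields omitted).
# In practice, set REPORT to the path of your own ani-tax-report.txt instead.
REPORT=$(mktemp)
cat > "$REPORT" <<'EOF'
ANI report for assembly: rick3.contigs.filt.fa
Submitted organism: Rickettsia japonica (taxid = 35790, rank = species)
Best match: Rickettsia bellii (taxid = 33990, rank = species)
Submitted organism has type: Yes
Status: INCONCLUSIVE
Confidence: LOW
EOF

# Strip the field label and keep the value.
STATUS=$(sed -n 's/^Status: //p' "$REPORT")
CONFIDENCE=$(sed -n 's/^Confidence: //p' "$REPORT")
# Keep only the organism name, dropping the parenthesised taxid details.
BEST=$(sed -n 's/^Best match: \([^(]*\) (.*/\1/p' "$REPORT")

echo "status=$STATUS confidence=$CONFIDENCE best_match=$BEST"
rm -f "$REPORT"
```

The variable names (`REPORT`, `STATUS`, `CONFIDENCE`, `BEST`) are illustrative and not part of PGAP.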
Is there a way to run pgap under the closest species, "Rickettsia bellii", and then have it just keep all the "contaminating" sequences?
At the very least a gene prediction file on its own would be just as good. Then I could do the homology searches myself.
Additional context
This is a new Rickettsia species, so none of the NCBI references will match well.
Thank you for your report, user @Artifice120 !
Is there a way to run pgap under the closest species, "Rickettsia bellii", and then have it just keep all the "contaminating" sequences?
Sounds like a reasonable plan. If you add --ignore-all-errors to your command line, you might be able to get through to the end.
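As a rough illustration of that suggestion, the original command with `--ignore-all-errors` appended might look like the sketch below. The thread does not show the exact command that was eventually run, so the species value (taken from the question) and the output directory name `rick_results_retry` are assumptions. The snippet only prints the command; the final line is commented out so nothing is executed by accident.

```shell
# Input assembly path as given in the original report.
GENOME=/lustre/isaac/scratch/jtorre28/spades/test_out/contamination_screening/rick3.contigs.filt.fa
# Assumption: closest species from the ANI report, as proposed in the question.
SPECIES='Rickettsia bellii'
# Hypothetical output directory name, chosen to avoid reusing rick_results.
OUTDIR=rick_results_retry

# Same flags as the original command, plus --ignore-all-errors.
set -- ./pgap.py -r -o "$OUTDIR" -g "$GENOME" -s "$SPECIES" \
  --taxcheck --auto-correct-tax --ignore-all-errors --debug

echo "Would run: $*"
# "$@"   # uncomment to actually launch PGAP
```

Building the argument list with `set --` keeps the two-word species name as a single argument when the command is finally run via `"$@"`.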
Thanks,
Seems to have finished with all outputs and a CheckM completeness of 97%. There are excessive gene predictions but that is expected.
You are welcome, user @Artifice120 !