gbouras13/hybracter

[Feature request] Automatic Dnaapler fall-back or adjustable Dnaapler --evalue param

Closed this issue · 10 comments

Thank you so much for this amazing tool!

Is your feature request related to a problem? Please describe.
Due to the nature of Snakemake, if an error occurs during any of the steps, the whole pipeline fails and the output is removed.
I find this often happens because of the Dnaapler step not being able to find an appropriate match. Unfortunately, Dnaapler's default search for DnaA is not usable for me, since (to my knowledge) almost no archaea possess DnaA. So I always have to use a custom Dnaapler database, where I chose RpoA, because this is ubiquitous in all domains of life. However, this sometimes results in not-that-specific hits, which fail the Dnaapler BLASTx cutoffs and therefore force the whole Hybracter pipeline to fail, even though the assembly is good enough for downstream usage (just not orientated).

Describe the solution you'd like
I could see two possible solutions:

  1. make the Dnaapler --evalue param available as a Hybracter cli param: hybracter long [...] --dnaapler-evalue 1e-9.
  2. Using the Snakemake try and attempt options, one could implement a small helper python function that basically does "if first attempt use Dnaapler custom, ifelse second attempt use Dnaapler largest, else fail":
def dnaaplerFallback(wildcards,attempt):
    if config.args.dnaapler_custom_db == "none" and attempt == 1:
        return "all"
    elif config.args.dnaapler_custom_db != "none" and attempt == 1:
        return "custom"
    elif config.args.dnaapler_custom_db != "none" and attempt == 2:
        return "largest"
    else:
        raise ValueError("Invalid attempt number or dnaapler_custom_db value")

Thank you for your consideration

  • Richard

Hi @richardstoeckl ,

Thanks for your issue. Very interesting, I haven't run Hybracter on archaeal genomes so I didn't know this!

I think I will do 2 things.

The first is to implement your first suggestion. The reason for this is because I don't want to make hybracter more fragile and more complicated with fallbacks (I foresee the main problem/source of issues going forward is fragility!), but adding --dnaapler_evalue would be trivial, which a fall back mechanism may do more harm than good.

The other thing is that clearly you (and surely others) would find it useful to have a database for Archaea. So I'll add that in at some point soon and release an update :)

George

Hi @richardstoeckl ,

Have run hybracter since the Dnaapler upgrade to v0.8.0 on your archaea? I would be interested to see how it goes - and whether you still think it is worth that I add --dnaapler_evalue

George

Hi @gbouras13 ,

sorry, I didn't catch a hybracter release with the new version of Dnaapler. Which version can I use to test?

Thanks,
Richard

@richardstoeckl

The existing version of hybracter v0.7.3 will install the newest dnaapler version (in the way snakemake works). So if either you reinstall hybracter v0.7.3, or reinstall ust the dnaapler env or even just install dnaapler v0.8.0 and run that manually, that should work.

George

@gbouras13

Everything seems fine with 90% of my samples, but two of them return the Error "return code -11":

Error in rule dnaapler_no_medaka:
    jobid: 28
    input: results/BBR080903_cB0304/hybracter_long/processing/pre_polish/BBR080903_cB0304_chromosome.fasta, results/BBR080903_cB0304/hybracter_long/processing/pre_polish/BBR080903_cB0304_ignore_list.txt
    output: results/BBR080903_cB0304/hybracter_long/processing/complete/dnaapler/BBR080903_cB0304/BBR080903_cB0304_reoriented.fasta, results/BBR080903_cB0304/hybracter_long/versions/BBR080903_cB0304/dnaapler.version
    log: results/BBR080903_cB0304/hybracter_long/stderr/dnaapler/BBR080903_cB0304.log (check log file(s) for error details)
    conda-env: /home/rks/phd/ouroboros/.snakemake/conda/37009296e852720e3812295fad89acac_/lib/python3.12/site-packages/hybracter/workflow/conda/61381f58265b1d3b2f5a23d599a4e749_
    shell:
        
        dnaapler all -i results/BBR080903_cB0304/hybracter_long/processing/pre_polish/BBR080903_cB0304_chromosome.fasta -o results/BBR080903_cB0304/hybracter_long/processing/complete/dnaapler/BBR080903_cB0304 --ignore results/BBR080903_cB0304/hybracter_long/processing/pre_polish/BBR080903_cB0304_ignore_list.txt -p BBR080903_cB0304 -t 8 -a nearest --db dnaa,repa -f 2> results/BBR080903_cB0304/hybracter_long/stderr/dnaapler/BBR080903_cB0304.log
        dnaapler --version > results/BBR080903_cB0304/hybracter_long/versions/BBR080903_cB0304/dnaapler.version
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Logfile results/BBR080903_cB0304/hybracter_long/stderr/dnaapler/BBR080903_cB0304.log:
================================================================================
2024-08-20 13:21:10.900 | INFO     | dnaapler.utils.validation:instantiate_dirs:23 - Checking the output directory results/BBR080903_cB0304/hybracter_long/processing/complete/dnaapler/BBR080903_cB0304
2024-08-20 13:21:10.908 | INFO     | dnaapler.utils.util:begin_dnaapler:71 - You are using dnaapler version 0.8.0
2024-08-20 13:21:10.908 | INFO     | dnaapler.utils.util:begin_dnaapler:72 - Repository homepage is https://github.com/gbouras13/dnaapler
2024-08-20 13:21:10.908 | INFO     | dnaapler.utils.util:begin_dnaapler:73 - Written by George Bouras: george.bouras@adelaide.edu.au
2024-08-20 13:21:10.908 | INFO     | dnaapler.utils.util:begin_dnaapler:74 - Your input FASTA is results/BBR080903_cB0304/hybracter_long/processing/pre_polish/BBR080903_cB0304_chromosome.fasta
2024-08-20 13:21:10.908 | INFO     | dnaapler.utils.util:begin_dnaapler:75 - Your output directory  is results/BBR080903_cB0304/hybracter_long/processing/complete/dnaapler/BBR080903_cB0304
2024-08-20 13:21:10.908 | INFO     | dnaapler.utils.util:begin_dnaapler:76 - You have specified 8 threads to use with blastx
2024-08-20 13:21:10.908 | INFO     | dnaapler.utils.util:begin_dnaapler:77 - You have specified dnaA,repA gene(s) to reorient your sequence
2024-08-20 13:21:10.908 | INFO     | dnaapler.utils.util:check_blast_version:117 - Checking BLAST installation.
2024-08-20 13:21:11.003 | INFO     | dnaapler.utils.util:check_blast_version:137 - BLAST version found is v2.16.0.
2024-08-20 13:21:11.004 | INFO     | dnaapler.utils.util:check_blast_version:147 - BLAST version is ok.
2024-08-20 13:21:11.004 | INFO     | dnaapler.utils.util:check_pyrodigal_version:92 - Checking pyrodigal installation.
2024-08-20 13:21:11.004 | INFO     | dnaapler.utils.util:check_pyrodigal_version:103 - Pyrodigal version is v3.5.1
2024-08-20 13:21:11.005 | INFO     | dnaapler.utils.util:check_pyrodigal_version:104 - Pyrodigal version is ok.
2024-08-20 13:21:11.005 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --input results/BBR080903_cB0304/hybracter_long/processing/pre_polish/BBR080903_cB0304_chromosome.fasta
2024-08-20 13:21:11.005 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --output results/BBR080903_cB0304/hybracter_long/processing/complete/dnaapler/BBR080903_cB0304
2024-08-20 13:21:11.006 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --threads 8
2024-08-20 13:21:11.006 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --force True
2024-08-20 13:21:11.006 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --prefix BBR080903_cB0304
2024-08-20 13:21:11.006 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --evalue 1e-10
2024-08-20 13:21:11.007 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --autocomplete nearest
2024-08-20 13:21:11.007 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --seed_value 13
2024-08-20 13:21:11.007 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --ignore results/BBR080903_cB0304/hybracter_long/processing/pre_polish/BBR080903_cB0304_ignore_list.txt
2024-08-20 13:21:11.007 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --custom_db 
2024-08-20 13:21:11.008 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --db dnaa,repa
2024-08-20 13:21:11.008 | INFO     | dnaapler.utils.validation:validate_fasta_all:100 - Checking that the input file results/BBR080903_cB0304/hybracter_long/processing/pre_polish/BBR080903_cB0304_chromosome.fasta is in FASTA format and has at least 1 entry.
2024-08-20 13:21:11.033 | INFO     | dnaapler.utils.validation:validate_fasta_all:107 - results/BBR080903_cB0304/hybracter_long/processing/pre_polish/BBR080903_cB0304_chromosome.fasta file checked.
2024-08-20 13:21:11.041 | INFO     | dnaapler.utils.validation:validate_fasta_all:116 - results/BBR080903_cB0304/hybracter_long/processing/pre_polish/BBR080903_cB0304_chromosome.fasta has only one entry.
2024-08-20 13:21:11.046 | INFO     | dnaapler.utils.validation:check_evalue:187 - You have specified an evalue of 1e-10.
2024-08-20 13:21:11.047 | INFO     | dnaapler:all:930 - You have specified contigs to ignore in results/BBR080903_cB0304/hybracter_long/processing/pre_polish/BBR080903_cB0304_ignore_list.txt.
2024-08-20 13:21:11.047 | INFO     | dnaapler.utils.external_tools:run:53 - Started running blastx -db /home/rks/phd/ouroboros/.snakemake/conda/37009296e852720e3812295fad89acac_/lib/python3.12/site-packages/hybracter/workflow/conda/61381f58265b1d3b2f5a23d599a4e749_/lib/python3.12/site-packages/dnaapler/db/dnaA_repA_db -evalue 1e-10 -num_threads 8 -outfmt ' 6 qseqid qlen sseqid slen length qstart qend sstart send pident nident gaps mismatch evalue bitscore qseq sseq ' -out results/BBR080903_cB0304/hybracter_long/processing/complete/dnaapler/BBR080903_cB0304/BBR080903_cB0304_blast_output.txt -query results/BBR080903_cB0304/hybracter_long/processing/pre_polish/BBR080903_cB0304_chromosome.fasta ...
2024-08-20 13:21:27.365 | ERROR    | dnaapler.utils.external_tools:run_tool:91 - Error calling blastx -db /home/rks/phd/ouroboros/.snakemake/conda/37009296e852720e3812295fad89acac_/lib/python3.12/site-packages/hybracter/workflow/conda/61381f58265b1d3b2f5a23d599a4e749_/lib/python3.12/site-packages/dnaapler/db/dnaA_repA_db -evalue 1e-10 -num_threads 8 -outfmt ' 6 qseqid qlen sseqid slen length qstart qend sstart send pident nident gaps mismatch evalue bitscore qseq sseq ' -out results/BBR080903_cB0304/hybracter_long/processing/complete/dnaapler/BBR080903_cB0304/BBR080903_cB0304_blast_output.txt -query results/BBR080903_cB0304/hybracter_long/processing/pre_polish/BBR080903_cB0304_chromosome.fasta (return code -11)
================================================================================

and

[Mon Aug 19 19:26:17 2024]
Error in rule dnaapler_no_medaka:
    jobid: 28
    input: results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_chromosome.fasta, results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_ignore_list.txt
    output: results/Shy_cB0304/hybracter_long/processing/complete/dnaapler/Shy_cB0304/Shy_cB0304_reoriented.fasta, results/Shy_cB0304/hybracter_long/versions/Shy_cB0304/dnaapler.version
    log: results/Shy_cB0304/hybracter_long/stderr/dnaapler/Shy_cB0304.log (check log file(s) for error details)
    conda-env: /home/rks/phd/ouroboros/.snakemake/conda/37009296e852720e3812295fad89acac_/lib/python3.12/site-packages/hybracter/workflow/conda/61381f58265b1d3b2f5a23d599a4e749_
    shell:
        
        dnaapler all -i results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_chromosome.fasta -o results/Shy_cB0304/hybracter_long/processing/complete/dnaapler/Shy_cB0304 --ignore results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_ignore_list.txt -p Shy_cB0304 -t 8 -a nearest --db dnaa,repa -f 2> results/Shy_cB0304/hybracter_long/stderr/dnaapler/Shy_cB0304.log
        dnaapler --version > results/Shy_cB0304/hybracter_long/versions/Shy_cB0304/dnaapler.version
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Logfile results/Shy_cB0304/hybracter_long/stderr/dnaapler/Shy_cB0304.log:
================================================================================
2024-08-19 19:26:04.054 | INFO     | dnaapler.utils.validation:instantiate_dirs:23 - Checking the output directory results/Shy_cB0304/hybracter_long/processing/complete/dnaapler/Shy_cB0304
2024-08-19 19:26:04.061 | INFO     | dnaapler.utils.util:begin_dnaapler:71 - You are using dnaapler version 0.8.0
2024-08-19 19:26:04.062 | INFO     | dnaapler.utils.util:begin_dnaapler:72 - Repository homepage is https://github.com/gbouras13/dnaapler
2024-08-19 19:26:04.062 | INFO     | dnaapler.utils.util:begin_dnaapler:73 - Written by George Bouras: george.bouras@adelaide.edu.au
2024-08-19 19:26:04.062 | INFO     | dnaapler.utils.util:begin_dnaapler:74 - Your input FASTA is results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_chromosome.fasta
2024-08-19 19:26:04.062 | INFO     | dnaapler.utils.util:begin_dnaapler:75 - Your output directory  is results/Shy_cB0304/hybracter_long/processing/complete/dnaapler/Shy_cB0304
2024-08-19 19:26:04.062 | INFO     | dnaapler.utils.util:begin_dnaapler:76 - You have specified 8 threads to use with blastx
2024-08-19 19:26:04.062 | INFO     | dnaapler.utils.util:begin_dnaapler:77 - You have specified dnaA,repA gene(s) to reorient your sequence
2024-08-19 19:26:04.062 | INFO     | dnaapler.utils.util:check_blast_version:117 - Checking BLAST installation.
2024-08-19 19:26:04.147 | INFO     | dnaapler.utils.util:check_blast_version:137 - BLAST version found is v2.16.0.
2024-08-19 19:26:04.148 | INFO     | dnaapler.utils.util:check_blast_version:147 - BLAST version is ok.
2024-08-19 19:26:04.148 | INFO     | dnaapler.utils.util:check_pyrodigal_version:92 - Checking pyrodigal installation.
2024-08-19 19:26:04.148 | INFO     | dnaapler.utils.util:check_pyrodigal_version:103 - Pyrodigal version is v3.5.1
2024-08-19 19:26:04.148 | INFO     | dnaapler.utils.util:check_pyrodigal_version:104 - Pyrodigal version is ok.
2024-08-19 19:26:04.148 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --input results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_chromosome.fasta
2024-08-19 19:26:04.149 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --output results/Shy_cB0304/hybracter_long/processing/complete/dnaapler/Shy_cB0304
2024-08-19 19:26:04.149 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --threads 8
2024-08-19 19:26:04.149 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --force True
2024-08-19 19:26:04.149 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --prefix Shy_cB0304
2024-08-19 19:26:04.149 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --evalue 1e-10
2024-08-19 19:26:04.149 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --autocomplete nearest
2024-08-19 19:26:04.149 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --seed_value 13
2024-08-19 19:26:04.149 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --ignore results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_ignore_list.txt
2024-08-19 19:26:04.149 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --custom_db 
2024-08-19 19:26:04.150 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --db dnaa,repa
2024-08-19 19:26:04.150 | INFO     | dnaapler.utils.validation:validate_fasta_all:100 - Checking that the input file results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_chromosome.fasta is in FASTA format and has at least 1 entry.
2024-08-19 19:26:04.162 | INFO     | dnaapler.utils.validation:validate_fasta_all:107 - results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_chromosome.fasta file checked.
2024-08-19 19:26:04.168 | INFO     | dnaapler.utils.validation:validate_fasta_all:116 - results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_chromosome.fasta has only one entry.
2024-08-19 19:26:04.173 | INFO     | dnaapler.utils.validation:check_evalue:187 - You have specified an evalue of 1e-10.
2024-08-19 19:26:04.173 | INFO     | dnaapler:all:930 - You have specified contigs to ignore in results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_ignore_list.txt.
2024-08-19 19:26:04.174 | INFO     | dnaapler.utils.external_tools:run:53 - Started running blastx -db /home/rks/phd/ouroboros/.snakemake/conda/37009296e852720e3812295fad89acac_/lib/python3.12/site-packages/hybracter/workflow/conda/61381f58265b1d3b2f5a23d599a4e749_/lib/python3.12/site-packages/dnaapler/db/dnaA_repA_db -evalue 1e-10 -num_threads 8 -outfmt ' 6 qseqid qlen sseqid slen length qstart qend sstart send pident nident gaps mismatch evalue bitscore qseq sseq ' -out results/Shy_cB0304/hybracter_long/processing/complete/dnaapler/Shy_cB0304/Shy_cB0304_blast_output.txt -query results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_chromosome.fasta ...
2024-08-19 19:26:16.998 | ERROR    | dnaapler.utils.external_tools:run_tool:91 - Error calling blastx -db /home/rks/phd/ouroboros/.snakemake/conda/37009296e852720e3812295fad89acac_/lib/python3.12/site-packages/hybracter/workflow/conda/61381f58265b1d3b2f5a23d599a4e749_/lib/python3.12/site-packages/dnaapler/db/dnaA_repA_db -evalue 1e-10 -num_threads 8 -outfmt ' 6 qseqid qlen sseqid slen length qstart qend sstart send pident nident gaps mismatch evalue bitscore qseq sseq ' -out results/Shy_cB0304/hybracter_long/processing/complete/dnaapler/Shy_cB0304/Shy_cB0304_blast_output.txt -query results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_chromosome.fasta (return code -11)
================================================================================

[Mon Aug 19 19:26:17 2024]
Error in rule dnaapler_pre_chrom:
    jobid: 33
    input: results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_chromosome.fasta, results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_ignore_list.txt
    output: results/Shy_cB0304/hybracter_long/processing/complete/dnaapler/Shy_cB0304_pre_chrom/Shy_cB0304_reoriented.fasta
    log: results/Shy_cB0304/hybracter_long/stderr/dnaapler/Shy_cB0304_pre_chrom.log (check log file(s) for error details)
    conda-env: /home/rks/phd/ouroboros/.snakemake/conda/37009296e852720e3812295fad89acac_/lib/python3.12/site-packages/hybracter/workflow/conda/61381f58265b1d3b2f5a23d599a4e749_
    shell:
        
        dnaapler all -i results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_chromosome.fasta -o results/Shy_cB0304/hybracter_long/processing/complete/dnaapler/Shy_cB0304_pre_chrom --ignore results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_ignore_list.txt -p Shy_cB0304 -t 8 -a nearest --db dnaa,repa -f 2> results/Shy_cB0304/hybracter_long/stderr/dnaapler/Shy_cB0304_pre_chrom.log
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Logfile results/Shy_cB0304/hybracter_long/stderr/dnaapler/Shy_cB0304_pre_chrom.log:
================================================================================
2024-08-19 19:26:04.063 | INFO     | dnaapler.utils.validation:instantiate_dirs:23 - Checking the output directory results/Shy_cB0304/hybracter_long/processing/complete/dnaapler/Shy_cB0304_pre_chrom
2024-08-19 19:26:04.071 | INFO     | dnaapler.utils.util:begin_dnaapler:71 - You are using dnaapler version 0.8.0
2024-08-19 19:26:04.071 | INFO     | dnaapler.utils.util:begin_dnaapler:72 - Repository homepage is https://github.com/gbouras13/dnaapler
2024-08-19 19:26:04.071 | INFO     | dnaapler.utils.util:begin_dnaapler:73 - Written by George Bouras: george.bouras@adelaide.edu.au
2024-08-19 19:26:04.071 | INFO     | dnaapler.utils.util:begin_dnaapler:74 - Your input FASTA is results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_chromosome.fasta
2024-08-19 19:26:04.071 | INFO     | dnaapler.utils.util:begin_dnaapler:75 - Your output directory  is results/Shy_cB0304/hybracter_long/processing/complete/dnaapler/Shy_cB0304_pre_chrom
2024-08-19 19:26:04.071 | INFO     | dnaapler.utils.util:begin_dnaapler:76 - You have specified 8 threads to use with blastx
2024-08-19 19:26:04.071 | INFO     | dnaapler.utils.util:begin_dnaapler:77 - You have specified dnaA,repA gene(s) to reorient your sequence
2024-08-19 19:26:04.071 | INFO     | dnaapler.utils.util:check_blast_version:117 - Checking BLAST installation.
2024-08-19 19:26:04.148 | INFO     | dnaapler.utils.util:check_blast_version:137 - BLAST version found is v2.16.0.
2024-08-19 19:26:04.148 | INFO     | dnaapler.utils.util:check_blast_version:147 - BLAST version is ok.
2024-08-19 19:26:04.149 | INFO     | dnaapler.utils.util:check_pyrodigal_version:92 - Checking pyrodigal installation.
2024-08-19 19:26:04.149 | INFO     | dnaapler.utils.util:check_pyrodigal_version:103 - Pyrodigal version is v3.5.1
2024-08-19 19:26:04.149 | INFO     | dnaapler.utils.util:check_pyrodigal_version:104 - Pyrodigal version is ok.
2024-08-19 19:26:04.149 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --input results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_chromosome.fasta
2024-08-19 19:26:04.150 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --output results/Shy_cB0304/hybracter_long/processing/complete/dnaapler/Shy_cB0304_pre_chrom
2024-08-19 19:26:04.150 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --threads 8
2024-08-19 19:26:04.150 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --force True
2024-08-19 19:26:04.151 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --prefix Shy_cB0304
2024-08-19 19:26:04.151 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --evalue 1e-10
2024-08-19 19:26:04.151 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --autocomplete nearest
2024-08-19 19:26:04.151 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --seed_value 13
2024-08-19 19:26:04.151 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --ignore results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_ignore_list.txt
2024-08-19 19:26:04.151 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --custom_db 
2024-08-19 19:26:04.151 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --db dnaa,repa
2024-08-19 19:26:04.151 | INFO     | dnaapler.utils.validation:validate_fasta_all:100 - Checking that the input file results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_chromosome.fasta is in FASTA format and has at least 1 entry.
2024-08-19 19:26:04.163 | INFO     | dnaapler.utils.validation:validate_fasta_all:107 - results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_chromosome.fasta file checked.
2024-08-19 19:26:04.169 | INFO     | dnaapler.utils.validation:validate_fasta_all:116 - results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_chromosome.fasta has only one entry.
2024-08-19 19:26:04.174 | INFO     | dnaapler.utils.validation:check_evalue:187 - You have specified an evalue of 1e-10.
2024-08-19 19:26:04.174 | INFO     | dnaapler:all:930 - You have specified contigs to ignore in results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_ignore_list.txt.
2024-08-19 19:26:04.175 | INFO     | dnaapler.utils.external_tools:run:53 - Started running blastx -db /home/rks/phd/ouroboros/.snakemake/conda/37009296e852720e3812295fad89acac_/lib/python3.12/site-packages/hybracter/workflow/conda/61381f58265b1d3b2f5a23d599a4e749_/lib/python3.12/site-packages/dnaapler/db/dnaA_repA_db -evalue 1e-10 -num_threads 8 -outfmt ' 6 qseqid qlen sseqid slen length qstart qend sstart send pident nident gaps mismatch evalue bitscore qseq sseq ' -out results/Shy_cB0304/hybracter_long/processing/complete/dnaapler/Shy_cB0304_pre_chrom/Shy_cB0304_blast_output.txt -query results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_chromosome.fasta ...
2024-08-19 19:26:17.117 | ERROR    | dnaapler.utils.external_tools:run_tool:91 - Error calling blastx -db /home/rks/phd/ouroboros/.snakemake/conda/37009296e852720e3812295fad89acac_/lib/python3.12/site-packages/hybracter/workflow/conda/61381f58265b1d3b2f5a23d599a4e749_/lib/python3.12/site-packages/dnaapler/db/dnaA_repA_db -evalue 1e-10 -num_threads 8 -outfmt ' 6 qseqid qlen sseqid slen length qstart qend sstart send pident nident gaps mismatch evalue bitscore qseq sseq ' -out results/Shy_cB0304/hybracter_long/processing/complete/dnaapler/Shy_cB0304_pre_chrom/Shy_cB0304_blast_output.txt -query results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_chromosome.fasta (return code -11)
================================================================================

I will look into these samples in more detail and report back to you. Could you tell me if dnaapler.utils.util:begin_dnaapler:82 - Parameter: --db dnaa,repa is the intended option (so no mention of the archaeal DB), and what the blastx (return code -11) could refer to?

Hi @richardstoeckl ,

You are right - the dnaa,repa needs to be changed to include the COG1474 gene. My bad.

The return code is very strange, not sure about that (my guess it is related to finding nothing).

Do you mind trying dnaapler all (by itself) on these contigs? Specifically the contig results/Shy_cB0304/hybracter_long/processing/pre_polish/Shy_cB0304_chromosome.fasta. It now includes archaea support.

I might even introduce an --archaea flag into hybracter that only tries COG1474.

The reason is is that if all your samples bar 2 worked, that means a dnaA homolog was found in those. Which is quite high.

Therefore if we want the standard going forward to be the COG1474 for archaea, then we should probably make that the only option (i.e. to remove dnaa).

George

Hi @gbouras13 ,

please excuse the long time it took me to get back to this issue.

I think we now figured out and fixed the first problem:

Using the archae mode on the "Shy" sample worked:

dnaapler archaea -i Shy_cB0304_manual/processing/pre_polish/Shy_cB0304_manual_chromosome.fasta -o Shy_cB0304_manual/manualDnaapler/ -p Shy_cB0304_manualDnaapler -t 8 -a nearest -f 2> Shy_cB0304_manual/manualDnaapler.log

With the following log:

2024-09-10 15:13:50.628 | INFO     | dnaapler.utils.util:begin_dnaapler:71 - You are using dnaapler version 0.8.0
2024-09-10 15:13:50.628 | INFO     | dnaapler.utils.util:begin_dnaapler:72 - Repository homepage is https://github.com/gbouras13/dnaapler
2024-09-10 15:13:50.628 | INFO     | dnaapler.utils.util:begin_dnaapler:73 - Written by George Bouras: george.bouras@adelaide.edu.au
2024-09-10 15:13:50.628 | INFO     | dnaapler.utils.util:begin_dnaapler:74 - Your input FASTA is Shy_cB0304_manual/processing/pre_polish/Shy_cB0304_manual_chromosome.fasta
2024-09-10 15:13:50.628 | INFO     | dnaapler.utils.util:begin_dnaapler:75 - Your output directory  is Shy_cB0304_manual/manualDnaapler/
2024-09-10 15:13:50.628 | INFO     | dnaapler.utils.util:begin_dnaapler:76 - You have specified 8 threads to use with blastx
2024-09-10 15:13:50.628 | INFO     | dnaapler.utils.util:begin_dnaapler:77 - You have specified cog1474 gene(s) to reorient your sequence
2024-09-10 15:13:50.628 | INFO     | dnaapler.utils.util:check_blast_version:117 - Checking BLAST installation.
2024-09-10 15:13:50.707 | INFO     | dnaapler.utils.util:check_blast_version:137 - BLAST version found is v2.16.0.
2024-09-10 15:13:50.707 | INFO     | dnaapler.utils.util:check_blast_version:147 - BLAST version is ok.
2024-09-10 15:13:50.707 | INFO     | dnaapler.utils.util:check_pyrodigal_version:92 - Checking pyrodigal installation.
2024-09-10 15:13:50.707 | INFO     | dnaapler.utils.util:check_pyrodigal_version:103 - Pyrodigal version is v3.5.1
2024-09-10 15:13:50.707 | INFO     | dnaapler.utils.util:check_pyrodigal_version:104 - Pyrodigal version is ok.
2024-09-10 15:13:50.707 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --input Shy_cB0304_manual/processing/pre_polish/Shy_cB0304_manual_chromosome.fasta
2024-09-10 15:13:50.707 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --output Shy_cB0304_manual/manualDnaapler/
2024-09-10 15:13:50.708 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --threads 8
2024-09-10 15:13:50.708 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --force True
2024-09-10 15:13:50.708 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --prefix Shy_cB0304_manualDnaapler
2024-09-10 15:13:50.708 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --evalue 1e-10
2024-09-10 15:13:50.708 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --autocomplete nearest
2024-09-10 15:13:50.708 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --seed_value 13
2024-09-10 15:13:50.708 | INFO     | dnaapler:archaea:259 - You have chosen nearest method to reorient your sequence if the BLAST based method fails.
2024-09-10 15:13:50.708 | INFO     | dnaapler.utils.validation:validate_fasta:46 - Checking that the input file Shy_cB0304_manual/processing/pre_polish/Shy_cB0304_manual_chromosome.fasta is in FASTA format and has only 1 entry.
2024-09-10 15:13:50.718 | INFO     | dnaapler.utils.validation:validate_fasta:53 - Shy_cB0304_manual/processing/pre_polish/Shy_cB0304_manual_chromosome.fasta file checked.
2024-09-10 15:13:50.723 | INFO     | dnaapler.utils.validation:validate_fasta:62 - Shy_cB0304_manual/processing/pre_polish/Shy_cB0304_manual_chromosome.fasta has only one entry.
2024-09-10 15:13:50.724 | INFO     | dnaapler.utils.validation:check_evalue:187 - You have specified an evalue of 1e-10.
2024-09-10 15:13:50.724 | INFO     | dnaapler.utils.external_tools:run:53 - Started running blastx -db /home/rks/phd/ouroboros/.snakemake/conda/37009296e852720e3812295fad89acac_/lib/python3.12/site-packages/hybracter/workflow/conda/61381f58265b1d3b2f5a23d599a4e749_/lib/python3.12/site-packages/dnaapler/db/cog1474_db -evalue 1e-10 -num_threads 8 -outfmt ' 6 qseqid qlen sseqid slen length qstart qend sstart send pident nident gaps mismatch evalue bitscore qseq sseq ' -out Shy_cB0304_manual/manualDnaapler/Shy_cB0304_manualDnaapler_blast_output.txt -query Shy_cB0304_manual/processing/pre_polish/Shy_cB0304_manual_chromosome.fasta ...
2024-09-10 15:13:57.956 | INFO     | dnaapler.utils.external_tools:run:55 - Done running blastx -db /home/rks/phd/ouroboros/.snakemake/conda/37009296e852720e3812295fad89acac_/lib/python3.12/site-packages/hybracter/workflow/conda/61381f58265b1d3b2f5a23d599a4e749_/lib/python3.12/site-packages/dnaapler/db/cog1474_db -evalue 1e-10 -num_threads 8 -outfmt ' 6 qseqid qlen sseqid slen length qstart qend sstart send pident nident gaps mismatch evalue bitscore qseq sseq ' -out Shy_cB0304_manual/manualDnaapler/Shy_cB0304_manualDnaapler_blast_output.txt -query Shy_cB0304_manual/processing/pre_polish/Shy_cB0304_manual_chromosome.fasta
2024-09-10 15:13:57.963 | INFO     | dnaapler.utils.processing:reorient_sequence:145 - cog1474 gene identified. It starts at coordinate 574403 on the forward strand in your input file.
2024-09-10 15:13:57.964 | INFO     | dnaapler.utils.processing:reorient_sequence:148 - The best hit with a valid start codon in the database is cog1474_gi|347522555|ref|YP_004780125.1|, which has length of 403 AAs.
2024-09-10 15:13:57.964 | INFO     | dnaapler.utils.processing:reorient_sequence:151 - 396 AAs were covered by the best hit, with an overall coverage of 98.26%.
2024-09-10 15:13:57.964 | INFO     | dnaapler.utils.processing:reorient_sequence:154 - 261 AAs were identical, with an overall identity of 65.91%.
2024-09-10 15:13:57.964 | INFO     | dnaapler.utils.processing:reorient_sequence:157 - Re-orienting.
2024-09-10 15:13:57.974 | INFO     | dnaapler.utils.util:end_dnaapler:160 - dnaapler has finished
2024-09-10 15:13:57.974 | INFO     | dnaapler.utils.util:end_dnaapler:161 - Elapsed time: 7.35 seconds

Using the all mode on this sample also correctly identified the correct gene and everything worked.

dnaapler all -i Shy_cB0304_manual/processing/pre_polish/Shy_cB0304_manual_chromosome.fasta -o Shy_cB0304_manual/manualDnaapler/ --ignore Shy_cB0304_manual/processing/pre_polish/Shy_cB0304_manual_ignore_list.txt -p Shy_cB0304_manualDnaapler -t 8 -a nearest -f 2> Shy_cB0304_manual/manualDnaapler.log

Which resulted in:

Contig	Gene_Reoriented	Start	Strand	Top_Hit	Top_Hit_Length	Covered_Length	Coverage	Identical_AAs	Identity_Percentage
contig_1	cog1474	574403	forward	cog1474_gi|347522555|ref|YP_004780125.1|	403	396	98.26	261	65.91

With this logfile:

2024-09-10 15:05:38.421 | INFO     | dnaapler.utils.validation:instantiate_dirs:23 - Checking the output directory Shy_cB0304_manual/manualDnaapler/
2024-09-10 15:05:38.429 | INFO     | dnaapler.utils.util:begin_dnaapler:71 - You are using dnaapler version 0.8.0
2024-09-10 15:05:38.429 | INFO     | dnaapler.utils.util:begin_dnaapler:72 - Repository homepage is https://github.com/gbouras13/dnaapler
2024-09-10 15:05:38.429 | INFO     | dnaapler.utils.util:begin_dnaapler:73 - Written by George Bouras: george.bouras@adelaide.edu.au
2024-09-10 15:05:38.429 | INFO     | dnaapler.utils.util:begin_dnaapler:74 - Your input FASTA is Shy_cB0304_manual/processing/pre_polish/Shy_cB0304_manual_chromosome.fasta
2024-09-10 15:05:38.429 | INFO     | dnaapler.utils.util:begin_dnaapler:75 - Your output directory  is Shy_cB0304_manual/manualDnaapler/
2024-09-10 15:05:38.429 | INFO     | dnaapler.utils.util:begin_dnaapler:76 - You have specified 8 threads to use with blastx
2024-09-10 15:05:38.430 | INFO     | dnaapler.utils.util:begin_dnaapler:77 - You have specified all gene(s) to reorient your sequence
2024-09-10 15:05:38.430 | INFO     | dnaapler.utils.util:check_blast_version:117 - Checking BLAST installation.
2024-09-10 15:05:38.521 | INFO     | dnaapler.utils.util:check_blast_version:137 - BLAST version found is v2.16.0.
2024-09-10 15:05:38.521 | INFO     | dnaapler.utils.util:check_blast_version:147 - BLAST version is ok.
2024-09-10 15:05:38.522 | INFO     | dnaapler.utils.util:check_pyrodigal_version:92 - Checking pyrodigal installation.
2024-09-10 15:05:38.522 | INFO     | dnaapler.utils.util:check_pyrodigal_version:103 - Pyrodigal version is v3.5.1
2024-09-10 15:05:38.522 | INFO     | dnaapler.utils.util:check_pyrodigal_version:104 - Pyrodigal version is ok.
2024-09-10 15:05:38.522 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --input Shy_cB0304_manual/processing/pre_polish/Shy_cB0304_manual_chromosome.fasta
2024-09-10 15:05:38.522 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --output Shy_cB0304_manual/manualDnaapler/
2024-09-10 15:05:38.522 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --threads 8
2024-09-10 15:05:38.523 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --force True
2024-09-10 15:05:38.523 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --prefix Shy_cB0304_manualDnaapler
2024-09-10 15:05:38.523 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --evalue 1e-10
2024-09-10 15:05:38.523 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --autocomplete nearest
2024-09-10 15:05:38.523 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --seed_value 13
2024-09-10 15:05:38.523 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --ignore Shy_cB0304_manual/processing/pre_polish/Shy_cB0304_manual_ignore_list.txt
2024-09-10 15:05:38.523 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --custom_db 
2024-09-10 15:05:38.524 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --db all
2024-09-10 15:05:38.524 | INFO     | dnaapler.utils.validation:validate_fasta_all:100 - Checking that the input file Shy_cB0304_manual/processing/pre_polish/Shy_cB0304_manual_chromosome.fasta is in FASTA format and has at least 1 entry.
2024-09-10 15:05:38.542 | INFO     | dnaapler.utils.validation:validate_fasta_all:107 - Shy_cB0304_manual/processing/pre_polish/Shy_cB0304_manual_chromosome.fasta file checked.
2024-09-10 15:05:38.549 | INFO     | dnaapler.utils.validation:validate_fasta_all:116 - Shy_cB0304_manual/processing/pre_polish/Shy_cB0304_manual_chromosome.fasta has only one entry.
2024-09-10 15:05:38.553 | INFO     | dnaapler.utils.validation:check_evalue:187 - You have specified an evalue of 1e-10.
2024-09-10 15:05:38.554 | INFO     | dnaapler:all:930 - You have specified contigs to ignore in Shy_cB0304_manual/processing/pre_polish/Shy_cB0304_manual_ignore_list.txt.
2024-09-10 15:05:38.554 | INFO     | dnaapler.utils.external_tools:run:53 - Started running blastx -db /home/rks/phd/ouroboros/.snakemake/conda/37009296e852720e3812295fad89acac_/lib/python3.12/site-packages/hybracter/workflow/conda/61381f58265b1d3b2f5a23d599a4e749_/lib/python3.12/site-packages/dnaapler/db/all_db -evalue 1e-10 -num_threads 8 -outfmt ' 6 qseqid qlen sseqid slen length qstart qend sstart send pident nident gaps mismatch evalue bitscore qseq sseq ' -out Shy_cB0304_manual/manualDnaapler/Shy_cB0304_manualDnaapler_blast_output.txt -query Shy_cB0304_manual/processing/pre_polish/Shy_cB0304_manual_chromosome.fasta ...
2024-09-10 15:06:47.829 | INFO     | dnaapler.utils.external_tools:run:55 - Done running blastx -db /home/rks/phd/ouroboros/.snakemake/conda/37009296e852720e3812295fad89acac_/lib/python3.12/site-packages/hybracter/workflow/conda/61381f58265b1d3b2f5a23d599a4e749_/lib/python3.12/site-packages/dnaapler/db/all_db -evalue 1e-10 -num_threads 8 -outfmt ' 6 qseqid qlen sseqid slen length qstart qend sstart send pident nident gaps mismatch evalue bitscore qseq sseq ' -out Shy_cB0304_manual/manualDnaapler/Shy_cB0304_manualDnaapler_blast_output.txt -query Shy_cB0304_manual/processing/pre_polish/Shy_cB0304_manual_chromosome.fasta
2024-09-10 15:06:47.829 | WARNING  | dnaapler:all:973 - Shy_cB0304_manual/processing/pre_polish/Shy_cB0304_manual_ignore_list.txt contains no text. No contigs will be ignored
2024-09-10 15:06:47.851 | INFO     | dnaapler.utils.util:end_dnaapler:160 - dnaapler has finished
2024-09-10 15:06:47.851 | INFO     | dnaapler.utils.util:end_dnaapler:161 - Elapsed time: 69.43 seconds

However, I think you still need to update the dnaapler command used within hybracter to remove or adjust the --db dnaa,repa parameter.

I will look into the other samples that DIDN'T fail more closely and get back to you!

So just looking into the very first example, I think adjusting the --db parameter would fix the problem.

This rule was run by using hybracter v0.7.3 (from the hybracter.log):

[2024:08:19 16:03:12] Copying system default config to results/BBR060204_cB0304/hybracter_long/config.yaml
[2024:08:19 16:03:12] Updating config file with new values
[2024:08:19 16:03:12] Writing config file to results/BBR060204_cB0304/hybracter_long/config.yaml
[2024:08:19 16:03:12] ------------------
[2024:08:19 16:03:12] | Runtime config |
[2024:08:19 16:03:12] ------------------

args:
  chromosome: 1449461
  configfile: results/BBR060204_cB0304/hybracter_long/config.yaml
  contaminants: none
  databases: /mnt/DATA/common/databases/plassembler/
  depth_filter: 0.25
  dnaapler_custom_db: none
  flyeModel: --nano-hq
  log: results/BBR060204_cB0304/hybracter_long/hybracter.log
  logic: best
  longreads: /mnt/DATA2/rks/BBRseq03AndBBRseq04combined/BBR060204_combinedBBRseq03And04.fastq.gz
  medakaModel: r1041_e82_400bps_sup_v4.2.0
  min_length: 1000
  min_quality: 9
  no_medaka: true
  output: results/BBR060204_cB0304/hybracter_long
  sample: BBR060204_cB0304
  single: true
  skip_qc: false
  subsample_depth: 100
qc:
  compression: 5
  hostRemoveFlagstat: -f 4 -F 3584
  minimapIndex: -I 8G
  minimapModel: map-ont
resources:
  big:
    cpu: 16
    mem: 32000
    time: '23:59:00'
  med:
    cpu: 8
    mem: 16000
    time: 08:00:00
  sml:
    cpu: 1
    mem: 4000
    time: 00:00:05



localrule dnaapler_no_medaka:
    input: results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_chromosome.fasta, results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_ignore_list.txt
    output: results/BBR060204_cB0304/hybracter_long/processing/complete/dnaapler/BBR060204_cB0304/BBR060204_cB0304_reoriented.fasta, results/BBR060204_cB0304/hybracter_long/versions/BBR060204_cB0304/dnaapler.version
    log: results/BBR060204_cB0304/hybracter_long/stderr/dnaapler/BBR060204_cB0304.log
    jobid: 28
    benchmark: results/BBR060204_cB0304/hybracter_long/benchmarks/dnaapler/BBR060204_cB0304.txt
    reason: Missing output files: results/BBR060204_cB0304/hybracter_long/processing/complete/dnaapler/BBR060204_cB0304/BBR060204_cB0304_reoriented.fasta; Input files updated by another job: results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_ignore_list.txt, results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_chromosome.fasta; Code has changed since last execution; Params have changed since last execution
    wildcards: sample=BBR060204_cB0304
    threads: 8
    resources: tmpdir=/tmp, mem_mb=15259, mem_mib=15259, mem=16000MB, time=08:00:00


        dnaapler all -i results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_chromosome.fasta -o results/BBR060204_cB0304/hybracter_long/processing/complete/dnaapler/BBR060204_cB0304 --ignore results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_ignore_list.txt -p BBR060204_cB0304 -t 8 -a nearest --db dnaa,repa -f 2> results/BBR060204_cB0304/hybracter_long/stderr/dnaapler/BBR060204_cB0304.log
        dnaapler --version > results/BBR060204_cB0304/hybracter_long/versions/BBR060204_cB0304/dnaapler.version

(Notice the --db dnaa,repa parameter)

Which resulted in the following log from dnaapler:

2024-08-19 16:07:25.603 | INFO     | dnaapler.utils.util:begin_dnaapler:71 - You are using dnaapler version 0.8.0
2024-08-19 16:07:25.603 | INFO     | dnaapler.utils.util:begin_dnaapler:72 - Repository homepage is https://github.com/gbouras13/dnaapler
2024-08-19 16:07:25.603 | INFO     | dnaapler.utils.util:begin_dnaapler:73 - Written by George Bouras: george.bouras@adelaide.edu.au
2024-08-19 16:07:25.603 | INFO     | dnaapler.utils.util:begin_dnaapler:74 - Your input FASTA is results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_chromosome.fasta
2024-08-19 16:07:25.604 | INFO     | dnaapler.utils.util:begin_dnaapler:75 - Your output directory  is results/BBR060204_cB0304/hybracter_long/processing/complete/dnaapler/BBR060204_cB0304
2024-08-19 16:07:25.604 | INFO     | dnaapler.utils.util:begin_dnaapler:76 - You have specified 8 threads to use with blastx
2024-08-19 16:07:25.604 | INFO     | dnaapler.utils.util:begin_dnaapler:77 - You have specified dnaA,repA gene(s) to reorient your sequence
2024-08-19 16:07:25.604 | INFO     | dnaapler.utils.util:check_blast_version:117 - Checking BLAST installation.
2024-08-19 16:07:25.694 | INFO     | dnaapler.utils.util:check_blast_version:137 - BLAST version found is v2.16.0.
2024-08-19 16:07:25.694 | INFO     | dnaapler.utils.util:check_blast_version:147 - BLAST version is ok.
2024-08-19 16:07:25.695 | INFO     | dnaapler.utils.util:check_pyrodigal_version:92 - Checking pyrodigal installation.
2024-08-19 16:07:25.695 | INFO     | dnaapler.utils.util:check_pyrodigal_version:103 - Pyrodigal version is v3.5.1
2024-08-19 16:07:25.695 | INFO     | dnaapler.utils.util:check_pyrodigal_version:104 - Pyrodigal version is ok.
2024-08-19 16:07:25.696 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --input results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_chromosome.fasta
2024-08-19 16:07:25.696 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --output results/BBR060204_cB0304/hybracter_long/processing/complete/dnaapler/BBR060204_cB0304
2024-08-19 16:07:25.696 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --threads 8
2024-08-19 16:07:25.696 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --force True
2024-08-19 16:07:25.697 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --prefix BBR060204_cB0304
2024-08-19 16:07:25.697 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --evalue 1e-10
2024-08-19 16:07:25.697 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --autocomplete nearest
2024-08-19 16:07:25.697 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --seed_value 13
2024-08-19 16:07:25.698 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --ignore results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_ignore_list.txt
2024-08-19 16:07:25.698 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --custom_db 
2024-08-19 16:07:25.698 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --db dnaa,repa
2024-08-19 16:07:25.698 | INFO     | dnaapler.utils.validation:validate_fasta_all:100 - Checking that the input file results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_chromosome.fasta is in FASTA format and has at least 1 entry.
2024-08-19 16:07:25.724 | INFO     | dnaapler.utils.validation:validate_fasta_all:107 - results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_chromosome.fasta file checked.
2024-08-19 16:07:25.734 | INFO     | dnaapler.utils.validation:validate_fasta_all:116 - results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_chromosome.fasta has only one entry.
2024-08-19 16:07:25.739 | INFO     | dnaapler.utils.validation:check_evalue:187 - You have specified an evalue of 1e-10.
2024-08-19 16:07:25.739 | INFO     | dnaapler:all:930 - You have specified contigs to ignore in results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_ignore_list.txt.
2024-08-19 16:07:25.740 | INFO     | dnaapler.utils.external_tools:run:53 - Started running blastx -db /home/rks/phd/ouroboros/.snakemake/conda/37009296e852720e3812295fad89acac_/lib/python3.12/site-packages/hybracter/workflow/conda/61381f58265b1d3b2f5a23d599a4e749_/lib/python3.12/site-packages/dnaapler/db/dnaA_repA_db -evalue 1e-10 -num_threads 8 -outfmt ' 6 qseqid qlen sseqid slen length qstart qend sstart send pident nident gaps mismatch evalue bitscore qseq sseq ' -out results/BBR060204_cB0304/hybracter_long/processing/complete/dnaapler/BBR060204_cB0304/BBR060204_cB0304_blast_output.txt -query results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_chromosome.fasta ...
2024-08-19 16:07:42.844 | INFO     | dnaapler.utils.external_tools:run:55 - Done running blastx -db /home/rks/phd/ouroboros/.snakemake/conda/37009296e852720e3812295fad89acac_/lib/python3.12/site-packages/hybracter/workflow/conda/61381f58265b1d3b2f5a23d599a4e749_/lib/python3.12/site-packages/dnaapler/db/dnaA_repA_db -evalue 1e-10 -num_threads 8 -outfmt ' 6 qseqid qlen sseqid slen length qstart qend sstart send pident nident gaps mismatch evalue bitscore qseq sseq ' -out results/BBR060204_cB0304/hybracter_long/processing/complete/dnaapler/BBR060204_cB0304/BBR060204_cB0304_blast_output.txt -query results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_chromosome.fasta
2024-08-19 16:07:42.845 | WARNING  | dnaapler:all:973 - results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_ignore_list.txt contains no text. No contigs will be ignored
2024-08-19 16:07:42.860 | WARNING  | dnaapler.utils.processing:reorient_single_record_bulk:391 - The top blastx hit for the contig contig_1 did not begin with a valid start codon.
2024-08-19 16:07:42.860 | WARNING  | dnaapler.utils.processing:reorient_single_record_bulk:394 - Searching with pyrodigal for the CDS overlapping the most with the top blastx hit to reorient with.
2024-08-19 16:07:52.889 | INFO     | dnaapler.utils.util:end_dnaapler:160 - dnaapler has finished
2024-08-19 16:07:52.890 | INFO     | dnaapler.utils.util:end_dnaapler:161 - Elapsed time: 27.29 seconds

and the following output:

Contig	Gene_Reoriented	Start	Strand	Top_Hit	Top_Hit_Length	Covered_Length	Coverage	Identical_AAs	Identity_Percentage
contig_1	repA	1504958	forward	UniRef90_A0A0T8UAE4	460	236	51.3	81	34.32

Which incorrectly identifies a repA gene.

When I use the following command instead (which uses the same input as above, but removes the --db parameter):

dnaapler all -i results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_chromosome.fasta -o BBR060204_cB0304_manual --ignore results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_ignore_list.txt -p BBR060204_cB0304_manual -t 8 -a nearest -f

I get the following log:

2024-09-10 15:27:27.134 | INFO     | dnaapler.utils.util:begin_dnaapler:71 - You are using dnaapler version 0.8.0
2024-09-10 15:27:27.134 | INFO     | dnaapler.utils.util:begin_dnaapler:72 - Repository homepage is https://github.com/gbouras13/dnaapler
2024-09-10 15:27:27.135 | INFO     | dnaapler.utils.util:begin_dnaapler:73 - Written by George Bouras: george.bouras@adelaide.edu.au
2024-09-10 15:27:27.135 | INFO     | dnaapler.utils.util:begin_dnaapler:74 - Your input FASTA is results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_chromosome.fasta
2024-09-10 15:27:27.135 | INFO     | dnaapler.utils.util:begin_dnaapler:75 - Your output directory  is BBR060204_cB0304_manual
2024-09-10 15:27:27.135 | INFO     | dnaapler.utils.util:begin_dnaapler:76 - You have specified 8 threads to use with blastx
2024-09-10 15:27:27.135 | INFO     | dnaapler.utils.util:begin_dnaapler:77 - You have specified all gene(s) to reorient your sequence
2024-09-10 15:27:27.135 | INFO     | dnaapler.utils.util:check_blast_version:117 - Checking BLAST installation.
2024-09-10 15:27:27.214 | INFO     | dnaapler.utils.util:check_blast_version:137 - BLAST version found is v2.16.0.
2024-09-10 15:27:27.214 | INFO     | dnaapler.utils.util:check_blast_version:147 - BLAST version is ok.
2024-09-10 15:27:27.215 | INFO     | dnaapler.utils.util:check_pyrodigal_version:92 - Checking pyrodigal installation.
2024-09-10 15:27:27.215 | INFO     | dnaapler.utils.util:check_pyrodigal_version:103 - Pyrodigal version is v3.5.1
2024-09-10 15:27:27.215 | INFO     | dnaapler.utils.util:check_pyrodigal_version:104 - Pyrodigal version is ok.
2024-09-10 15:27:27.215 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --input results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_chromosome.fasta
2024-09-10 15:27:27.215 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --output BBR060204_cB0304_manual
2024-09-10 15:27:27.215 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --threads 8
2024-09-10 15:27:27.215 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --force True
2024-09-10 15:27:27.216 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --prefix BBR060204_cB0304_manual
2024-09-10 15:27:27.216 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --evalue 1e-10
2024-09-10 15:27:27.216 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --autocomplete nearest
2024-09-10 15:27:27.216 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --seed_value 13
2024-09-10 15:27:27.216 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --ignore results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_ignore_list.txt
2024-09-10 15:27:27.216 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --custom_db 
2024-09-10 15:27:27.216 | INFO     | dnaapler.utils.util:begin_dnaapler:82 - Parameter: --db all
2024-09-10 15:27:27.216 | INFO     | dnaapler.utils.validation:validate_fasta_all:100 - Checking that the input file results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_chromosome.fasta is in FASTA format and has at least 1 entry.
2024-09-10 15:27:27.229 | INFO     | dnaapler.utils.validation:validate_fasta_all:107 - results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_chromosome.fasta file checked.
2024-09-10 15:27:27.236 | INFO     | dnaapler.utils.validation:validate_fasta_all:116 - results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_chromosome.fasta has only one entry.
2024-09-10 15:27:27.241 | INFO     | dnaapler.utils.validation:check_evalue:187 - You have specified an evalue of 1e-10.
2024-09-10 15:27:27.241 | INFO     | dnaapler:all:930 - You have specified contigs to ignore in results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_ignore_list.txt.
2024-09-10 15:27:27.242 | INFO     | dnaapler.utils.external_tools:run:53 - Started running blastx -db /home/rks/phd/ouroboros/.snakemake/conda/37009296e852720e3812295fad89acac_/lib/python3.12/site-packages/hybracter/workflow/conda/61381f58265b1d3b2f5a23d599a4e749_/lib/python3.12/site-packages/dnaapler/db/all_db -evalue 1e-10 -num_threads 8 -outfmt ' 6 qseqid qlen sseqid slen length qstart qend sstart send pident nident gaps mismatch evalue bitscore qseq sseq ' -out BBR060204_cB0304_manual/BBR060204_cB0304_manual_blast_output.txt -query results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_chromosome.fasta ...
2024-09-10 15:29:15.087 | INFO     | dnaapler.utils.external_tools:run:55 - Done running blastx -db /home/rks/phd/ouroboros/.snakemake/conda/37009296e852720e3812295fad89acac_/lib/python3.12/site-packages/hybracter/workflow/conda/61381f58265b1d3b2f5a23d599a4e749_/lib/python3.12/site-packages/dnaapler/db/all_db -evalue 1e-10 -num_threads 8 -outfmt ' 6 qseqid qlen sseqid slen length qstart qend sstart send pident nident gaps mismatch evalue bitscore qseq sseq ' -out BBR060204_cB0304_manual/BBR060204_cB0304_manual_blast_output.txt -query results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_chromosome.fasta
2024-09-10 15:29:15.088 | WARNING  | dnaapler:all:973 - results/BBR060204_cB0304/hybracter_long/processing/pre_polish/BBR060204_cB0304_ignore_list.txt contains no text. No contigs will be ignored
2024-09-10 15:29:15.121 | INFO     | dnaapler.utils.util:end_dnaapler:160 - dnaapler has finished
2024-09-10 15:29:15.121 | INFO     | dnaapler.utils.util:end_dnaapler:161 - Elapsed time: 107.99 seconds

With the following result:

Contig	Gene_Reoriented	Start	Strand	Top_Hit	Top_Hit_Length	Covered_Length	Coverage	Identical_AAs	Identity_Percentage
contig_1	cog1474	1409732	forward	cog1474_gi|341582567|ref|YP_004763059.1|	401	391	97.51	371	94.88

As you can see, when the --db parameter is ommited, the automatic selection is actually able to identify the correct cog1474 gene, as it has a much better Identity_Percentage.

Hi @richardstoeckl ,

Thanks so much for this.

I think I will make a little modification to dnaapler so I can pass repA, dnaa and cog1474 (without terL to avoid reorienting bacteriophages). Then I will update hybracter.

Nonetheless, glad it worked and hopefully it is useful for you and other archaea researchers!

George

Closing this issue, archaea orientation should work in v0.9.0 (@richardstoeckl please test it if you like when it is released/available on conda etc)