invalid Kingdom taxonomic_rank option
Closed this issue · 3 comments
Operating System
Other Linux (please specify below)
Other Linux
No response
Workflow Version
v2.9.4
Workflow Execution
Command line (Local)
Other workflow execution
No response
EPI2ME Version
No response
CLI command run
nextflow run epi2me-labs/wf-metagenomics --fastq ${path2fastq} \
  --classifier kraken2 --analyse_unclassified --database_set ${db} \
  --taxonomic_rank k --include_kraken2_assignments \
  --out_dir out_${db}/ --threads 16 1>wf_${db}.out 2>wf_${db}.err
Workflow Execution - CLI Execution Profile
None
What happened?
wf-metagenomics requires taxonomic_rank to be set to k (lower case) for kingdom. However, the later step that runs est_abundance.py expects upper-case K. The mismatch raises an error and aborts the whole pipeline. This looks like a typo in the workflow's argument parsing step: the lower-case k should be changed to upper-case K to match the downstream commands.
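To illustrate the mismatch, here is a minimal sketch that mimics the lookup shown in the traceback below (main_lvls.index(args.level[0])). The contents of ASSUMED_LEVELS are my assumption about which upper-case codes Bracken accepts, and normalise_rank is a hypothetical helper, not something that exists in the workflow. It only shows that an exact-match lookup rejects "k" and accepts "K".

```python
# Assumption: the rank codes accepted by Bracken, upper case only.
# The real list lives in est_abundance.py and may differ.
ASSUMED_LEVELS = ["R", "K", "D", "P", "C", "O", "F", "G", "S"]


def level_index(level, levels=ASSUMED_LEVELS):
    """Mimic the failing lookup: first character of the level, exact match."""
    return levels.index(level[0])


def normalise_rank(rank):
    """Hypothetical helper: upper-case the rank before passing it on."""
    return rank.strip().upper()


if __name__ == "__main__":
    try:
        level_index("k")  # lower case, as the workflow currently passes it
    except ValueError as err:
        print("lookup failed:", err)

    # After normalisation the lookup succeeds.
    print("index of kingdom:", level_index(normalise_rank("k")))
```

Whether upper-casing alone is enough depends on how the rest of the workflow uses the rank value, so treat this as a description of the symptom rather than a proposed patch.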
Relevant log output
Caused by:
Process `kraken_pipeline:run_bracken (3502253)` terminated with an error exit status (1)
Command executed:
# run bracken on the latest kreports, is this writing some outputs
# alongside the inputs? seems at least {}.kreport_bracken_species.txt
# is written alongside the input
BRACKEN_LENGTH=$(cat bracken_length.txt)
workflow-glue run_bracken "ncbi_16s_18s_28s_ITS_kraken2_db" kraken2.report $BRACKEN_LENGTH "k" "3502253.kraken2_bracken.report"
# do some stuff...
awk -F ' ' -v OFS=' ' '{ print $2,$6 }' "3502253.kraken2_bracken.report" | awk -F ' ' -v OFS=' ' 'NR!=1 {print}' | tee taxacounts.txt | awk -F ' ' -v OFS=' ' '{ print $1 }' > taxa.txt
taxonkit lineage -j 14 --data-dir taxdmp_2023-01-01_db -R taxa.txt > lineages.txt
workflow-glue aggregate_lineages_bracken -i "lineages.txt" -b "taxacounts.txt" -u kraken2.report -p "3502253.kraken2" -r "k"
# add sample to the json file
file1=$(find -name '*.json' -exec cat {} +)
echo "{"'"3502253"'": $file1}" >> "bracken.json"
mv "bracken.json" "3502253.json"
Command exit status:
1
Command output:
b' >> Checking for Valid Options...\n >> Running Bracken \n >> python src/est_abundance.py -i kraken2.report -o 3502253.kraken2_bracken.report -k ncbi_16s_18s_28s_ITS_kraken2_db/database1000mers.kmer_distrib -l k -t 0\nPROGRAM START TIME: 05-20-2024 06:12:15\n'
Command error:
[06:12:14 - matplotlib.font_manager] generated new fontManager
[06:12:15 - workflow_glue] Starting entrypoint.
b'Traceback (most recent call last):\n File "/home/epi2melabs/conda/bin/src/est_abundance.py", line 554, in <module>\n main()\n File "/home/epi2melabs/conda/bin/src/est_abundance.py", line 245, in main\n branch_lvl = main_lvls.index(args.level[0])\nValueError: \'k\' is not in list\n'b' >> Checking for Valid Options...\n >> Running Bracken \n >> python src/est_abundance.py -i kraken2.report -o 3502253.kraken2_bracken.report -k ncbi_16s_18s_28s_ITS_kraken2_db/database1000mers.kmer_distrib -l k -t 0\nPROGRAM START TIME: 05-20-2024 06:12:15\n'
Work dir:
/mnt/nano/amplicon_directory/Output_epi2me/DMc23-8545/wd_ncbi_16s_18s_28s_ITS_kingdom/work/05/7d09db8867bd84e1442308f8f7d93f
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
May-20 16:12:17.812 [Task monitor] DEBUG nextflow.Session - Session aborted -- Cause: Process `kraken_pipeline:run_bracken (3502253)` terminated with an error exit status (1)
May-20 16:12:17.830 [Task monitor] DEBUG nextflow.Session - The following nodes are still active:
[process] kraken_pipeline:createAbundanceTables
status=ACTIVE
port 0: (value) OPEN ; channel: lineages/*
port 1: (value) bound ; channel: taxonomic_rank
port 2: (value) bound ; channel: pipeline
port 3: (cntrl) - ; channel: $
[process] kraken_pipeline:makeReport
status=ACTIVE
port 0: (value) bound ; channel: read_stats/per-read-stats*.tsv.gz
port 1: (value) OPEN ; channel: abundance_table
port 2: (value) OPEN ; channel: lineages/*
port 3: (value) bound ; channel: versions/*
port 4: (value) bound ; channel: params.json
port 5: (value) bound ; channel: taxonomic_rank
port 6: (queue) OPEN ; channel: amr
port 7: (cntrl) - ; channel: $
[process] kraken_pipeline:output_results
status=ACTIVE
port 0: (queue) OPEN ; channel: -
port 1: (cntrl) - ; channel: $
May-20 16:12:18.777 [main] DEBUG nextflow.Session - Session await > all processes finished
May-20 16:12:18.778 [main] DEBUG nextflow.Session - Session await > all barriers passed
May-20 16:12:18.779 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: local) - terminating tasks monitor poll loop
May-20 16:12:19.103 [main] WARN n.processor.TaskPollingMonitor - Killing running tasks (2)
May-20 16:12:19.152 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=445; failedCount=1; ignoredCount=0; cachedCount=0; pendingCount=435; submittedCount=1; runningCount=0; retriesCount=0; abortedCount=1; succeedDuration=50m 38s; failedDuration=6m 21s; cachedDuration=0ms;loadCpus=-14; loadMemory=0; peakRunning=12; peakCpus=28; peakMemory=24 GB; ]
May-20 16:12:19.152 [main] DEBUG nextflow.trace.TraceFileObserver - Workflow completed -- saving trace file
May-20 16:12:19.161 [main] DEBUG nextflow.trace.ReportObserver - Workflow completed -- rendering execution report
May-20 16:12:21.052 [main] DEBUG nextflow.trace.TimelineObserver - Workflow completed -- rendering execution timeline
May-20 16:12:21.235 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
May-20 16:12:21.280 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye
Application activity log entry
No response
Were you able to successfully run the latest version of the workflow with the demo data?
yes
Other demo data information
No response
Hi @RunpengLuo
Thank you for using the workflow!
I'll review what you have mentioned and get back to you.
Thank you very much!
Hi @RunpengLuo ,
I've been looking into this. I have found a fix for the minimap2 pipeline. However, the kraken approach is a bit trickier, as Bacteria and Archaea don't have a Kingdom level, and that may interfere with how bracken estimates abundances.
I'm investigating further to work out the best way to approach this. In the meantime, is there any specific reason why you want to run the analysis at the Kingdom level?
Thank you very much in advance!
Hi, I have raised this issue internally, so I'm closing it in the meantime.
Thank you very much for using the workflow and for the feedback!