taxprofiler/taxpasta

Metaphlan3 output contains reads classified as taxid '-1' [BUG]

prototaxites opened this issue · 2 comments

Is there an existing issue for this?

  • I have searched the existing issues

Problem description

Apparently this shouldn't happen! Values for the taxid are either 0 or 100000000 in all cases.

This occurred when running taxpasta through nf-core/taxprofiler; I don't have an independent taxpasta installation.

Code sample

nextflow run nf-core/taxprofiler \
  -resume \
  -r 1.0.0 \
  -profile scw \
  --input "inputs/samples.csv" \
  --databases "inputs/databases.csv" \
  --preprocessing_qc_tool "falco" \
  --run_kraken2 \
  --run_metaphlan3 \
  --run_motus \
  --outdir "results/"

Environment

N/A

Anything else?

No response

The original metaphlan.py code defines the unknown clade as '-1'.

I have tested Metaphlan3 and taxpasta using short-read Illumina data and have obtained counts that are not limited to 0 and 10000000.

A count of 0 indicates that no taxids were identified in the sample, while a count of 10000000 means that only one clade was found in the sample.

However, I am not sure why the maximum count for a single clade is set to 10000000.

The original metaphlan output is in fractions, so 0 for not identified at all or 1 for only one taxon identified. We multiply the fractions by 10^6 in order to obtain integer values. That's why you get those results.
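
To illustrate the scaling described above, here is a minimal sketch. It is not taxpasta's actual code: the function name `fraction_to_count` and the constant `SCALE` are hypothetical, and the factor of 10^6 is taken from the comment above rather than checked against the source.

```python
# Hypothetical illustration of turning fractional abundances into integers.
# SCALE follows the 10^6 factor mentioned in the comment above (an assumption,
# not verified against the taxpasta source).
SCALE = 1_000_000


def fraction_to_count(fraction: float, scale: int = SCALE) -> int:
    """Scale a fractional abundance in [0, 1] to an integer value."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError(f"expected a fraction in [0, 1], got {fraction}")
    # round() with a single argument returns an int in Python 3.
    return round(fraction * scale)


if __name__ == "__main__":
    # A clade that is absent maps to 0; a clade that makes up the whole
    # sample maps to the full scale value.
    for fraction in (0.0, 0.25, 1.0):
        print(fraction, fraction_to_count(fraction))
```

Under this assumption, a sample where only one clade is detected ends up with the full scale value for that clade and 0 for everything else, which matches the pattern of only seeing two distinct values.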