Classification abundances output not showing all ranks

Question

Classification abundances output not showing all ranks

punnettsun opened this issue 4 years ago · 4 comments

Hi again,

I just ran MetaCache on a sample dataset. I used the following command:
./metacache query mydb.db test_genomes/ -out test_results.txt -lowest subspecies -abundances test_abundances.txt
When I took a look at the test_abundances.txt file, I only see rows for domain, family, and subspecies abundances like below:

domain:Bacteria	|	3	|	56893	|	0.123854%
family:Actinomycetaceae	|	101	|	319	|	0.00069445%
subspecies:strain_158623	|	8058	|	1	|	2.17696e-06%
subspecies:strain_287365	|	8174	|	4422	|	0.00962652%
subspecies:strain_627946	|	8239	|	167	|	0.000363552%
subspecies:strain_623479	|	10643	|	246807	|	0.537289%

I do not see species, genus, order, class, or phylum abundances. However, when I ran the above command with -abundance-per phylum, I was able to get the abundances for all the phylum ranks. Then I decided to use -allranks command that was mentioned in the documentation to get all ranks but this always brings me to the help documentation on my command line. I am not sure how to get -allranks working. My main goal is to classify all my reads at all the different ranks. Please let me know how I can do this or if I am making any mistakes here.

Edit: I suppose -allranks only works with -precision and for that I would need the header information set correctly.

Thank you.

Answer 1 · 2020-10-15T12:27:47.000Z

Hi,

Metacache generally shows the lowest possible rank/taxon, i.e., the most precise mapping it can make.
The -lowest <rank> and -highest <rank> options only restrict the range along the taxonomic tree.

If the option -abundance-per <rank> is given, it tries to restrict abundance calculation to one rank only.
This will put a second table below the abundance mapping table (after the line starting with "# estimated abundance ...").
If you want this for a number of different ranks then you could run it in a loop (loading the database only once) with a shell script:

#!/bin/bash
DATABASE="mydb.db"
INPUT="test_genomes/"
OPTIONS=""
COMMANDS=""
for RANK in subspecies species genus family phylum; do
    COMMANDS="${COMMANDS}${INPUT} -abundance-per ${RANK} -abundances abundances_${RANK}.txt -out results_${RANK}.txt ${OPTIONS}\n"
done
# pipe all commands into Metacache running in interactive mode
echo -e $COMMANDS | ./metacache query ${DATABASE}

If you want to calculate/visualize the abundances using some other tools (e.g., Krona) you probably want the full output of all assigned taxon ids on all ranks for each individual read using options -omit-ranks -taxids-only -separate-cols -lineages.

Answer 2 · 2020-10-15T14:56:15.000Z

Great! Thank you for the shell script. It's just what I needed.

Answer 3 · 2020-10-23T22:19:33.000Z

I used the script you provided and am seeing a FAIL message like below. The resulting files do have the contents inside but I do not know why it says file format is not recognized on the terminal.

Answer 4 · 2020-12-01T01:30:36.000Z

This problem was solved after creating the database properly.