Classification abundances output not showing all ranks
punnettsun opened this issue · 4 comments
Hi again,
I just ran MetaCache on a sample dataset. I used the following command:
./metacache query mydb.db test_genomes/ -out test_results.txt -lowest subspecies -abundances test_abundances.txt
When I took a look at the test_abundances.txt file, I only see rows for domain, family, and subspecies abundances like below:
domain:Bacteria | 3 | 56893 | 0.123854%
family:Actinomycetaceae | 101 | 319 | 0.00069445%
subspecies:strain_158623 | 8058 | 1 | 2.17696e-06%
subspecies:strain_287365 | 8174 | 4422 | 0.00962652%
subspecies:strain_627946 | 8239 | 167 | 0.000363552%
subspecies:strain_623479 | 10643 | 246807 | 0.537289%
I do not see species, genus, order, class, or phylum abundances. However, when I ran the above command with -abundance-per phylum
, I was able to get the abundances for all the phylum ranks. Then I decided to use -allranks
command that was mentioned in the documentation to get all ranks but this always brings me to the help documentation on my command line. I am not sure how to get -allranks
working. My main goal is to classify all my reads at all the different ranks. Please let me know how I can do this or if I am making any mistakes here.
Edit: I suppose -allranks only works with -precision and for that I would need the header information set correctly.
Thank you.
Hi,
Metacache generally shows the lowest possible rank/taxon, i.e., the most precise mapping it can make.
The -lowest <rank>
and -highest <rank>
options only restrict the range along the taxonomic tree.
If the option -abundance-per <rank>
is given, it tries to restrict abundance calculation to one rank only.
This will put a second table below the abundance mapping table (after the line starting with "# estimated abundance ...").
If you want this for a number of different ranks then you could run it in a loop (loading the database only once) with a shell script:
#!/bin/bash
DATABASE="mydb.db"
INPUT="test_genomes/"
OPTIONS=""
COMMANDS=""
for RANK in subspecies species genus family phylum; do
COMMANDS="${COMMANDS}${INPUT} -abundance-per ${RANK} -abundances abundances_${RANK}.txt -out results_${RANK}.txt ${OPTIONS}\n"
done
# pipe all commands into Metacache running in interactive mode
echo -e $COMMANDS | ./metacache query ${DATABASE}
If you want to calculate/visualize the abundances using some other tools (e.g., Krona) you probably want the full output of all assigned taxon ids on all ranks for each individual read using options -omit-ranks -taxids-only -separate-cols -lineages
.
Great! Thank you for the shell script. It's just what I needed.
This problem was solved after creating the database properly.