leylabmpi/Struo2

Improper visualisation of Kraken2 output after using Struo2 Kraken database files

Closed this issue · 4 comments

Hi!!

I was trying to classify a sample using the Kraken2 indexes you provide here, because the Kraken2 PlusPF database left ~74% of reads unclassified. Your database improved the classification and reduced unclassified reads to ~48%. However, when visualising the Kraken2 report with the Pavian R package, I get an improper visualisation, as shown below:

image

You can see the prefixes "p__, f__, s__", which usually don't appear when using the Kraken2 RefSeq indexes. I can remove them from the report, but I am worried: did I do something wrong?

My code is as follows:

kraken2 --threads 40 --paired --db /metagenomics/struo2_krakendb sample_R1_pair.fastq.gz sample_R2_pair.fastq.gz --output sample --report sample_kraken_report.txt --gzip-compressed --unclassified-out sample_unclassified#.fastq

bracken -r 150 -d /metagenomics/struo2_krakendb -i sample_kraken_report.txt -o sample_bracken_report.txt -w sample_kraken_bracken.report

If you want to filter out the prefixes in the GTDB taxonomic classifications, you can use a regular expression such as [dpcofgs]__. Given how variable taxonomic classification names can be, it can be useful to use a more elaborate regex that only matches the prefix at the start of a field, e.g., (^|;|\t)[dpcofgs]__.
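A minimal sketch of how these regexes could be applied in Python. The function names are hypothetical, and the second function assumes the standard Kraken2 report layout, where the taxon name is the last tab-separated column and is indented with spaces (so the name does not directly follow the tab):

```python
import re

# Regex from the reply above: a rank prefix (d__, p__, c__, o__, f__, g__, s__)
# at the start of the string or right after a ';' or tab separator.
ANCHORED_PREFIX = re.compile(r"(^|;|\t)[dpcofgs]__")

# Assumption: in a Kraken2 report the name column is indented with spaces,
# so leading whitespace is allowed (and preserved) before the prefix.
NAME_PREFIX = re.compile(r"^(\s*)[dpcofgs]__")


def strip_lineage_prefixes(lineage: str) -> str:
    """Remove rank prefixes from a ';'- or tab-delimited lineage string."""
    return ANCHORED_PREFIX.sub(r"\1", lineage)


def strip_report_line(line: str) -> str:
    """Remove the rank prefix from the name column of one report line."""
    fields = line.rstrip("\n").split("\t")
    # Only the last column (the taxon name) is touched; indentation is kept.
    fields[-1] = NAME_PREFIX.sub(r"\1", fields[-1])
    return "\t".join(fields)
```

Applying `strip_report_line` to each line of the report and writing the result to a new file should give names without prefixes, while the indentation of the name column stays intact.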

Thanks for the regex. I have one more query. I first classified the reads using the Kraken2 PlusPF indexes, and the unclassified reads were then classified using the Struo2 Kraken2 database. Now how do I combine the two Kraken2 reports? KrakenTools isn't helping and throws an error:

../../../metagenomics/KrakenTools/combine_kreports.py -r sample_kraken_report.txt sample_kraken_report_sturo.txt -o combined_report.txt
>>STEP 1: READING REPORTS
        2/2 samples processed
Traceback (most recent call last):
  File "../../../metagenomics/KrakenTools/combine_kreports.py", line 311, in <module>
    main()
  File "../../../metagenomics/KrakenTools/combine_kreports.py", line 226, in main
    while level_num != (prev_node.level_num + 1):
AttributeError: 'NoneType' object has no attribute 'level_num'

Now how do I combine the two Kraken2 reports. KrakenTools isn't helping and throws an error

I wouldn't combine them, given that you would be combining two different taxonomies (NCBI for PlusPF, GTDB for the Struo2 database).
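As a rough guard before attempting any merge, one could check whether two reports even share a naming convention. This is only a heuristic sketch with hypothetical function names: it assumes the taxon name is the last tab-separated column of each report line, and matching conventions do not prove that the taxonomies are the same.

```python
import re

# GTDB-style rank prefix at the start of a taxon name.
RANK_PREFIX = re.compile(r"^[dpcofgs]__")


def uses_rank_prefixes(report_lines) -> bool:
    """Return True if any taxon name in the report starts with a rank prefix."""
    for line in report_lines:
        # Assumption: the name is the last tab-separated column, space-indented.
        name = line.rstrip("\n").split("\t")[-1].strip()
        if RANK_PREFIX.match(name):
            return True
    return False


def same_naming_convention(report_a, report_b) -> bool:
    """Heuristic: both reports either use rank prefixes or both do not."""
    return uses_rank_prefixes(report_a) == uses_rank_prefixes(report_b)
```

Each argument can be any iterable of lines, such as an open file handle. For a PlusPF report and a Struo2 report, `same_naming_convention` would be expected to return `False`.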

Okay. Then I will plot them separately!! Thanks for your prompt reply.