Kraken2 paired read mode results cannot be parsed

Question

Opened this issue 5 years ago · 0 comments

filter_classified_reads version: 0.1.0
Python version: 3.6.7
Operating System: Ubuntu 16.04

Description

filter_classified_reads cannot parse Kraken2 results from paired-end reads with the --paired flag.

What I Did

Kraken2 was run on paired-end reads with the --paired flag, producing classification results that filter_classified_reads could not parse:

$ filter_classified_reads -i reads_1.fastp.fastq.gz -I reads_2.fastp.fastq.gz \
    -o reads_1.viral_unclassified.fastq -O reads_2.viral_unclassified.fastq \
    -c reads-centrifuge_results.tsv -C reads-kreport.tsv \
    -k reads-kraken2_results.tsv -K reads-kraken2_report.tsv \
    --taxids 10239
...
2019-09-05 13:52:49,389 INFO: Parsing kraken2 results into DataFrame [in target_classified_reads.py:49]
Traceback (most recent call last):
  File "pandas/_libs/parsers.pyx", line 1191, in pandas._libs.parsers.TextReader._convert_tokens
TypeError: Cannot cast array from dtype('O') to dtype('uint16') according to the rule 'safe'

Snippet from Kraken2 classification results output file:

U	M04594:80:000000000-G37TN:1:1101:16267:2330	0	149|151	0:115 |:| 0:117
U	M04594:80:000000000-G37TN:1:1101:11949:2338	0	150|151	0:116 |:| 0:117
C	M04594:80:000000000-G37TN:1:1101:14888:2339	9606	151|151	0:52 9606:1 0:64 |:| 0:117

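filter_classified_reads expects specific data types in specific fields, such as uint16 in the sequence length field (4th field). In paired mode that field holds two lengths joined by "|" (e.g. 149|151), so it is read as a string (dtype('O')) and cannot be cast to uint16.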
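
One possible way to handle this is to split the sequence length field on "|" before converting it to integers. The sketch below uses only the standard library rather than pandas. The names `parse_seq_len` and `read_kraken2_results` and the dictionary keys are hypothetical and not part of filter_classified_reads. It assumes the five tab-separated fields shown in the snippet above.

```python
import csv
import io


def parse_seq_len(field):
    """Split a sequence-length field into one int per mate.

    Single-end output would give e.g. "151"; --paired output gives "149|151".
    """
    return tuple(int(part) for part in field.split("|"))


def read_kraken2_results(handle):
    """Yield one dict per line of tab-separated Kraken2 classification output.

    Hypothetical helper: the key names are illustrative only.
    """
    for row in csv.reader(handle, delimiter="\t"):
        status, read_id, taxid, seq_len, kmer_lca = row
        yield {
            "classified": status == "C",
            "read_id": read_id,
            "taxid": int(taxid),
            "seq_len": parse_seq_len(seq_len),
            "kmer_lca": kmer_lca,
        }


# Two lines taken from the paired-mode snippet above.
sample = (
    "U\tM04594:80:000000000-G37TN:1:1101:16267:2330\t0\t149|151\t0:115 |:| 0:117\n"
    "C\tM04594:80:000000000-G37TN:1:1101:14888:2339\t9606\t151|151\t0:52 9606:1 0:64 |:| 0:117\n"
)
records = list(read_kraken2_results(io.StringIO(sample)))
```

A pandas-based parser could presumably do the same by reading the 4th field as a string and splitting it before casting to an integer type.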