BU-ISCIII/buisciii-tools

Kraken save_no_host config saves both unclassified and classified reads and files

Closed this issue · 2 comments

Kraken save_no_host config saves these files:

  • {sample}.classified_1.fastq.gz and {sample}.classified_2.fastq.gz: Reads classified against the database; in viralrecon these are the host reads (we don't need these files)
  • {sample}.kraken2.classifiedreads.txt: A file with the classification of every read, which is very big, similar in size to one FASTQ file (we don't need this file)
  • {sample}.kraken2.report.txt: The standard report file, always saved with the results (we need this one)
  • {sample}.unclassified_1.fastq.gz and {sample}.unclassified_2.fastq.gz: Reads that were not classified against the database; in viralrecon these are the non-host reads (we need these files)

Find a way to remove or exclude .classified_{1,2}.fastq.gz and .kraken2.classifiedreads.txt

In the develop branch of buisciii-tools, the save_nohost.config file no longer exists. In any case, wouldn't the solution be to add the following lines to the viralrecon.config file?

```
withName: 'KRAKEN2_KRAKEN2' {
    publishDir = [
        pattern: "*.{unclassified.fastq.gz,unclassified_1.fastq.gz,unclassified_2.fastq.gz,txt}"
    ]
}
```
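
One thing to check when testing this: a glob ending in `,txt}` matches every `.txt` output, so `{sample}.kraken2.classifiedreads.txt` would probably still be published along with the report. A minimal sketch of a narrower selector follows. It is an illustration under assumptions, not the pipeline's actual configuration: the `kraken2` output subdirectory is hypothetical, and `params.outdir` and `params.publish_dir_mode` are assumed to be defined as in other nf-core pipelines. `path` and `mode` are written out because assigning `publishDir` in a config replaces the whole directive rather than merging with the existing one.

```groovy
// Sketch of a custom config; the output subdirectory and params are assumptions.
process {
    withName: 'KRAKEN2_KRAKEN2' {
        publishDir = [
            path: { "${params.outdir}/kraken2" },   // assumed output location
            mode: params.publish_dir_mode,          // assumed nf-core parameter
            // Publish only the report and the unclassified (non-host) reads;
            // classified FASTQ files and classifiedreads.txt do not match.
            pattern: "*.{report.txt,unclassified.fastq.gz,unclassified_1.fastq.gz,unclassified_2.fastq.gz}"
        ]
    }
}
```

A file like this could be passed to the run with Nextflow's `-c` option, so it only applies when a researcher asks for the no-host reads.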

However, this would have to be done manually whenever a researcher explicitly asks for the no-host reads. Another approach might be to modify /data/bi/pipelines/nf-core-viralrecon/nf-core-viralrecon-2.6.0/workflow/modules/nf-core/kraken2/kraken2/main.nf, which currently contains:

```
output:
    tuple val(meta), path('*.classified{.,_}*')     , optional:true, emit: classified_reads_fastq
    tuple val(meta), path('*.unclassified{.,_}*')   , optional:true, emit: unclassified_reads_fastq
    tuple val(meta), path('*classifiedreads.txt')   , optional:true, emit: classified_reads_assignment
    tuple val(meta), path('*report.txt')                           , emit: report
    path "versions.yml"                                            , emit: versions
```

I believe the lines referring to the classified reads and to classifiedreads.txt could simply be deleted, if that is what this issue is asking for, but any advice on this would be very welcome.

I think the first solution you propose is the right one, but test it just in case. Since, as you mentioned, we no longer have a config for the no_host output, I think the best approach is to create a new config, like the sars_nanopore one, that adds this configuration.

Sounds good?