Joon-Klaps/viralgenie

Difference in depth bcftools mpileup vs samtools mpileup

Closed this issue · 2 comments

Description of the bug

There is a difference in variant & read depth using the default settings of bcftools mpileup and samtools mpileup. This results in different consensus called where bcftools is much more likely to carry the same variants across iterations when they should have been added to the consensus.

It needs to be understood where this difference is coming from.

config:

withName: IVAR_CONTIG_CONSENSUS {
                ext.args = [
                    '-t 0.51',  // frequency to call consensus: 0.51 just the majority rule
                    '-q 0',     // minimum quality score : 0 (no quality is provided)
                    '-m 1',     // minimum depth to call consensus
                    '-n N'      // Characters to print in regions with less coverage
                ].join(' ').trim()
                ext.args2 = '--count-orphans --max-depth 0 --min-BQ 0 --no-BAQ -aa'
                }
withName: BCFTOOLS_MPILEUP {
            ext.args = '--ignore-overlaps --count-orphans --max-depth 0 --min-BQ 20 --annotate FORMAT/AD,FORMAT/ADF,FORMAT/ADR,FORMAT/DP,FORMAT/SP,INFO/AD,INFO/ADF,INFO/ADR'
            ext.args2 = '--ploidy 1 --keep-alts --keep-masked-ref --multiallelic-caller --variants-only'
            ext.args3 = "--include 'INFO/DP>=10'"
}

Command used and terminal output

No response

Relevant files

No response

System information

No response

Hi all,
I noticed that the behaviour of --max-depth 0 in bcftools mpileup and samtools mpileup is actually very different and leads to considerably different consensus genomes by IVAR consensus and bcftools consensus
In both cases --max-depth defines the number of reads it start’s to consider at each base location. This can significantly speed up the mpileup process. From the docs of bcftools mpileup:

-d, --max-depth INT
    At a position, read maximally INT reads per input file. Note that the original samtools mpileup
    command had a minimum value of 8000/n where n was the number of input files given to mpileup.
    This means that in samtools mpileup the default was highly likely to be increased and the -d parameter
    would have an effect only once above the cross-sample minimum of 8000. This behavior was problematic
    when working with a combination of single- and multi-sample bams, therefore in bcftools mpileup the user
    is given the full control (and responsibility), and an informative message is printed instead [250] 

In samtools mpileup --max-depth 0 removes the depth limit. However, in bcftools mpileup it sets the maximum depth to 1 (check this, look at the mpileup file and take a md5sum after removing the comment containing the command used, both checksums will be the same).
So my question becomes, does anyone know how to remove this depth limit in bcftools mpileup if it’s not possible, what’s a good default value then to use? Is it 8 000 or 80 000 or ... ?

After removing the commandline argument in the file:

md5sum File
9dbc53c85a82851300eeb0c3993cdd87 LVE00140_LVE00140.ivar_constrain.org.mpileup-pip-default
9dbc53c85a82851300eeb0c3993cdd87 LVE00140_LVE00140.ivar_constrain.org.mpileup-depth-1
9aa812883c3284d4352a3d10f42cc537 LVE00140_LVE00140.ivar_constrain.org.mpileup-depth-2