epi2me-labs/wf-artic

Regarding filtering of Low Frequency and Subclonal Mutations

Rohit-Satyam opened this issue · 4 comments

What happened?

I tried to assign variant allele frequencies (VAF) to the wf-artic VCF files using vafator and found a few subclonal and low-frequency variants (variants with a VAF < 20% are considered LOW_FREQUENCY, and variants with a VAF >= 20% and < 80% are considered SUBCLONAL). Are these variants retained or filtered out before the consensus FASTA is generated?

If they are not filtered, would you suggest filtering them before submitting to GISAID? If so, could you add an option to filter out any variant with a VAF lower than 0.8 (80%)?
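For readers following along, the VAF categories quoted above can be sketched in Python. This is only an illustration of the thresholds stated in the question, not code from vafator or wf-artic; the function name `classify_vaf` and the `CLONAL` label for variants at or above 80% are assumptions made for this example.

```python
def classify_vaf(vaf: float) -> str:
    """Label a variant by its allele frequency (a fraction between 0 and 1).

    Thresholds follow the question: < 0.2 is LOW_FREQUENCY,
    >= 0.2 and < 0.8 is SUBCLONAL. The "CLONAL" label for the
    remaining variants is a placeholder chosen for this sketch.
    """
    if not 0.0 <= vaf <= 1.0:
        raise ValueError(f"VAF must be between 0 and 1, got {vaf}")
    if vaf < 0.2:
        return "LOW_FREQUENCY"
    if vaf < 0.8:
        return "SUBCLONAL"
    return "CLONAL"


if __name__ == "__main__":
    for vaf in (0.05, 0.45, 0.93):
        print(vaf, classify_vaf(vaf))
```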

Operating System

Ubuntu 20.04

Workflow Execution

Command line

Workflow Execution - EPI2ME Labs Versions

Workflow Version

0.3.18

@cjw85 @mattdmem @sarahjeeeze Do you have any thoughts on filtering variants when generating the consensus FASTA?

Referencing a related issue here

Currently we use the fieldbioinformatics package from the ARTIC Network. We'll take this feedback into account for future versions of the workflow.

@Rohit-Satyam I'd think you should report the subclonal sites as ambiguous if they are above a coverage threshold; otherwise make them N. You should definitely not make them reference (reversions to reference are a very common artefact and screw up phylogenetics in a bad way).

I'm not sure what you mean by filter. Filtering them out entirely might cause the site to be output as reference, which would be bad.
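The suggestion in the comments above (ambiguity code for a subclonal site with enough coverage, N otherwise, never a silent reversion to reference) could be sketched roughly as follows. This is a minimal illustration for single-nucleotide variants only, not what wf-artic or fieldbioinformatics actually does. The function name `consensus_base`, the `min_depth` default of 20, and the choice to keep the reference base for variants below the low-frequency threshold are all assumptions made for this example; the thread does not say how low-frequency sites should be handled.

```python
# IUPAC ambiguity codes for two-base mixtures.
IUPAC = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def consensus_base(ref, alt, vaf, depth,
                   min_depth=20, low_vaf=0.2, clonal_vaf=0.8):
    """Pick the consensus character for one SNV site.

    min_depth, low_vaf and clonal_vaf are illustrative defaults,
    not values taken from any workflow.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in "ACGT" or alt not in "ACGT" or len(ref) != 1 \
            or len(alt) != 1 or ref == alt:
        raise ValueError("this sketch only handles single-base SNVs")
    if depth < min_depth:
        return "N"            # too little coverage to call anything
    if vaf >= clonal_vaf:
        return alt            # well-supported variant
    if vaf >= low_vaf:
        # Subclonal: report the mixture, do not revert to reference.
        return IUPAC[frozenset((ref, alt))]
    return ref                # assumption: low-frequency calls are ignored


if __name__ == "__main__":
    print(consensus_base("A", "G", vaf=0.5, depth=100))
    print(consensus_base("A", "G", vaf=0.5, depth=5))
```

The key design point is the subclonal branch: the site is written as an ambiguity code rather than dropped, so removing the variant record never turns the position back into the reference base.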