Regarding filtering of Low Frequency and Subclonal Mutations
Rohit-Satyam opened this issue · 4 comments
What happened?
I tried to annotate the wf-artic VCF files with variant allele frequency (VAF) using vafator, and I noticed a few subclonal and low-frequency variants (variants with VAF < 20% are classified as LOW_FREQUENCY, and those with VAF >= 20% and < 80% as SUBCLONAL). Are these variants retained, or are they filtered out before the consensus FASTA is generated?
If they are not filtered, would you recommend filtering them before submitting to GISAID? If so, would you consider adding an option to filter out anything below 0.8 (80%)?
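The VAF bands described in the question can be written as a small classifier. This is a minimal sketch, assuming VAF is a fraction between 0 and 1. The function name `classify_vaf` and the `CLONAL` label for VAF >= 0.8 are hypothetical and are not part of vafator's output.

```python
def classify_vaf(vaf: float) -> str:
    """Bin a variant allele frequency (0..1) using the thresholds from the issue.

    LOW_FREQUENCY: VAF < 0.20
    SUBCLONAL:     0.20 <= VAF < 0.80
    CLONAL:        VAF >= 0.80  (hypothetical label, assumed here)
    """
    if not 0.0 <= vaf <= 1.0:
        raise ValueError(f"VAF must be in [0, 1], got {vaf}")
    if vaf < 0.20:
        return "LOW_FREQUENCY"
    if vaf < 0.80:
        return "SUBCLONAL"
    return "CLONAL"


for vaf in (0.05, 0.20, 0.55, 0.80, 0.97):
    print(vaf, classify_vaf(vaf))
```

Note that the boundaries are half-open: a VAF of exactly 0.20 counts as SUBCLONAL and exactly 0.80 falls outside it, matching the ">= 20% and < 80%" wording.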
Operating System: Ubuntu 20.04
Workflow Execution: Command line
Workflow Execution - EPI2ME Labs Versions:
Workflow Version: 0.3.18
@cjw85 @mattdmem @sarahjeeeze Do you have any thoughts on filtering variants when generating the consensus FASTA?
Referencing a related issue here
Currently the workflow uses the fieldbioinformatics package from the ARTIC Network. We'll take this feedback into account for future versions of the workflow.
@Rohit-Satyam I think you should report subclonal sites as ambiguous if their coverage is above a threshold, and otherwise make them N. You should definitely not make them reference: reversions to reference are a very common artefact and badly distort phylogenetics.
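The rule suggested above can be sketched per site as follows. This is an illustration under stated assumptions, not what wf-artic or fieldbioinformatics actually does: `min_depth=20` is a hypothetical threshold, the depth check is applied to every site (not only subclonal ones), only single-nucleotide substitutions are handled, and returning the reference base for low-frequency (VAF < 0.2) sites is an assumption, since the comment only addresses subclonal sites.

```python
# Standard two-base IUPAC nucleotide ambiguity codes.
IUPAC_PAIRS = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def consensus_base(ref: str, alt: str, vaf: float, depth: int,
                   min_depth: int = 20, clonal: float = 0.80,
                   subclonal: float = 0.20) -> str:
    """Choose the consensus character at a single-nucleotide variant site.

    min_depth is a hypothetical coverage threshold; ref and alt must be
    distinct upper-case bases (A, C, G, T).
    """
    if depth < min_depth:
        return "N"  # too little coverage to call anything: mask, never revert
    if vaf >= clonal:
        return alt  # the alternate allele is the clear majority
    if vaf >= subclonal:
        return IUPAC_PAIRS[frozenset((ref, alt))]  # mixed site: ambiguity code
    # Assumption: a low-frequency minor allele leaves the reference as the majority base.
    return ref


print(consensus_base("C", "T", vaf=0.95, depth=150))  # T
print(consensus_base("C", "T", vaf=0.45, depth=150))  # Y
print(consensus_base("C", "T", vaf=0.45, depth=8))    # N
print(consensus_base("C", "T", vaf=0.10, depth=150))  # C
```

The key design point from the comment is that an uncertain site becomes either an ambiguity code or N, so it never reads as a confident reference call downstream.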
I'm not sure what you mean by filter. Filtering them out entirely might cause the site to be output as reference, which would be bad.
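To see why dropping variants is risky, here is a toy consensus builder that applies SNVs to a reference and either drops or masks variants below a VAF cut-off. It is a minimal sketch: the function `build_consensus`, the dict keys `pos`/`ref`/`alt`/`vaf`, and the 10-base reference are all hypothetical, and only substitutions are handled.

```python
def build_consensus(reference: str, variants: list[dict], min_vaf: float,
                    mask_filtered: bool) -> str:
    """Apply SNVs to a reference, dropping or masking those below min_vaf.

    Each variant is a dict with hypothetical keys 'pos' (1-based), 'ref',
    'alt' and 'vaf'. Only single-nucleotide substitutions are handled.
    """
    seq = list(reference)
    for v in variants:
        i = v["pos"] - 1
        if seq[i] != v["ref"]:
            raise ValueError(f"REF mismatch at position {v['pos']}")
        if v["vaf"] >= min_vaf:
            seq[i] = v["alt"]
        elif mask_filtered:
            seq[i] = "N"  # keep the uncertainty visible in the consensus
        # else: variant silently dropped, so the reference base stays in place
    return "".join(seq)


reference = "ACGTACGTAC"
variants = [
    {"pos": 3, "ref": "G", "alt": "A", "vaf": 0.92},  # clonal
    {"pos": 7, "ref": "G", "alt": "T", "vaf": 0.55},  # subclonal
]

print(build_consensus(reference, variants, 0.8, mask_filtered=False))  # ACATACGTAC
print(build_consensus(reference, variants, 0.8, mask_filtered=True))   # ACATACNTAC
```

With the 0.8 cut-off proposed in the issue, simply dropping the subclonal variant leaves G at position 7, which is indistinguishable from a well-supported reference call; masking it as N (or using an ambiguity code) preserves the fact that the site was uncertain.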