single-cell-genetics/cellsnp-lite

minimal minor allele count filter?

chilampoon opened this issue · 2 comments

Hello there -

Thanks for developing this great and fast tool. Is there a filter for the minimum minor allele count? I only found the --minMAF argument. In some of my samples, many pileup positions have a minor allele count of only 1, which looks like sequencing/alignment errors or other noise. I also don't want to set minMAF too high, as that may throw away many low-frequency signals.
An option to set the minimum minor allele count would be very helpful. Thanks in advance.

hxj5 commented

Hi, cellsnp-lite does not have this option. For now, it has to be done in post-hoc analysis. We added a notebook, scripts/post_hoc/subset_with_minAD_issue108.ipynb, which could be a starting point for this task.
This notebook can also serve as an example of how to load cellsnp-lite output, subset SNPs, and save the subset data (commit 83c102f).
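
For readers who only want the idea, here is a minimal sketch of such a post-hoc filter. It is not the notebook's code. It assumes that each record in the base VCF carries aggregated AD (ALT count) and DP (REF + ALT count) values in its INFO column, and it treats min(AD, DP - AD) as the minor allele count. The function names and the min_count parameter are made up for illustration.

```python
def parse_info(info):
    """Turn an INFO string such as 'AD=1;DP=10;OTH=0' into a dict of strings."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        if sep:  # skip flag-style entries that have no '='
            fields[key] = value
    return fields


def filter_min_minor_count(lines, min_count=2):
    """Keep VCF records whose minor allele count is at least min_count.

    Assumption: INFO holds AD (ALT count) and DP (REF + ALT count),
    so the minor allele count is min(AD, DP - AD).
    Returns (kept_lines, kept_indices); kept_indices are 0-based record
    positions, which can be reused to subset the rows of matching matrices.
    """
    kept_lines, kept_indices = [], []
    record_idx = 0
    for line in lines:
        if line.startswith("#"):  # header lines are always kept
            kept_lines.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])  # INFO is the 8th VCF column
        ad, dp = int(info["AD"]), int(info["DP"])
        if min(ad, dp - ad) >= min_count:
            kept_lines.append(line)
            kept_indices.append(record_idx)
        record_idx += 1
    return kept_lines, kept_indices


# Toy records, invented for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t.\tPASS\tAD=1;DP=50;OTH=0\n",
    "1\t200\t.\tC\tT\t.\tPASS\tAD=5;DP=40;OTH=1\n",
    "1\t300\t.\tG\tA\t.\tPASS\tAD=39;DP=40;OTH=0\n",
]
kept_lines, kept_indices = filter_min_minor_count(example, min_count=2)
```

Filtering the VCF alone would leave the per-cell count matrices out of sync with it, so the returned record indices would also need to be applied to the matrix rows, which is what the notebook's subsetting step is for.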

Thank you so much @hxj5! Yes, the filtering can be done post hoc, but it would also be nice to have minor allele count as an argument to reduce processing time. For instance, one of my cellsnp.base.vcf files is ~3GB because it contains many mismatches with a minor allele count of only 1; after filtering, the file size goes down to ~300MB.