genome/bam-readcount

how to get the site list for indels?

Hi, we want to run bam-readcount to get read counts for our indels, but we don't know how to define the site list for indels, or how to build the site list from the VCF file for the same sample.

Indels will be reported at the appropriate base as an additional column (for example A:xxxx C:xxxx G:xxxx T:xxxx +A:xxxx). It's straightforward to add these. If you want to add readcounts to your VCF directly, consider looking at the scripts in VAtools as outlined here: https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/readcounts.html
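
To illustrate what "an additional column" looks like in practice, here is a minimal Python sketch that pulls the indel entries out of one line of bam-readcount output. It assumes tab-separated output where the first four fields are chromosome, position, reference base and depth, and each following field is a colon-separated entry whose first item is the allele and second item is the read count. The function name `indel_counts` and the example line are made up for illustration (the per-allele entries are truncated to three items), so treat this as a sketch rather than a reference parser.

```python
def indel_counts(line):
    """Return (chrom, pos, {indel_allele: read_count}) for one output line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    indels = {}
    # Fields after the first four are per-allele entries: allele:count:...
    for entry in fields[4:]:
        parts = entry.split(":")
        allele = parts[0]
        # Insertions are written as +SEQ and deletions as -SEQ.
        if allele.startswith(("+", "-")):
            indels[allele] = int(parts[1])
    return chrom, pos, indels


# Hypothetical, truncated output line (not real data).
example = "chr1\t100\tA\t20\t=:0:0\tA:15:60\tC:0:0\tG:0:0\tT:0:0\tN:0:0\t+AG:5:60"
result = indel_counts(example)
print(result)
```

A position with no indel-supporting reads simply returns an empty dictionary.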

Hope this helps!

Thanks, but we still can't work this out. Suppose we have a VCF file like this:
CHROM POS REF ALT
chr1 1168012 CCTG C
chr1 1356341 TTCC T
chr1 1534913 CGCG C
chr1 1684347 CCCT C
chr1 1684347 CCCT CCCTCCT
How do we get the site list from this VCF file? Some of the indels are complicated.

Complex indels (like the last one) are not currently well-supported by bam-readcount. The others should be fine, though. If in doubt, pick one, run bam-readcount on a small interval around the event, and check that the reported indel matches up as a sanity check.
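
One way to build such intervals is sketched below in Python. It reads the simplified four-column listing from the question (a real VCF has an ID column between POS and REF, so the column indices would need adjusting) and writes one tab-separated, 1-based `chromosome start end` line per event, which is the layout I am assuming for the site list. Each window spans the whole REF allele, from POS to POS + len(REF) - 1, so the indel should fall inside it wherever it is reported; that choice is deliberately generous and is my assumption, not a documented rule. The helper name `indel_sites` is hypothetical.

```python
listing = """\
CHROM POS REF ALT
chr1 1168012 CCTG C
chr1 1356341 TTCC T
chr1 1534913 CGCG C
chr1 1684347 CCCT C
chr1 1684347 CCCT CCCTCCT
"""


def indel_sites(records):
    """Yield unique (chrom, start, end) windows, 1-based inclusive."""
    seen = set()
    for chrom, pos, ref, _alt in records:
        start = int(pos)
        end = start + len(ref) - 1  # cover every base of the REF allele
        site = (chrom, start, end)
        if site not in seen:  # two ALTs at one position share a window
            seen.add(site)
            yield site


# Skip the header row and blank lines, then split on whitespace.
records = [
    line.split()
    for line in listing.splitlines()
    if line.strip() and not line.startswith("CHROM")
]
sites = list(indel_sites(records))
for chrom, start, end in sites:
    print(f"{chrom}\t{start}\t{end}")
```

For the five records above this prints four windows, since the two events at chr1:1684347 collapse into one. The complex event at that position still needs checking by eye.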