23andMe/yhaplo

Run yhaplo on b38 VCF files

Closed this issue · 1 comment

dwuab commented

I have a .vcf file of Y SNPs aligned to the b38 reference genome. However, I noticed that several files in the input directory are based on b37 coordinates. Any advice on how to deal with b38 Y SNPs? Should I lift over from b38 to b37 first and then run yhaplo, or is there another workflow?
Thanks!

I imagine a b38→b37 LiftOver should do the trick.

Alternatively, it looks like ISOGG lists b38 coordinates for all of these SNPs on the spreadsheet linked from this page: https://isogg.org/tree/ISOGG_YDNA_SNP_Index.html

So you could read in the mapping and replace the b37 coordinate values in your local version of yhaplo's input/isogg.* files. One caveat is that input/isogg.2016.01.04.txt has some formatting issues, as it was copied directly from the ISOGG website at the time. That might make it hard to edit. When yhaplo runs, it cleans and processes this file, so the output file output/isogg.snps.unique.2016.01.04.txt may make for a better starting point.
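
A rough sketch of that coordinate swap, in case it helps. The column layout here is an assumption, not yhaplo's documented format: check which columns hold the SNP name and the position in your copy of the file and adjust `name_col` and `pos_col` to match. The function name, the toy rows, and the toy mapping are all made up for illustration; the real mapping would come from the ISOGG spreadsheet.

```python
def remap_positions(rows, b38_by_snp, name_col=0, pos_col=2):
    """Swap each row's position for its b38 value, looked up by SNP name.

    rows        -- list of already-split rows (lists of strings)
    b38_by_snp  -- dict mapping SNP name -> b38 position
    name_col, pos_col -- ASSUMED column indices; verify against your file

    Returns (remapped_rows, unmapped_names). Rows without a b38 entry are
    left out of the result and their names reported for manual review.
    """
    remapped, unmapped = [], []
    for row in rows:
        name = row[name_col]
        if name in b38_by_snp:
            new_row = list(row)  # copy so the input rows are not modified
            new_row[pos_col] = str(b38_by_snp[name])
            remapped.append(new_row)
        else:
            unmapped.append(name)
    return remapped, unmapped


# Toy data only: these are not real SNP names or coordinates.
rows = [
    ["SNP_A", "HG1", "1000", "A->G"],
    ["SNP_B", "HG2", "2000", "C->T"],
    ["SNP_C", "HG3", "3000", "G->A"],
]
b38_by_snp = {"SNP_A": 1100, "SNP_B": 2100}

remapped, unmapped = remap_positions(rows, b38_by_snp)
```

Reporting the unmapped names rather than silently keeping their b37 positions avoids ending up with a file that mixes the two builds.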

LiftOver is probably easier, if that works :)