agshumate/Liftoff

Use liftover with SNPs

Closed this issue · 7 comments

Dear Alaina,

Very interesting tool and approach.
If a VCF file is first converted to GTF, do you think Liftoff could be used for variants such as SNPs/indels, or even larger deletions/insertions?

Best regards,

Luca

Hi there,
Apologies for the delayed response. Liftoff was not designed for this purpose, but it may still work with some creativity. To lift over a SNP/INDEL, you would have to lift over the variant together with its flanking sequence so that Minimap2 can locate it in the target genome. You could create a GFF file with a parent feature that is the flanking region and a child feature that holds the actual coordinates of the SNP (see example below). This would allow you to locate the region containing the SNP and then the specific coordinates of the SNP itself.

chr source flanking_region start end . strand . ID=flanking_region_1
chr source SNP start end . strand . ID=SNP_1;Parent=flanking_region_1
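As a rough illustration of the parent/child layout above, here is a minimal Python sketch that writes the two GFF lines for one SNP. The function name `snp_to_gff_lines`, the default flank of 100 bases, and the ID naming scheme are assumptions made for this example, not part of Liftoff.

```python
def snp_to_gff_lines(chrom, pos, snp_id, flank=100, source=".", strand="+"):
    """Return two tab-separated GFF lines: a flanking-region parent and a SNP child.

    pos is a 1-based coordinate. flank is the number of bases added on each
    side (a hypothetical default, not a Liftoff recommendation). The region end
    is not clipped to the chromosome length, which this sketch does not know.
    """
    start = max(1, pos - flank)  # GFF coordinates are 1-based, so clamp at 1
    end = pos + flank
    region_id = f"flanking_region_{snp_id}"
    parent = "\t".join([chrom, source, "flanking_region", str(start), str(end),
                        ".", strand, ".", f"ID={region_id}"])
    child = "\t".join([chrom, source, "SNP", str(pos), str(pos),
                       ".", strand, ".", f"ID={snp_id};Parent={region_id}"])
    return [parent, child]


lines = snp_to_gff_lines("1", 95854, "snp1")
print("\n".join(lines))
```

Each SNP gets its own parent region, so the child can always be tied back to the sequence used for alignment.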

Hi Alaina,

Great tool for comparing assemblies for non-model organisms! I am looking to use liftoff for converting SNP locations from an old turkey build (2.01) to the most recent one (5.1).

How long of a flanking region do you suggest?

Thanks again,

-Amanda

Hi again @agshumate,

I was able to find from another question that 46 would be the minimum for a flanking region.

Before running through my full dataset, I wanted to get your opinion on the liftoff parameters that should be used when converting SNPs.

My tester gff file is:
1 . region 95754 95954 . + . ID=region1
1 . snp 95854 95854 . + . ID=snp1;Parent=region1

I downloaded the soft-masked toplevel assembly for both genome versions from Ensembl:
UMD 2.01 (reference)
UMD 5.1 (target)

I use the -f setting to specify the parent feature in my dataset (a text file containing just "region").

It runs both with and without the -chroms setting, but I have been using it since it should improve accuracy.
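For anyone reproducing this setup, the invocation described above might be assembled roughly as follows. This is only a sketch: all file names are hypothetical placeholders, and the helper `build_liftoff_command` is invented for illustration. It builds and prints the command without running it.

```python
import shlex


def build_liftoff_command(gff, target_fa, reference_fa, out_gff,
                          feature_types_file, chroms_file=None):
    """Assemble a Liftoff command line as a list of arguments.

    feature_types_file is the text file passed to -f (here it would contain
    the single line "region"). chroms_file is optional, matching the note
    that the run works with or without -chroms.
    """
    cmd = ["liftoff", "-g", gff, "-o", out_gff, "-f", feature_types_file]
    if chroms_file is not None:
        cmd += ["-chroms", chroms_file]
    # Liftoff takes the target assembly first, then the reference assembly.
    cmd += [target_fa, reference_fa]
    return cmd


# Hypothetical file names; substitute the real paths.
cmd = build_liftoff_command("snps.gff", "UMD5.1.fa", "UMD2.01.fa",
                            "lifted.gff", "features.txt", "chroms.txt")
print(shlex.join(cmd))
```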

My output is:
1 Liftoff region 20838 20838 . + . ID=region1;coverage=1.0;sequence_ID=1.0;extra_copy_number=0;copy_num_ID=region1_0
1 Liftoff snp 20838 20838 . + . ID=snp1;Parent=region1;extra_copy_number=0

As mentioned in the issue I linked above, the converted coordinates for the parent region are the same as those for the SNP. This isn't a problem for me, since I am only interested in the SNP coordinates (although being able to see the region's converted coordinates would be nice for double-checking the SNP location).

My questions for you are:

  1. Are there any other settings I should be using? I tested -infer_genes and -infer_transcripts, and my results did not change. I also tried setting -overlap to 1, thinking that might be necessary for the 100% overlap of the SNP with the flanking region, but again there was no change in the results. Maybe I am just overthinking it.
  2. Do you have any recommendations for checks to make sure the SNP is being converted correctly?

Hi,
Apologies for the delayed response. I first want to emphasize that Liftoff is designed to lift over genes, not SNPs, so the above procedure is a workaround and not well tested. The general procedure in Liftoff is to lift over only the exons (or other child features) and then use all of the child feature coordinates to infer the parent feature boundaries (transcript or gene). So what is happening here is that only the SNP coordinate is being lifted over, and the region is then inferred from the boundaries of the SNP coordinate, which is only a single point. The point of the flanking region is just to provide enough sequence for the alignment step. If it is essential to know the coordinates of the full regions, you can add another "child" feature that has the same coordinates as the region, e.g.
1 . region_child 95754 95954 . + . ID=region1_child;Parent=region1
I hope this helps!
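The extra child feature suggested in this reply could be added to an existing GFF with something like the sketch below. It assumes tab-separated lines with nine columns and an `ID=` attribute on each parent. The function name `add_region_children` and the `_child` suffix follow the example in the reply but are otherwise illustrative.

```python
def add_region_children(gff_lines, parent_type="region"):
    """Return the GFF lines with a same-coordinate child added after each parent.

    For every feature whose type equals parent_type, a copy is emitted with
    type "<parent_type>_child", ID "<parent ID>_child", and Parent set to the
    original ID. All other lines pass through unchanged.
    """
    out = []
    for raw in gff_lines:
        line = raw.rstrip("\n")
        out.append(line)
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != parent_type:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        parent_id = attrs.get("ID")
        if parent_id is None:
            continue  # cannot attach a child without a parent ID
        child = cols[:2] + [f"{parent_type}_child"] + cols[3:8] + [
            f"ID={parent_id}_child;Parent={parent_id}"
        ]
        out.append("\t".join(child))
    return out


gff = [
    "1\t.\tregion\t95754\t95954\t.\t+\t.\tID=region1",
    "1\t.\tsnp\t95854\t95854\t.\t+\t.\tID=snp1;Parent=region1",
]
result = add_region_children(gff)
print("\n".join(result))
```

With the region-sized child present, the lifted parent boundaries should reflect the whole flanking region rather than the single SNP position.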

In case this is still relevant for OP or anyone else stumbling here, I'm working on SNPLift, a pipeline that converts the SNP positions of a VCF from one genome version to another. While still private, it is working and complete. If needed, I could make it public early before publication.

Hi @enormandeau ,

Is SNPLift out yet? I need to lift over SNPs.

Best,
Kun

Hi. Sorry for the delays. I wanted to finalize a few changes.

Feel free to use SNPLift here: https://github.com/enormandeau/snplift

Take care,

Eric