stjude/indelPost

How to construct a complex indel from SNVs and simple indels

Closed this issue · 6 comments

Hi, I read your paper, this repo, and the doc at https://indelpost.readthedocs.io/en/latest/api.html, and I still do not know how to construct a complex indel from simple variants. For example, given a VCF file generated by Mutect/Strelka/UVC as input, how do I get a new VCF file with small variants fused into complex indels?

Thanks for your interest. This library expects you to write your own scripts to construct complex indels from simple indel inputs. For example, please see:
https://indelpost.readthedocs.io/en/latest/examples.html#annotating-complex-indels-from-simple-indels
This example works with a table format. However, an input VCF can easily be parsed into a table. I'd recommend appending the complex allele representation to the INFO field.
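A minimal sketch of the two steps suggested above (VCF line to table row, then writing a result back into INFO), using only the standard library. The helper names `vcf_line_to_row` and `append_info`, the INFO key `CPLX`, and the sample record are all made up for illustration and are not part of indelPost. The sketch assumes tab-delimited records with one ALT allele per line.

```python
def vcf_line_to_row(line):
    """Split one VCF data line into the columns used by the doc example."""
    fields = line.rstrip("\n").split("\t")
    return {
        "CHROM": fields[0],
        "POS": int(fields[1]),  # VCF positions are 1-based integers
        "REF": fields[3],
        "ALT": fields[4],       # assumes a single ALT allele per record
        "INFO": fields[7],
    }


def append_info(info, key, value):
    """Return the INFO string with key=value appended ('.' means empty INFO)."""
    entry = f"{key}={value}"
    return entry if info in (".", "") else f"{info};{entry}"


# Made-up record, for illustration only.
row = vcf_line_to_row("chr1\t100\t.\tACT\tA\t.\tPASS\tDP=50\n")
# "CPLX" is a hypothetical key; the value here is a placeholder complex allele.
new_info = append_info(row["INFO"], "CPLX", "100:ACT:G")
print(new_info)
```

If a new INFO key is written, the output VCF header should also declare it with a matching `##INFO=<ID=...>` line so that downstream tools accept the file.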

Currently, this library constructs a complex indel from a simple indel, not from an SNV.

Please let me know if you need more info. Or, please contact me if you need a sample script with a sample dataset.

Hi @rawagiha
Thank you for your quick reply. I tried using your script and got the following error as shown in the picture.
image
Is it because I was trying to combine two SNVs with one InDel (as hinted by you in your last reply)?

Hi @genetronhealth

No, this is my fault!!

Please apply phase(how="complex") to valn, not to v. This is now fixed in the doc:
v_cplx = valn.phase(how="complex")  # call phase on valn, not on v
https://indelpost.readthedocs.io/en/latest/examples.html#annotating-complex-indels-from-simple-indels

My apologies about this.

By the way, inputting an SNV won't throw an error. Applying phase to a VariantAlignment object that represents an SNV returns a NullVariant object (https://indelpost.readthedocs.io/en/latest/api.html#indelpost.NullVariant).

Additional note on constructing complex indels:

In practice, we do not know in advance which SNVs and indels to combine, and the variant caller may not report all members of the target complex indel. Further, SNVs and indels may be reported in two separate VCF files. For these reasons, indelPost requires only one indel to construct the target complex indel; it searches for the other members in the BAM file. In your case, the deletion at 55242469 is enough for construction, and the other SNVs will be found automatically. If the input indel is simple, the phase method returns the input indel as is.
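The workflow described above could be sketched as follows: keep only the indel records as seeds, then let phase look for the remaining members in the BAM file. The helper names `is_indel` and `phase_indels` and the path arguments are hypothetical. The indelPost calls mirror the doc example linked in this thread; the constructor arguments shown for Variant and VariantAlignment are my reading of that example and should be checked against the API page.

```python
def is_indel(ref, alt):
    """True for explicit alleles of different length (skips SNVs, MNVs, symbolic ALTs)."""
    bases = set("ACGTNacgtn")
    return set(ref + alt) <= bases and len(ref) != len(alt)


def phase_indels(rows, reference_path, bam_path):
    """Return (row, (pos, ref, alt)) for each indel row after phase(how="complex")."""
    # Imported lazily so that is_indel works without these packages installed.
    import pysam
    from indelpost import Variant, VariantAlignment

    reference = pysam.FastaFile(reference_path)
    bam = pysam.AlignmentFile(bam_path)

    results = []
    for row in rows:
        if not is_indel(row["REF"], row["ALT"]):
            continue  # SNVs are not used as seeds; one indel per event is enough
        v = Variant(row["CHROM"], row["POS"], row["REF"], row["ALT"], reference)
        valn = VariantAlignment(v, bam)
        v_cplx = valn.phase(how="complex")  # phase is called on valn, not on v
        results.append((row, (v_cplx.pos, v_cplx.ref, v_cplx.alt)))
    return results
```

Filtering on allele length before calling indelPost also means SNV and indel calls from separate VCF files do not need to be merged first.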

Hi @rawagiha

Thank you for your information. I got it working. I noticed a small typo in the doc at https://indelpost.readthedocs.io/en/latest/examples.html#annotating-complex-indels-from-simple-indels

The line of code
return v_cplx.pos, v_cplx.ref. v_cplx.alt
should be
return v_cplx.pos, v_cplx.ref, v_cplx.alt # comma instead of period

@genetronhealth

Wow! Thanks a lot for finding another typo (will fix it!).
I will close the issue for now.
Please let me know if you have any questions.