Can hickit deal with nanopore long reads and generate phasing results properly?
Wong718 opened this issue · 1 comments
Hello Professor Li. hickit is a wonderful tool for 3D genome analysis.
Recently, I have been processing scNanoHiC data with bwa-sw and hickit, and I found the haplotype phasing results confusing. I would like to ask whether hickit can handle nanopore long-read sequencing data as well as it handles NGS data.
The SAM file generated by bwa-sw looks as follows: a single read may map to multiple positions and produce several records with the same ID.
d3d59d85-f117-406d-93e3-4901250df094 0 chr10 118628182 42 344S305M441S * 0 0 GTTCAGTTACGTATTGCTAGCTCTTTCCCTACACGACGCTCTTCCGATCTGAGATTAAAAAAAAAAAAAAAAACATTTTAACCTAGGTGGAAGTGGAGGGAGGAGGGGACGAAGGAGAGAATAAGAAATTTCTGGAGCTTTTAACAAGGGGAGTGTGAGGGTAATCCAGCAATTCAGAAGCCGGGCGCGGTGGCTCATGCCTATAATCCCAGCACTTATTGGGAGGCCGAGGCAGGTGGATAGCTTGAGCCCAAGAGTTCGAGACCACCCTGGCCAACATAGTGAGAACCCCCCATCTCTATTTAAACAACAACAAAAAAGAAATTTGAGAACAACTGCCCCCATAGCTGGGCATGGTGGCACACGCCTGTAATCCCAGCTACTCAGAGGGCTGAGGCAGGAGAATCACTTGGACCCAGGAGGCAGAGGTTGTAGCGAGCCAAGATCATACCACTGCATGCCAGCCTGGGAAGGAGAGTGAGATTCCATCTTGGGGAGGGGGGAGGAACCTTGCAAGGTAGATAACAGTAGCCCCTATTTGGAAGGTGGCACAGCTGGGGTCCAGATAGATGAAGTAACTTGCCCAAGGTCACACAGTTAATAAATGGCAAAGCTTGGATTGGAGCCCACATCTTTTGATTATACCACATGAGCATGGCTTTAGACACGCTGGTGCAAGGATCTGTGTGACCTCTAATCTCACAAGAGTCCTTGCTCAGACCCAGAAAGGGCTTCTCTACAGTATAGGAGAGGAATACCTCCAGGTTGCATGTGGGCAGCTGCCAACGTGAATGGCTTGGTCCTCAGCCTATAGAGCTTAAAGGTATTTTGTATAAGCCTAGTTTCCTCCTGTATAAAAAGGGATAAAACATGAACCCTATGTGGTTGTTGCAGGAGGATGTGAAAGTGCTGCCCCAGTACTTGGTATTAAGAATATCAATAAATCATTAGGACTATGATCTATTTTTAAACAATTTTCAAACAAAGTATTACCAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACAAAGACACCAACAACTTTCTTATCTCGTATGCCGTCTTCTGCTTGAGCAATACGTGGG 
(+,./.--,-+++,/'(&$%&'&(/215982110+*)())*10,-'&&&1%%%(--126;<<?CBAA>;::;;943348644520-,-00000012255/),/-255552+++-5532.,,)&%%&'')...034543423422100122531100.--.2100/0017754240///.22377787656:<@?>>>6666@6666;>=@@?@9940*((*159;?@64448?==<87556.--.8;<?>;;;544499////79993333>:;;;?>@532144/0..26>=;87789<211//0331201101169:8511442213211.//-///34>>91//268:;A=;;;=?>@::954335A>><<==BBAEA@@?A:84//2/:<>>?@@???@>@=><4210,++,10:<>C?>99::>;;:***'***<@?>=<<<710008<;;;;33348;<===<841////0324449----65656653410148<875502.--.135344100/.,,,22....45873224577645443455655566452244:5210033232100011014575554445445410/0.//043335222255111///012457365556200/20/.---/4542101221225543210221...065433457754546524334354210000220.-.///0-2..433233358764444655566543323/-++,,12122.20//005555510/1100023564444776221/0//00335443322225444320,,++002432232000148620//./11332123498779621.-./543212/-..2/--011233743334411103333323442112221112457666587600//23310/0/4//6322223322222200./0123222111233348975543104598434433430/..//02410/.+++,12123241.-./4534347761//0667655433453((()0.--4343//./1,,,,/00///1000241/--./01000''%%% AS:i:299 XS:i:58 XF:i:3 XE:i:1 NM:i:1
d3d59d85-f117-406d-93e3-4901250df094 0 chr2 164815549 39 828S167M95S * 0 0 GTTCAGTTACGTATTGCTAGCTCTTTCCCTACACGACGCTCTTCCGATCTGAGATTAAAAAAAAAAAAAAAAACATTTTAACCTAGGTGGAAGTGGAGGGAGGAGGGGACGAAGGAGAGAATAAGAAATTTCTGGAGCTTTTAACAAGGGGAGTGTGAGGGTAATCCAGCAATTCAGAAGCCGGGCGCGGTGGCTCATGCCTATAATCCCAGCACTTATTGGGAGGCCGAGGCAGGTGGATAGCTTGAGCCCAAGAGTTCGAGACCACCCTGGCCAACATAGTGAGAACCCCCCATCTCTATTTAAACAACAACAAAAAAGAAATTTGAGAACAACTGCCCCCATAGCTGGGCATGGTGGCACACGCCTGTAATCCCAGCTACTCAGAGGGCTGAGGCAGGAGAATCACTTGGACCCAGGAGGCAGAGGTTGTAGCGAGCCAAGATCATACCACTGCATGCCAGCCTGGGAAGGAGAGTGAGATTCCATCTTGGGGAGGGGGGAGGAACCTTGCAAGGTAGATAACAGTAGCCCCTATTTGGAAGGTGGCACAGCTGGGGTCCAGATAGATGAAGTAACTTGCCCAAGGTCACACAGTTAATAAATGGCAAAGCTTGGATTGGAGCCCACATCTTTTGATTATACCACATGAGCATGGCTTTAGACACGCTGGTGCAAGGATCTGTGTGACCTCTAATCTCACAAGAGTCCTTGCTCAGACCCAGAAAGGGCTTCTCTACAGTATAGGAGAGGAATACCTCCAGGTTGCATGTGGGCAGCTGCCAACGTGAATGGCTTGGTCCTCAGCCTATAGAGCTTAAAGGTATTTTGTATAAGCCTAGTTTCCTCCTGTATAAAAAGGGATAAAACATGAACCCTATGTGGTTGTTGCAGGAGGATGTGAAAGTGCTGCCCCAGTACTTGGTATTAAGAATATCAATAAATCATTAGGACTATGATCTATTTTTAAACAATTTTCAAACAAAGTATTACCAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACAAAGACACCAACAACTTTCTTATCTCGTATGCCGTCTTCTGCTTGAGCAATACGTGGG 
(+,./.--,-+++,/'(&$%&'&(/215982110+*)())*10,-'&&&1%%%(--126;<<?CBAA>;::;;943348644520-,-00000012255/),/-255552+++-5532.,,)&%%&'')...034543423422100122531100.--.2100/0017754240///.22377787656:<@?>>>6666@6666;>=@@?@9940*((*159;?@64448?==<87556.--.8;<?>;;;544499////79993333>:;;;?>@532144/0..26>=;87789<211//0331201101169:8511442213211.//-///34>>91//268:;A=;;;=?>@::954335A>><<==BBAEA@@?A:84//2/:<>>?@@???@>@=><4210,++,10:<>C?>99::>;;:***'***<@?>=<<<710008<;;;;33348;<===<841////0324449----65656653410148<875502.--.135344100/.,,,22....45873224577645443455655566452244:5210033232100011014575554445445410/0.//043335222255111///012457365556200/20/.---/4542101221225543210221...065433457754546524334354210000220.-.///0-2..433233358764444655566543323/-++,,12122.20//005555510/1100023564444776221/0//00335443322225444320,,++002432232000148620//./11332123498779621.-./543212/-..2/--011233743334411103333323442112221112457666587600//23310/0/4//6322223322222200./0123222111233348975543104598434433430/..//02410/.+++,12123241.-./4534347761//0667655433453((()0.--4343//./1,,,,/00///1000241/--./01000''%%% AS:i:167 XS:i:0 XF:i:3 XE:i:1 NM:i:0
d3d59d85-f117-406d-93e3-4901250df094 0 chr11 61056073 39 177S167M746S * 0 0 GTTCAGTTACGTATTGCTAGCTCTTTCCCTACACGACGCTCTTCCGATCTGAGATTAAAAAAAAAAAAAAAAACATTTTAACCTAGGTGGAAGTGGAGGGAGGAGGGGACGAAGGAGAGAATAAGAAATTTCTGGAGCTTTTAACAAGGGGAGTGTGAGGGTAATCCAGCAATTCAGAAGCCGGGCGCGGTGGCTCATGCCTATAATCCCAGCACTTATTGGGAGGCCGAGGCAGGTGGATAGCTTGAGCCCAAGAGTTCGAGACCACCCTGGCCAACATAGTGAGAACCCCCCATCTCTATTTAAACAACAACAAAAAAGAAATTTGAGAACAACTGCCCCCATAGCTGGGCATGGTGGCACACGCCTGTAATCCCAGCTACTCAGAGGGCTGAGGCAGGAGAATCACTTGGACCCAGGAGGCAGAGGTTGTAGCGAGCCAAGATCATACCACTGCATGCCAGCCTGGGAAGGAGAGTGAGATTCCATCTTGGGGAGGGGGGAGGAACCTTGCAAGGTAGATAACAGTAGCCCCTATTTGGAAGGTGGCACAGCTGGGGTCCAGATAGATGAAGTAACTTGCCCAAGGTCACACAGTTAATAAATGGCAAAGCTTGGATTGGAGCCCACATCTTTTGATTATACCACATGAGCATGGCTTTAGACACGCTGGTGCAAGGATCTGTGTGACCTCTAATCTCACAAGAGTCCTTGCTCAGACCCAGAAAGGGCTTCTCTACAGTATAGGAGAGGAATACCTCCAGGTTGCATGTGGGCAGCTGCCAACGTGAATGGCTTGGTCCTCAGCCTATAGAGCTTAAAGGTATTTTGTATAAGCCTAGTTTCCTCCTGTATAAAAAGGGATAAAACATGAACCCTATGTGGTTGTTGCAGGAGGATGTGAAAGTGCTGCCCCAGTACTTGGTATTAAGAATATCAATAAATCATTAGGACTATGATCTATTTTTAAACAATTTTCAAACAAAGTATTACCAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACAAAGACACCAACAACTTTCTTATCTCGTATGCCGTCTTCTGCTTGAGCAATACGTGGG 
(+,./.--,-+++,/'(&$%&'&(/215982110+*)())*10,-'&&&1%%%(--126;<<?CBAA>;::;;943348644520-,-00000012255/),/-255552+++-5532.,,)&%%&'')...034543423422100122531100.--.2100/0017754240///.22377787656:<@?>>>6666@6666;>=@@?@9940*((*159;?@64448?==<87556.--.8;<?>;;;544499////79993333>:;;;?>@532144/0..26>=;87789<211//0331201101169:8511442213211.//-///34>>91//268:;A=;;;=?>@::954335A>><<==BBAEA@@?A:84//2/:<>>?@@???@>@=><4210,++,10:<>C?>99::>;;:***'***<@?>=<<<710008<;;;;33348;<===<841////0324449----65656653410148<875502.--.135344100/.,,,22....45873224577645443455655566452244:5210033232100011014575554445445410/0.//043335222255111///012457365556200/20/.---/4542101221225543210221...065433457754546524334354210000220.-.///0-2..433233358764444655566543323/-++,,12122.20//005555510/1100023564444776221/0//00335443322225444320,,++002432232000148620//./11332123498779621.-./543212/-..2/--011233743334411103333323442112221112457666587600//23310/0/4//6322223322222200./0123222111233348975543104598434433430/..//02410/.+++,12123241.-./4534347761//0667655433453((()0.--4343//./1,,,,/00///1000241/--./01000''%%% AS:i:161 XS:i:0 XF:i:3 XE:i:1 NM:i:1
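To make the layout of these three records easier to see, here is a minimal Python sketch that turns each CIGAR string into the interval it covers on the read and sorts the records in read order. The helper name `query_interval` is my own and not part of hickit or bwa. It assumes forward-strand records (FLAG 0, as in all three records above); for a reverse-strand record the leading clip would count from the other end of the original read.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def query_interval(cigar):
    """Return the 0-based, half-open (start, end) of the aligned part on the read.

    Assumes a forward-strand alignment, so the leading clip is the read offset.
    """
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # A leading soft/hard clip gives the offset of the alignment on the read.
    start = ops[0][0] if ops and ops[0][1] in "SH" else 0
    # Only M, I, = and X consume read bases inside the aligned part.
    aligned = sum(length for length, op in ops if op in "MI=X")
    return start, start + aligned


# (chromosome, position, CIGAR) copied from the three records above.
records = [
    ("chr10", 118628182, "344S305M441S"),
    ("chr2", 164815549, "828S167M95S"),
    ("chr11", 61056073, "177S167M746S"),
]

# Sort the records by where they start on the read.
ordered = sorted(records, key=lambda rec: query_interval(rec[2])[0])
for chrom, pos, cigar in ordered:
    print(chrom, pos, query_interval(cigar))
```

For this read the order along the read is chr11 (177 to 344), chr10 (344 to 649), then chr2 (828 to 995), so the read is a concatenation of three fragments from three different chromosomes.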
Then I applied the sam2seg function to generate the .seg file shown below, providing the corresponding .vcf file with the -v parameter.
0a0a0f39-8470-464b-aea8-ae41b7967128 chrX!57850012!57850118!+!.!32!1 chrX!118125267!118125631!+!.!47!1 chrX!118200629!118201192!-!.!49!1
0a0a6e33-7628-475a-a306-12ba28ca555d chr15!94570904!94571063!-!.!36!1 chr15!97971692!97972740!-!1!54!1
0a0a7a06-e53d-4c7a-8675-ff8bf1b74ec3 chr7!7146551!7147017!-!.!48!1 chr10!89043022!89043154!-!1!35!1 chr7!7143778!7144116!+!.!42!1
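For reference, this is how I read these lines, written as a small Python sketch. The field names are my assumption from looking at the output, not something taken from the hickit documentation: I treat field 5 as the phase (`.` for unphased, otherwise `0` or `1`) and field 6 as a mapping quality, and I do not know what field 7 means, so it is kept as an opaque string.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str
    phase: str  # ".", "0" or "1" as written in the file (assumed meaning)
    mapq: int   # assumed meaning of field 6
    extra: str  # field 7, meaning unknown to me


def parse_seg_line(line: str) -> Tuple[str, List[Segment]]:
    """Split one .seg line into the read ID and its '!'-delimited segments."""
    read_id, *fields = line.split()
    segments = []
    for field in fields:
        chrom, start, end, strand, phase, mapq, extra = field.split("!")
        segments.append(
            Segment(chrom, int(start), int(end), strand, phase, int(mapq), extra)
        )
    return read_id, segments


# Third line of the .seg excerpt above.
line = (
    "0a0a7a06-e53d-4c7a-8675-ff8bf1b74ec3 "
    "chr7!7146551!7147017!-!.!48!1 "
    "chr10!89043022!89043154!-!1!35!1 "
    "chr7!7143778!7144116!+!.!42!1"
)
read_id, segments = parse_seg_line(line)
for seg in segments:
    print(read_id, seg.chrom, seg.start, seg.end, seg.phase)
```

In this read only the chr10 segment carries a phase; both chr7 segments are unphased.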
However, after generating the .pairs file with a modified seg2pairs function (I modified it to emit multiple contacts per read), I observed more trans-parental contacts within the same chromosome than expected, and I want to figure out why.
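To be concrete about what I am counting, here is a rough Python sketch of the idea. It is not my actual modification of seg2pairs and not hickit code. Every pair of segments on one read becomes a contact, and an intra-chromosomal contact is called "trans" when both segments are phased and the phases differ. The segment tuples in the example are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def classify(phase_a, phase_b):
    """Label a contact by the phases of its two segments ('.' means unphased)."""
    if "." in (phase_a, phase_b):
        return "unphased"
    return "cis" if phase_a == phase_b else "trans"


def pairwise_contacts(segments):
    """Emit every pair of segments on one read as (chrom1, pos1, chrom2, pos2, label)."""
    contacts = []
    for (c1, p1, h1), (c2, p2, h2) in combinations(segments, 2):
        contacts.append((c1, p1, c2, p2, classify(h1, h2)))
    return contacts


# Hypothetical read with three segments given as (chromosome, position, phase).
segments = [("chr1", 1000, "0"), ("chr1", 500000, "1"), ("chr1", 900000, ".")]
contacts = pairwise_contacts(segments)

# Count labels for contacts whose two ends are on the same chromosome.
intra = Counter(label for c1, _, c2, _, label in contacts if c1 == c2)
print(intra)
```

With multi-contact output, a read with n segments gives n(n-1)/2 contacts, so a single wrongly phased segment affects n-1 contacts at once.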
In general, I would like to ask whether the hickit::sam2seg function can handle .sam files generated by bwa-sw and make correct phasing decisions, and what sam2seg does if a long mapped read carries SNPs from opposite phases.
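To illustrate the second question, below is a hypothetical Python sketch of one way such a segment could be handled: a majority vote over the SNPs it overlaps, with a flag when both haplotypes are seen. I do not know whether sam2seg does anything like this. The function name `phase_segment` and the 0.75 threshold are my own assumptions for the example.

```python
def phase_segment(snp_calls, min_fraction=0.75):
    """Assign a phase to one segment from its per-SNP haplotype hits.

    snp_calls: list of 0/1 values, one per SNP overlapped by the segment.
    Returns (phase, conflict): phase is "0", "1" or "." (undecided), and
    conflict is True when SNPs from both haplotypes were observed.
    """
    n0 = snp_calls.count(0)
    n1 = snp_calls.count(1)
    total = n0 + n1
    if total == 0:
        return ".", False  # no informative SNP on this segment
    conflict = n0 > 0 and n1 > 0
    major, count = (0, n0) if n0 >= n1 else (1, n1)
    if count / total >= min_fraction:
        return str(major), conflict
    return ".", conflict  # too evenly split to call


print(phase_segment([1, 1, 1]))     # consistent SNPs
print(phase_segment([0, 0, 0, 1]))  # mostly haplotype 0, one conflicting SNP
print(phase_segment([0, 1]))        # evenly split
```

With nanopore error rates I would expect the conflicting cases to be much more common than with NGS reads, which is why I am asking how they are resolved.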