Illumina/paragraph

How to merge multi-sample SVs and obtain consensus breakpoints for genotyping a population

Zhiliang-Zhang opened this issue · 0 comments

Hi @pkrusche @traxexx @egor-dolzhenko

Thanks for your contribution. I obtained multi-sample SV sets from a pan-genome. Graph-based population genotyping is not a good choice for my large plant genome because the computational cost is very high. Paragraph is an accurate genotyper for population short-read sequencing data, so I would like to use it to genotype the SVs that have already been discovered. Inaccurate breakpoints are known to degrade genotyping performance, although genotyping with breakpoint deviations of 1-18 bp still performs well (~0.9). However, I still have a question, described below.

An example:

Chromosome | Left breakpoint | Right breakpoint | Samples
Chr1 | 101508 | 101750 | sample1
Chr1 | 101510 | 101770 | sample2
Chr1 | 101510 | 101771 | sample3
Chr1 | 101512 | 101773 | sample4
Chr1 | 101512 | 101776 | sample5, sample14
Chr1 | 101510 | 101777 | sample6
Chr1 | 101514 | 101779 | sample7, sample16
Chr1 | 101515 | 101780 | sample8
Chr1 | 101515 | 101781 | sample9, sample15, sample17
Chr1 | 101515 | 101784 | sample10
Chr1 | 101515 | 101785 | sample11
Chr1 | 101515 | 101786 | sample12
Chr1 | 101518 | 101789 | sample13

The left breakpoint falls within 101508-101518 (11 bp) and the right breakpoint within 101750-101789 (40 bp). How can I derive a single pair of breakpoints for this SV so that population genotyping achieves relatively high performance?
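
To make the question concrete, here is a minimal Python sketch of one approach I am considering. It is only an assumption on my side and not a method provided or recommended by Paragraph: take the sample-weighted median of the left and right breakpoints as the consensus, then flag calls that lie further from it than a tolerance. The function names `consensus_breakpoints` and `outliers` and the `max_dev` parameter are hypothetical; the default of 18 bp is simply the upper end of the deviation range mentioned above.

```python
from statistics import median_low

# (left, right, samples) rows copied from the example table above
calls = [
    (101508, 101750, ["sample1"]),
    (101510, 101770, ["sample2"]),
    (101510, 101771, ["sample3"]),
    (101512, 101773, ["sample4"]),
    (101512, 101776, ["sample5", "sample14"]),
    (101510, 101777, ["sample6"]),
    (101514, 101779, ["sample7", "sample16"]),
    (101515, 101780, ["sample8"]),
    (101515, 101781, ["sample9", "sample15", "sample17"]),
    (101515, 101784, ["sample10"]),
    (101515, 101785, ["sample11"]),
    (101515, 101786, ["sample12"]),
    (101518, 101789, ["sample13"]),
]


def consensus_breakpoints(calls):
    """Return the sample-weighted median (left, right) breakpoints.

    Each call contributes one vote per supporting sample, so breakpoints
    seen in more samples pull the median towards them.
    """
    lefts = [left for left, _, samples in calls for _ in samples]
    rights = [right for _, right, samples in calls for _ in samples]
    # median_low always returns an observed coordinate, never an average
    return median_low(lefts), median_low(rights)


def outliers(calls, consensus, max_dev=18):
    """Return calls whose left or right breakpoint is > max_dev bp away."""
    left_c, right_c = consensus
    return [
        (left, right, samples)
        for left, right, samples in calls
        if abs(left - left_c) > max_dev or abs(right - right_c) > max_dev
    ]


consensus = consensus_breakpoints(calls)
far_calls = outliers(calls, consensus)
print("consensus breakpoints:", consensus)
print("calls outside tolerance:", far_calls)
```

For the table above this gives a consensus of (101514, 101779), and only the sample1 call is flagged, because its right breakpoint (101750) is 29 bp from the consensus. I am not sure whether such a call should be merged into the consensus or kept as a separate SV record.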

Could you give me some advice? Thanks!

Sincerely,
Zhiliang