Question about relative position of peaks in trio hapmers plot

Question

Hello,

I'm working on a trio-binned assembly (haploid, but including both sex chromosomes) and am using Merqury's hapmers.sh tool to get some insight into the process. The resulting plot has the expected read-only, maternal, paternal, and shared distributions, but the pattern differs from the provided example plot, which shows the single peaks for both parental haplotypes and the shared diploid peak perfectly overlapped. In the plot from my data, the coverage positions and heights of these peaks all vary.

How would you interpret this difference? Is it just because we have a haploid+XY vs fully phased diploid assembly, or variable read depths? Or does this point to deeper problems in the trio-binning process?

Here are the submission script and output files, just in case:
merqury-trio_test.stderr.txt
merqury-trio_test.stdout.txt
merqury-trio_test.slurm.txt

Thank you for your insight!

Answer 1 · 2024-08-30T18:11:31.000Z

Hello @dluecke ,

I see you figured out the plotting problem from the plot attached in your email.

I am afraid it looks like a tetraploid (or higher?) genome to me. Do you have GenomeScope results from your F1's k-mers? If the 1-copy peak is at 14x, the 2nd peak may be at ~22x, the 3rd is not so visible (which often happens), and the 4th is at ~40x.
The peaks are not clearly distinguishable, potentially due to low sequencing coverage.
It's possible to have shared k-mers in the 1-copy range: a k-mer can be present in both parents but seen only once in the child. The X chromosome is a good example. Imagine the mother is XX and the father is XY, but only an X from the maternal genome was passed down, along with the paternal Y.
Likewise, it's possible to have hapmers in the 2-copy range. These could come from haplotype-specific duplications that are not present in the other haplotype.
The large number of read-only k-mers around 10x also seems suspicious; could there be some contaminants?
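
For anyone wanting to check peak positions like the ones discussed above, here is a minimal sketch, not part of Merqury. It assumes the k-mer histogram is already loaded as a dict mapping multiplicity to the number of distinct k-mers. The function names (`find_peaks`, `relative_to_first`, `toy_histogram`) are hypothetical, and the toy counts are made up, with peaks placed at the 14x, 22x, and 40x mentioned in this answer purely for illustration.

```python
def toy_histogram():
    """Build a made-up k-mer histogram (multiplicity -> distinct k-mer count).

    Peaks are placed at 14x, 22x and 40x only to mirror the values quoted in
    the thread; the heights and shapes are arbitrary, not real data.
    """
    peaks = [(14, 1000), (22, 700), (40, 300)]  # (centre, height), assumed
    slope = 80  # count lost per unit of coverage away from a centre
    hist = {}
    for cov in range(1, 61):
        hist[cov] = max(0, *(h - slope * abs(cov - c) for c, h in peaks))
    # Crude stand-in for the low-coverage sequencing-error tail.
    hist[1] = 5000
    hist[2] = 2000
    return hist


def find_peaks(hist, min_cov=3, window=2):
    """Return (coverage, count) pairs that are strict local maxima.

    min_cov skips the error tail; a point is a peak if its count is strictly
    greater than every count within `window` multiplicities on either side.
    """
    peaks = []
    for cov in sorted(c for c in hist if c >= min_cov):
        neighbours = [hist.get(cov + d, 0)
                      for d in range(-window, window + 1) if d != 0]
        if hist[cov] > max(neighbours):
            peaks.append((cov, hist[cov]))
    return peaks


def relative_to_first(peaks):
    """Express each peak's coverage as a multiple of the lowest peak."""
    base = peaks[0][0]
    return [(cov, round(cov / base, 2)) for cov, _ in peaks]


if __name__ == "__main__":
    found = find_peaks(toy_histogram())
    for cov, ratio in relative_to_first(found):
        print(f"peak at {cov}x = {ratio} x the first peak")
```

If the higher peaks sit near whole-number multiples of the first peak, the usual copy-number reading applies. Ratios that fall between whole numbers, as with the toy values here, suggest the peaks are poorly resolved or that the first peak is not really the 1-copy peak. Real histograms are noisy, so some smoothing before peak calling would likely be needed.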

Answer 2 · 2024-09-04T16:07:25.000Z

Thank you for the response, good to know my interpretation wasn't totally wrong even if that's bad news for our data quality. We will re-sequence from fresh samples.
best,
David