rwdavies/QUILT

Recovering recombination/Haplotype switch locations

karstyl opened this issue · 5 comments

Hello,

Is there a way to recover the positions where QUILT switches from copying one haplotype in the reference set to another, and to tell which of the reference haplotypes were chosen?

My data set is structured in a way where I can, for a portion of it, impute from direct parents. Having the data on position and frequency of the haplotype switches would be very helpful, both for quality checking and for recombination frequency mapping.

Thanks,
Jen

Hey,

In principle yes, though there would be some hacking involved.

What QUILT does is impute a posterior state probability matrix where rows are haplotypes and columns are grids (groups of SNPs). From that, imputed dosages are calculated by multiplying those copying probabilities by whether a haplotype contains the reference or the alternate allele (within the grids). As such, you could infer switching from one haplotype to another in the reference by looking for rapid changes in the posterior state probability matrix. It would be possible to output this matrix.
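To make the idea above concrete, here is a minimal sketch in Python/NumPy of both steps: turning a posterior state probability matrix into a dosage, and scanning it for haplotype switches. It is not QUILT code. The matrix `gamma`, the allele matrix `hap_alleles`, the function names and the `min_prob` threshold are all hypothetical, and it assumes a single copied haplotype with one SNP per grid, which is a simplification of what QUILT does internally.

```python
import numpy as np

# Hypothetical toy posterior: 3 reference haplotypes (rows) x 6 grids (columns).
# Each column sums to 1. This is NOT a QUILT output format.
gamma = np.array([
    [0.98, 0.97, 0.60, 0.02, 0.01, 0.01],
    [0.01, 0.02, 0.38, 0.97, 0.98, 0.98],
    [0.01, 0.01, 0.02, 0.01, 0.01, 0.01],
])

# Hypothetical reference alleles (0 = ref, 1 = alt), simplified to one SNP per grid.
hap_alleles = np.array([
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
])


def dosage_from_posterior(gamma, hap_alleles):
    """Expected alt-allele count for one copied haplotype at each grid."""
    return (gamma * hap_alleles).sum(axis=0)


def find_switches(gamma, min_prob=0.9):
    """Report changes in the most probable haplotype between confident grids.

    Grids whose best haplotype has probability below min_prob are skipped,
    so each switch is returned as an interval:
    (last confident grid before, first confident grid after, hap before, hap after).
    """
    best = gamma.argmax(axis=0)
    conf = gamma.max(axis=0)
    switches = []
    prev = None  # index of the last confident grid seen so far
    for g in range(gamma.shape[1]):
        if conf[g] < min_prob:
            continue
        if prev is not None and best[g] != best[prev]:
            switches.append((prev, g, int(best[prev]), int(best[g])))
        prev = g
    return switches


dosage = dosage_from_posterior(gamma, hap_alleles)
switches = find_switches(gamma)
```

Reporting an interval rather than a single grid reflects that the switch can only be localised to the stretch where the posterior is ambiguous. The row indices returned here refer to rows of whatever matrix is passed in, so they would still need mapping back to reference panel labels.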

Various QUILT plots will output some flavour of this. You can turn make_plots on and you'll see a version of this posterior state probability matrix along the region that is being imputed. I can't remember whether the haplotypes are labelled with respect to their position in the haplotype reference panel, or labelled some other way, e.g. with respect to the subset of the haplotype reference panel used in the iterative imputation process (I suspect the latter but can't remember).

When you say your data set is structured such that you can impute from direct parents, do you mean for a subset of individuals, or somehow specifically in some regions of the genome and not others (in which case I assume the data is non-human!)?

Thanks
Robbie

I will try to turn on plots and see what I can recover.
I have a set of animals (foxes) that are deeply sequenced, which I used to create a reference panel, and their descendants, for which I have low coverage sequencing. I also have low coverage sequencing for animals from the same population that are not direct descendants. I am hoping to use the animals in the pedigrees to also confirm some linkage and recombination frequencies, so if that data were easy to get from the imputation it would be helpful, but I can also use other tools.

.....
[2024-04-10 13:27:02] downsample sample F10F236 - 37124 of 854061 reads removed
[2024-04-10 13:27:07] The average depth of this sample is:1.95722271290082
[2024-04-10 13:27:07] There are 816937 reads under consideration
[2024-04-10 13:27:08] i_gibbs=1, i_it = 1 small gibbs
Error in x[1:(length(x) - n)] :
only 0's may be mixed with negative subscripts
Calls: QUILT ... plot_single_gamma_dosage -> plot_1_dosage_vs_truth -> smooth_vector
Execution halted

@karstyl Hi, sorry, there is a bug in the plotting code here. But if you have truth data, you can specify it with --phasefile, which should work. The bug will be fixed soon.

I think I fixed it, can you try re-installing from source, and see if that works for you? Thanks, Robbie