Chromosome-length haplotype determination from Hi-C and external reference panels
refLinker resolves complete chromosomal haplotypes using Hi-C data and statistical phasing methods. For detailed installation instructions, see https://github.com/rwtourdot/mlinker.
refLinker refines an initial haplotype "guess"; here, that guess is the statistical phasing haplotype generated by EAGLE2 (https://alkesgroup.broadinstitute.org/Eagle/).
Starting from a statistical phasing haplotype, {statistical haplotype}.vcf.gz, and aligned Hi-C reads, {Hi-C reads}.bam, refLinker extracts Hi-C links between variant genotypes for a single chromosome with the following command:
linker extract -v {statistical haplotype}.vcf.gz -i {Hi-C reads}.bam -n {sample name} -e hic -c {chrom}
This generates a map from reads to variants, written to the file graph_variant_{sample name}_{chrom}.dat
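When the extraction step is run for several chromosomes, it can help to assemble the command and the expected output file name programmatically. The following is a minimal sketch, not part of refLinker: the helper names build_extract_command and graph_file_name are hypothetical, and it uses only the flags shown in the command above.

```python
from typing import List


def build_extract_command(vcf: str, bam: str, sample: str, chrom: str) -> List[str]:
    """Assemble the `linker extract` argument list using the flags shown above."""
    return [
        "linker", "extract",
        "-v", vcf,      # statistical phasing haplotype (VCF)
        "-i", bam,      # aligned Hi-C reads (BAM)
        "-n", sample,   # sample name, reused in the output file name
        "-e", "hic",    # value taken verbatim from the command above
        "-c", chrom,    # single chromosome to process
    ]


def graph_file_name(sample: str, chrom: str) -> str:
    """File name that the extraction step is documented to write."""
    return f"graph_variant_{sample}_{chrom}.dat"
```

Each returned list can be passed to subprocess.run(cmd, check=True), assuming the linker binary is on the PATH. The chromosome label must match the naming used in the VCF and BAM files.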
Chromosome-length haplotype inference is then performed using the reads-to-variants map and statistical phasing haplotype with the following command:
linker pop -v {statistical haplotype}.vcf.gz -g graph_variant_{sample name}_{chrom}.dat -c {chrom} -e -10.0 -p 0.999
Here, the arguments -e and -p specify the block-flipping penalty and the linkage pruning cut-off, respectively. The linker pop command writes the whole-chromosome haplotype to the file pop_hap_solution_{sample name}_{chrom}.dat, where column 1 corresponds to the variant position, column 4 to the refLinker haplotype, and column 5 to the initial (e.g. EAGLE2) haplotype assignment.
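The column layout described above can be read with a short script. This is a sketch under stated assumptions rather than a documented parser: it assumes the file is whitespace-delimited, that columns 1, 4, and 5 hold integers, and that both haplotype columns use the same two-valued encoding. The function names read_pop_solution and switch_positions are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[int, int, int]  # (position, refLinker haplotype, initial haplotype)


def read_pop_solution(path: str) -> List[Record]:
    """Read columns 1, 4 and 5 of a pop_hap_solution file (assumed whitespace-delimited)."""
    records: List[Record] = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 5:
                continue  # skip blank or short lines
            try:
                records.append((int(fields[0]), int(fields[3]), int(fields[4])))
            except ValueError:
                continue  # skip header or non-numeric lines (assumption)
    return records


def switch_positions(records: List[Record]) -> List[int]:
    """Positions where agreement between the two haplotypes changes from the previous variant."""
    switches: List[int] = []
    previous_agreement = None
    for position, ref_hap, init_hap in records:
        agreement = ref_hap == init_hap
        if previous_agreement is not None and agreement != previous_agreement:
            switches.append(position)
        previous_agreement = agreement
    return switches
```

The positions returned by switch_positions mark where the refLinker haplotype switches relative to the initial haplotype, which gives a quick view of where the Hi-C links changed the statistical phasing.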
The Python script eagle2_recover.py calculates the haplotype linkage between common, unlinked variants omitted from the refLinker Hi-C scaffold, based on their EAGLE2 linkage to the closest linked variant within 5 kb.
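The recovery idea described above can be illustrated as follows. This is not the code of eagle2_recover.py: it is a minimal sketch that assumes haplotypes are encoded as +1/-1, that positions are in base pairs, and that "within 5 kb" means a distance of at most 5,000 bp. All function and variable names are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Optional

MAX_DISTANCE_BP = 5_000  # "within 5 kb", assumed inclusive


def closest_linked_variant(position: int, linked_positions: List[int],
                           max_distance: int = MAX_DISTANCE_BP) -> Optional[int]:
    """Nearest scaffold variant to `position`, or None if none lies within max_distance.

    `linked_positions` must be sorted in ascending order.
    """
    i = bisect_left(linked_positions, position)
    # Only the neighbours immediately left and right of the insertion point can be closest.
    candidates = linked_positions[max(i - 1, 0):i + 1]
    best = min(candidates, key=lambda p: abs(p - position), default=None)
    if best is None or abs(best - position) > max_distance:
        return None
    return best


def recover_phase(position: int, eagle_phase: Dict[int, int],
                  scaffold_phase: Dict[int, int],
                  linked_positions: List[int]) -> Optional[int]:
    """Place an unlinked variant on the scaffold via its EAGLE2 phase relative to an anchor."""
    anchor = closest_linked_variant(position, linked_positions)
    if anchor is None:
        return None  # no linked variant close enough; leave unphased
    # +1 if EAGLE2 puts the variant on the same haplotype as the anchor, -1 otherwise.
    relative = eagle_phase[position] * eagle_phase[anchor]
    return relative * scaffold_phase[anchor]
```

The local EAGLE2 phase is trusted only over the short distance to the anchor, while the long-range orientation comes from the Hi-C scaffold.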