vibansal/HapCUT2

Obtaining Phased Haplotypes

Opened this issue · 5 comments

Input: PacBio long reads, Hi-C, and Illumina short-read data
Assembly: Canu v2.1.1, then run through purgeHaplotigs
Variants: FreeBayes

I processed the Hi-C and PacBio files as recommended in the HiC_longread recipe. My question is what to do next to get a phased haplotype FASTA file. How do I know which blocks belong together?

Thank you.

You should be able to use bcftools consensus (http://samtools.github.io/bcftools/bcftools.html#consensus) to generate FASTA files for each haplotype. The output VCF file has an identifier for each phased variant specifying which block it belongs to.
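A minimal sketch of that step, using a fabricated toy reference and phased VCF (all file names, the contig `ctg1`, and the sample name `sample1` are made up for illustration); the bcftools commands only run if the tool is installed:

```shell
# Build a toy reference and a matching phased VCF (illustrative data only).
cat > ref.fa <<'EOF'
>ctg1
ACGTACGTAC
EOF

{
  echo '##fileformat=VCFv4.2'
  echo '##contig=<ID=ctg1,length=10>'
  echo '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'
  echo '##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phase set">'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n'
  printf 'ctg1\t3\t.\tG\tT\t50\tPASS\t.\tGT:PS\t0|1:3\n'
  printf 'ctg1\t7\t.\tG\tC\t50\tPASS\t.\tGT:PS\t1|0:3\n'
} > phased.vcf

# If bcftools/bgzip are available, compress and index the VCF, then emit
# one consensus FASTA per haplotype. -H 1 / -H 2 select the first or
# second allele of each phased (|-separated) genotype.
if command -v bcftools >/dev/null 2>&1 && command -v bgzip >/dev/null 2>&1; then
  bgzip -f phased.vcf
  bcftools index phased.vcf.gz
  bcftools consensus -H 1 -f ref.fa phased.vcf.gz > hap1.fa
  bcftools consensus -H 2 -f ref.fa phased.vcf.gz > hap2.fa
fi
```

Note that the consensus sequences are only phased relative to each other within one phase block; swapping `-H 1` and `-H 2` between blocks is equally valid.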

@vibansal I think I am explaining it poorly. Since each contig is processed in parallel, how does HapCUT2 know which blocks within a contig belong together?

Let's say the draft assembly has 4 contigs representing two copies of a chromosome. Since HapCUT2 analysed each contig in parallel, how does it know which blocks (of a contig) belong together? How does it provide a recipe to create the two copies correctly, especially in terms of the ordering of blocks along the chromosome? I understand the phasing bit, I think.

Sorry for the confusing question.

HapCUT2 is designed to reconstruct haplotypes for a diploid genome using reads mapped to a haploid consensus. For each group of variants that can be linked together by the reads, it outputs two haplotype sequences at heterozygous variant sites. I don't understand your objective completely, but I don't think that HapCUT2 is designed to do that.
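To see which variants were linked into the same block, the PS (phase set) value in the VCF sample field can serve as the grouping key. A small sketch over fabricated CHROM/POS/GT:PS records (the field layout is an assumption for illustration):

```shell
# Toy tab-separated records: CHROM, POS, GT:PS (illustrative data only).
printf 'ctg1\t3\t0|1:3\nctg1\t7\t1|0:3\nctg1\t42\t0|1:42\n' > calls.txt

# Group positions by contig + PS value: variants sharing a PS identifier
# were phased together, so their alleles are consistent within that block.
awk -F'\t' '
  { split($3, f, ":"); key = $1 ":" f[2]; blocks[key] = blocks[key] " " $2 }
  END { for (k in blocks) print k, "->" blocks[k] }
' calls.txt | tee blocks.txt
```

Here positions 3 and 7 fall in one block (PS=3) and position 42 in another; the relative phase *between* blocks is unknown, which is why separate blocks cannot simply be concatenated into one chromosome-scale haplotype.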


Hi,
I tried to generate a consensus sequence with the VCF and noticed that some SNPs with genotype 1/2 in phased blocks are converted to 0 or 1 in the FASTA; it seems allele 2 is not included in the VCF?
Thank you.

Thank you for reporting this, we will fix this soon.