phasegenomics/FALCON-Phase

Question about single-copy regions in homologous chromosomes

BenjaminSchwessinger opened this issue · 1 comment

Hi there,
I have now run FALCON-Phase successfully. It worked pretty much out of the box. I have a couple of questions about the algorithm.

1. All primary contigs that did not have an associated haplotig appear to have simply been duplicated. Yet some of these may well be pieces of the genome that are present only once; I checked this previously with a read coverage analysis. I guess some manual curation is required for these. Thoughts?

2. Similarly, what about single-copy regions that are present only in primary contigs and lie between two haplotig alignments? Based on the current description, these regions are simply duplicated to connect adjacent haplotigs. Again, read coverage analysis plus SV analysis with long reads may uncover them. Thoughts?

Curious to dig in a bit more.

Hi Ben,

You are correct that this is a limitation of the current implementation of FALCON-Phase, and that manual curation would be required to properly handle hemizygous genomic regions. Without additional information (such as coverage, as you suggest), it is not possible to distinguish hemizygous regions of the genome from regions that are diploid but completely homozygous. We will be adding another output format option that will be similar to the FALCON-Unzip style of primaries and haplotigs, but with corrected phasing along the primaries.

Sarah
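
For readers who want to try the coverage check discussed above, here is a minimal sketch in Python. It is not part of FALCON-Phase. It assumes you have already computed a mean read depth per region with reads mapped to the primary contigs only, that most regions are diploid (so the median depth approximates the diploid depth), and that a depth ratio below 0.75 suggests a single-copy region. The function name `classify_regions`, the contig names, and the threshold are all hypothetical.

```python
from statistics import median


def classify_regions(region_depths, diploid_depth=None, hemi_max_ratio=0.75):
    """Label regions as putatively hemizygous or diploid from mean read depth.

    region_depths: dict mapping a region name to its mean read depth.
    diploid_depth: expected depth of a diploid region; if None, the median
        of all region depths is used (assumes most regions are diploid).
    hemi_max_ratio: hypothetical cutoff; ratios below it are called hemizygous.
    """
    if not region_depths:
        return {}
    if diploid_depth is None:
        diploid_depth = median(region_depths.values())
    if diploid_depth <= 0:
        raise ValueError("diploid_depth must be positive")

    calls = {}
    for name, depth in region_depths.items():
        # A hemizygous region is expected near half the diploid depth.
        ratio = depth / diploid_depth
        if ratio < hemi_max_ratio:
            calls[name] = "putative_hemizygous"
        else:
            calls[name] = "putative_diploid"
    return calls


# Made-up depths for illustration only.
example_depths = {"ctg1": 60.0, "ctg2": 58.0, "ctg3": 29.0, "ctg4": 62.0, "ctg5": 31.0}
calls = classify_regions(example_depths)
for name in sorted(calls):
    print(name, calls[name])
```

Regions flagged this way are only candidates. As noted in the thread, they would still need manual curation, ideally supported by SV analysis with long reads.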