VG construction and indexing needs decoy support
cmarkello opened this issue · 1 comments
@mlin `vg_construct_and_index.wdl` will need to be able to support incorporating decoy sequences into VG construction. Toil-vg's `vg_construct.py` does this by parsing chromosome regions from the fasta file via a `--fasta_regions` option: https://github.com/vgteam/toil-vg/blob/master/src/toil_vg/vg_construct.py.
The issue there is that decoy regions won't have a corresponding VCF to construct VG graphs with, so we'll need to support constructing linear VG graphs of decoy regions.
Brainstorming some logic here:
- Workflow checks for some `include_decoys` Boolean input variable.
- If `include_decoys` is true, parse the fasta file for defined regions that aren't chromosomal.
- Add functionality into the `construct_graph` task to handle `vg construct` calls without the `-v` and `--region-is-chrom` flags if the `contig` input string is non-chromosomal.
- If `contig` is not chromosomal, don't run the `gbwt_index` task within the scatter operation here: https://github.com/vgteam/vg_wdl/blob/master/workflows/vg_construct_and_index.wdl#L55.
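
The branching above could be sketched roughly as follows, in Python rather than WDL so the logic is easy to try out. This is only an illustration, not what toil-vg or vg_wdl actually does: the helper names (`is_chromosomal`, `fasta_contigs`, `construct_args`) are hypothetical, and the rule for what counts as "chromosomal" (1-22, X, Y, M/MT with an optional `chr` prefix) is an assumption that would need to match the reference in use. The `-r` and `-R` flags are assumed from `vg construct` usage and are not mentioned above.

```python
import re
from typing import Iterable, List, Optional

# Assumption: "chromosomal" means 1-22, X, Y, M or MT, optionally prefixed with "chr".
CHROM_PATTERN = re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y|M|MT)$")


def is_chromosomal(contig: str) -> bool:
    """Return True if the contig name looks like a primary chromosome."""
    return CHROM_PATTERN.match(contig) is not None


def fasta_contigs(lines: Iterable[str]) -> List[str]:
    """Collect contig names from FASTA header lines (text up to the first whitespace)."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def construct_args(ref_fasta: str, contig: str, vcf: Optional[str] = None) -> List[str]:
    """Build a hypothetical `vg construct` argument list for one contig.

    Chromosomal contigs get the VCF and --region-is-chrom; decoy contigs get
    neither, which should yield a linear graph of the reference sequence.
    """
    args = ["vg", "construct", "-r", ref_fasta, "-R", contig]
    if is_chromosomal(contig):
        if vcf is None:
            raise ValueError(f"chromosomal contig {contig} needs a VCF")
        args += ["-v", vcf, "--region-is-chrom"]
    return args


def should_run_gbwt(contig: str) -> bool:
    """Skip GBWT indexing for decoys, since they carry no variant paths."""
    return is_chromosomal(contig)
```

With `include_decoys` set, the workflow would scatter over every name from `fasta_contigs`; without it, only over the names for which `is_chromosomal` is true. In the WDL itself the same split would presumably become a conditional around the `-v`/`--region-is-chrom` part of the `construct_graph` command and an `if` block around the `gbwt_index` call.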
This may require refactoring of GBWT and Snarl task functionality if they can't handle linear VG graphs.
The idea is to adapt the logic defined for the `is_chrom` flag in toil-vg's `vg_construct`: https://github.com/vgteam/toil-vg/blob/master/src/toil_vg/vg_construct.py#L913