vgteam/vg_wdl

VG construction and indexing needs decoy support

cmarkello opened this issue · 1 comment

@mlin
vg_construct_and_index.wdl needs to support incorporating decoy sequences into VG construction. Toil-vg's vg_construct.py does this by parsing the contig regions from the FASTA file via its --fasta_regions option: https://github.com/vgteam/toil-vg/blob/master/src/toil_vg/vg_construct.py.

The catch is that decoy regions have no corresponding VCF to construct VG graphs from, so we'll need to support constructing linear (variant-free) VG graphs for decoy regions.
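Pulling the contig names out of the FASTA is the first piece of this. A minimal Python sketch, assuming the contig name is the first whitespace-delimited token of each `>` header line; `fasta_contig_names` is a hypothetical helper for illustration, not toil-vg's actual code:

```python
from typing import Iterable, List


def fasta_contig_names(lines: Iterable[str]) -> List[str]:
    """Return contig names from FASTA header lines, in file order.

    Assumption: the name is the first whitespace-delimited token after '>'
    (e.g. ">chr1 description" yields "chr1").
    """
    names: List[str] = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip malformed empty headers
                names.append(fields[0])
    return names


# Usage (hypothetical file name):
# with open("reference.fa") as fh:
#     contigs = fasta_contig_names(fh)
```

If the reference is already indexed with `samtools faidx`, the first column of the `.fai` file gives the same names without scanning the sequence data.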

Brainstorming some logic here:

  1. The workflow checks for an include_decoys Boolean input variable.
  2. If include_decoys is true, parse the FASTA file for defined regions that aren't chromosomes.
  3. Extend the construct_graph task so that, when the input contig string is non-chromosomal, it calls vg construct without the -v and --region-is-chrom flags.
  4. If the contig is not chromosomal, don't run the gbwt_index task within the scatter operation here: https://github.com/vgteam/vg_wdl/blob/master/workflows/vg_construct_and_index.wdl#L55.
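Steps 1 through 4 amount to a per-contig decision. Below is a rough Python sketch of that decision, assuming a hypothetical name-based `is_chromosome` heuristic (1-22, X, Y, M, MT, with or without a `chr` prefix) and a single VCF input; `ContigPlan`, `plan_contig` and `plan_all` are illustrative names, not existing vg_wdl or toil-vg functions. The exact `vg construct` flags for the linear case should be checked against toil-vg's is_chrom handling and `vg construct --help`.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical heuristic: 1-22, X, Y, M, MT (optionally "chr"-prefixed) are
# chromosomes; anything else (e.g. decoys such as hs37d5) is non-chromosomal.
_CHROM_RE = re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y|M|MT)$")


def is_chromosome(contig: str) -> bool:
    return _CHROM_RE.match(contig) is not None


@dataclass
class ContigPlan:
    contig: str
    construct_cmd: List[str]  # argv for the construct_graph step
    run_gbwt: bool            # whether gbwt_index should run for this contig


def plan_contig(contig: str, ref_fasta: str, vcf: Optional[str]) -> ContigPlan:
    cmd = ["vg", "construct", "-r", ref_fasta, "-R", contig]
    if is_chromosome(contig) and vcf is not None:
        # Chromosome with variants: variant graph, then GBWT indexing.
        cmd += ["-v", vcf, "--region-is-chrom"]
        return ContigPlan(contig, cmd, run_gbwt=True)
    # Decoy / non-chromosomal: linear graph (no -v, no --region-is-chrom), no GBWT.
    return ContigPlan(contig, cmd, run_gbwt=False)


def plan_all(contigs: List[str], ref_fasta: str, vcf: Optional[str],
             include_decoys: bool) -> List[ContigPlan]:
    plans = []
    for contig in contigs:
        if not is_chromosome(contig) and not include_decoys:
            continue  # step 2 only applies when include_decoys is true
        plans.append(plan_contig(contig, ref_fasta, vcf))
    return plans
```

In the WDL itself, `run_gbwt` would correspond to a conditional around the gbwt_index call inside the scatter, so decoy shards only produce a linear graph.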

This may require refactoring the GBWT and snarl tasks if they can't handle linear VG graphs.

This adapts the logic defined for the 'is_chrom' flag in toil-vg's vg_construct: https://github.com/vgteam/toil-vg/blob/master/src/toil_vg/vg_construct.py#L913

@mlin
If you're too busy, I could take this on.