Modules required for the nf-core compliant DSL2 implementation of the pipeline
subwaystation opened this issue · 12 comments
- multiqc - https://anaconda.org/bioconda/multiqc - https://github.com/nf-core/modules/tree/master/modules/multiqc
- wfmash - https://anaconda.org/bioconda/wfmash
- seqwish - https://anaconda.org/bioconda/seqwish - https://github.com/nf-core/modules/tree/master/modules/seqwish/induce - must be able to digest several flatten() paths as input
- smoothxg - https://anaconda.org/bioconda/smoothxg
- odgi - https://anaconda.org/bioconda/odgi
  - build
  - sort
  - view
  - stats
  - viz
  - layout
  - draw
  - unchop
  - squeeze
- vg - https://anaconda.org/bioconda/vg
  - deconstruct - Must be version 1.40.0 or there are bugs in it. See vgteam/vg#3807.
- gfaffix - https://anaconda.org/bioconda/gfaffix
- samtools - https://bioconda.github.io/recipes/samtools/README.html
- bcftools - https://bioconda.github.io/recipes/bcftools/README.html
- vcfbub - https://anaconda.org/bioconda/vcfbub - @AndreaGuarracino: added in bioconda bioconda/bioconda-recipes#38565
- vcflib - https://anaconda.org/bioconda/vcflib - But it is outdated.
  - vcfwave - Waiting for vcflib/vcflib#362.
- split_approx_mappings_in_chunks.py - https://github.com/waveygang/wfmash/blob/master/scripts/split_approx_mappings_in_chunks.py - @AndreaGuarracino: added in bioconda bioconda/bioconda-recipes#38824.
- paf2net.py - https://github.com/pangenome/pggb/blob/master/scripts/paf2net.py - @AndreaGuarracino: added in bioconda bioconda/bioconda-recipes#38825.
- net2communities.py - https://github.com/pangenome/pggb/blob/master/scripts/net2communities.py - @AndreaGuarracino: added in bioconda bioconda/bioconda-recipes#38825.
FINAL TASK:
- Check all versions of all tools again.
- https://github.com/pangenome/odgi/releases/tag/v0.8.2
- https://github.com/waveygang/wfmash/releases/tag/v0.10.2
- https://github.com/ekg/seqwish/releases/tag/v0.7.8
- https://github.com/pangenome/smoothxg/releases/tag/v0.7.0
- Bug @AndreaGuarracino to make a new smoothxg release so I can use its updated API regarding SPOA and abPOA.
OPTIONAL:
multiqc is already present at version multiqc:1.11--pyhdfd78af_0
seqwish/induce is already present at version seqwish:0.7.1--h2e03b76_0
samtools/faidx is already present at version samtools:1.13--h8c37831_0
Cool!
I updated the list on the top to better reflect which subcommands we need from each tool. More than expected xD
seqwish needs to be updated to v0.7.2. Also the folder structure can go from seqwish/induce to just seqwish. I see no reason to do the first one. And it needs appropriate test data. https://github.com/nf-core/test-datasets/tree/modules/data/genomics/homo_sapiens/genome is not sufficient. I will ask for an additional pangenome folder, so we can put all test data relevant for this pipeline there.
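The version check described above (module container at seqwish:0.7.1--h2e03b76_0, release needed v0.7.2) can be sketched in shell. This is only an illustration, not part of nf-core tooling: the function names `container_version`, `version_at_least`, and `check_module` are made up here, the tag layout `name:version--build` is inferred from the tags quoted in this thread, and the comparison assumes GNU `sort -V` is available.

```shell
#!/usr/bin/env bash

# Extract the tool version from a Biocontainers-style tag such as
# "seqwish:0.7.1--h2e03b76_0" (assumed layout: name:version--build).
container_version() {
    local tag="$1"
    local version="${tag#*:}"          # drop the leading "name:"
    printf '%s\n' "${version%%--*}"    # drop the trailing "--build"
}

# Succeed when version $1 is at least version $2.
# A leading "v" is ignored; ordering relies on GNU `sort -V`.
version_at_least() {
    local have="${1#v}" want="${2#v}"
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Report whether the module container satisfies the required release.
check_module() {
    local tag="$1" required="$2"
    local have
    have="$(container_version "$tag")"
    if version_at_least "$have" "$required"; then
        echo "${tag%%:*}: $have is OK (needs >= ${required#v})"
    else
        echo "${tag%%:*}: $have is outdated (needs >= ${required#v})"
    fi
}

check_module "seqwish:0.7.1--h2e03b76_0" "v0.7.2"
```

Run as written, this reports seqwish 0.7.1 as outdated against v0.7.2. The same call could be repeated for the other tags listed in this thread.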
Also the folder structure can go from seqwish/induce to just seqwish. I see no reason to do the first one.
That was the pattern to use at the time, create a tool directory (e.g. induce) even if the software has only one function. It is possible this is no longer the recommendation.
https://github.com/nf-core/test-datasets/tree/modules/data/genomics/homo_sapiens/genome is not sufficient. I will ask for an additional pangenome folder, so we can put all test data relevant for this pipeline there
In nf-core/modules the test data are only useful for smoke testing the modules (i.e. making sure they run with the correct inputs and outputs and don't explode). There are GFA files at https://github.com/nf-core/test-datasets/tree/modules/data/genomics/sarscov2/illumina/gfa to use. What else might we need?
Test data for the workflow itself are in this branch https://github.com/nf-core/test-datasets/tree/pangenome
bcftools/stats is available: https://github.com/nf-core/modules/tree/master/modules/bcftools/stats
There are still quite a few single tool directories present in https://github.com/nf-core/modules/tree/master/modules, e.g.
bamtools/split
bamutil/trimbam
bandage/image
checkm/lineagewf
cmseq/polymut
cnvkit/batch
I'll ask on Slack what the current recommendation is.
vg/deconstruct is no longer a valid vg command, at least as of the most recent Bioconda version:
$ docker run -it quay.io/biocontainers/vg:1.36.0--h9ee0642_0 /bin/bash
root@3a42cf5ce1c3:/# vg help
vg: variation graph tool, version v1.36.0 "Cibottola"
usage: vg <command> [options]
main mapping and calling pipeline:
-- autoindex mapping tool-oriented index construction from interchange formats
-- construct graph construction
-- rna construct splicing graphs and pantranscriptomes
-- index index graphs or alignments for random access or mapping
-- map MEM-based read alignment
-- giraffe fast haplotype-aware short read alignment
-- mpmap splice-aware multipath alignment of short reads
-- augment augment a graph from an alignment
-- pack convert alignments to a compact coverage index
-- call call or genotype VCF variants
-- help show all subcommands
For more commands, type `vg help`.
For technical support, please visit: https://www.biostars.org/t/vg/
It is! vg is just hiding lots of commands. Just type vg help.
vg: variation graph tool, version v1.36.0 "Cibottola"
usage: vg <command> [options]
main mapping and calling pipeline:
-- autoindex mapping tool-oriented index construction from interchange formats
-- construct graph construction
-- rna construct splicing graphs and pantranscriptomes
-- index index graphs or alignments for random access or mapping
-- map MEM-based read alignment
-- giraffe fast haplotype-aware short read alignment
-- mpmap splice-aware multipath alignment of short reads
-- augment augment a graph from an alignment
-- pack convert alignments to a compact coverage index
-- call call or genotype VCF variants
-- help show all subcommands
useful graph tools:
-- deconstruct create a VCF from variation in the graph
-- gbwt build and manipulate GBWTs
-- ids manipulate node ids
-- minimizer build a minimizer index or a syncmer index
-- mod filter, transform, and edit the graph
-- prune prune the graph for GCSA2 indexing
-- sim simulate reads from a graph
-- snarls compute snarls and their traversals
-- stats metrics describing graph and alignment properties
-- view format conversions for graphs and alignments
specialized graph tools:
-- align local alignment
-- annotate annotate alignments with graphs and graphs with alignments
-- chunk split graph or alignment into chunks
-- circularize circularize a path within a graph
-- clip remove BED regions (other other nodes from their snarls) from a graph
-- combine merge multiple graph files together
-- convert convert graphs between handle-graph compliant formats as well as GFA
-- depth estimate sequencing depth
-- dotplot generate the dotplot matrix from the embedded paths in an xg index
-- filter filter reads
-- gamcompare compare alignment positions
-- gampcompare compare multipath alignment positions
-- gamsort Sort a GAM file or index a sorted GAM file.
-- genotype Genotype (or type) graphs, GAMS, and VCFs.
-- inject lift over alignments for the graph
-- paths traverse paths in the graph
-- simplify graph simplification
-- surject map alignments onto specific paths
-- trace trace haplotypes
-- vectorize transform alignments to simple ML-compatible vectors
-- viz render visualizations of indexed graphs and read sets
developer commands:
-- benchmark run and report on performance benchmarks
-- cluster find and cluster mapping seeds
-- find use an index to find nodes, edges, kmers, paths, or positions
-- mcmc Finds haplotypes based on reads using MCMC methods
-- test run unit tests
-- validate validate the semantics of a graph or gam
-- version version information
For technical support, please visit: https://www.biostars.org/t/vg/
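To avoid reading the full listing by eye, a small filter can check whether a given subcommand appears in the help text. This is a rough sketch: `lists_subcommand` is a hypothetical helper, and it assumes each subcommand is printed on its own line as `-- <name> <description>`, as in the output above. The sample text is copied from that output so the snippet runs without vg installed.

```shell
#!/usr/bin/env bash

# Read help text on stdin and succeed if a line of the form
# "-- <name> ..." lists the requested subcommand (whole-word match).
lists_subcommand() {
    local name="$1"
    grep -Eq -e "^[[:space:]]*-- ${name}([[:space:]]|\$)"
}

# Sample lines taken from the help output quoted in this thread.
sample_help='useful graph tools:
-- deconstruct create a VCF from variation in the graph
-- gbwt build and manipulate GBWTs'

if printf '%s\n' "$sample_help" | lists_subcommand deconstruct; then
    echo "deconstruct is listed"
else
    echo "deconstruct is not listed"
fi
```

Against a real installation the same function could be fed from `vg help 2>&1 | lists_subcommand deconstruct`; the `2>&1` is there in case the help text is written to stderr.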
I think the discussion about test data deserves its own issue. Let's continue at #74.
vg is just hiding lots of commands.
Ah got it, thanks!
For now, we don't need the optional module. Happy Easter!